Deep learning techniques have shown considerable promise for improving diagnostic accuracy and clinical decision-making in medical image segmentation [1]-[5]. Despite strong performance on standardized single-scanner datasets, segmentation models often degrade significantly when applied to images acquired on different MRI scanners or with different sequence parameters, a challenge commonly referred to as manufacturer shift [6]. This study examines the influence of scanner variability on the robustness of deep learning models for segmenting rectal cancer and the mesorectum in T2-weighted MRI, using a retrospective dataset of 107 subjects acquired on 16 different MRI scanners within a single institution. To address this variability, we explore the effectiveness of data augmentation, domain adaptation, and transfer learning in improving model generalization across diverse scanner configurations. We evaluated multiple segmentation models and employed an ensembling strategy, demonstrating that our method generalizes to unseen, heterogeneous data. For rectal cancer segmentation, the ensemble achieved a mean Dice similarity coefficient (DSC) of 0.737 ± 0.108, comparable to the best single model, UMambaBot 3D (DSC = 0.735 ± 0.094). For mesorectum segmentation, the ensemble achieved a mean DSC of 0.770 ± 0.133, outperforming the UMambaBot 3D model with non-linear data augmentation (DSC = 0.759 ± 0.127). Figure 1 compares the segmentations produced by the ensemble with those of the individual models. Our findings emphasize the importance of training and evaluating on a dataset that mirrors the real-world variability inherent in clinical imaging, which is crucial for improving the reliability of deep learning-based segmentation tools in clinical practice.

Figure 1: Comparison of segmentation results for representative models (one per architecture family). Each row shows a different case. Rectal cancer is shown in gray and the mesorectum in white. The final column shows the ensemble result, with rectal cancer in blue and the mesorectum in green.
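For readers unfamiliar with the metric, the Dice similarity coefficient reported above is DSC = 2|P ∩ T| / (|P| + |T|), where P and T are the predicted and reference voxel sets for one structure. The minimal Python sketch below computes a per-label DSC and combines several label maps by voxel-wise majority voting. The ensembling rule actually used in this work is not specified here, so majority voting is an illustrative assumption, as are the label convention (0 = background, 1 = rectal cancer, 2 = mesorectum) and the function names `dice` and `majority_vote`.

```python
import numpy as np

# Hypothetical label convention, assumed for illustration only.
BACKGROUND, TUMOR, MESORECTUM = 0, 1, 2


def dice(pred, target, label):
    """Dice similarity coefficient for one label in two label maps."""
    p = pred == label
    t = target == label
    denom = p.sum() + t.sum()
    if denom == 0:
        # Assumption: a structure absent from both maps counts as perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(p, t).sum() / denom


def majority_vote(label_maps, num_classes):
    """Voxel-wise majority vote over label maps of identical shape.

    Ties are resolved in favor of the lowest label index (argmax behavior).
    """
    stacked = np.stack(label_maps)  # shape: (n_models, *volume_shape)
    votes = np.stack([(stacked == c).sum(axis=0) for c in range(num_classes)])
    return votes.argmax(axis=0)


if __name__ == "__main__":
    # Toy 2D "volumes" standing in for 3D MRI label maps.
    truth = np.array([[0, 1, 1], [0, 2, 2]])
    model_a = np.array([[0, 1, 1], [0, 2, 0]])
    model_b = np.array([[0, 1, 0], [0, 2, 2]])
    model_c = np.array([[1, 1, 1], [0, 2, 2]])

    ensemble = majority_vote([model_a, model_b, model_c], num_classes=3)
    for name, label in (("tumor", TUMOR), ("mesorectum", MESORECTUM)):
        print(name, dice(ensemble, truth, label))
```

In this toy case each model makes a different single-voxel error, so the vote recovers the reference exactly. Real ensembles often average class probabilities instead of voting on hard labels, which avoids arbitrary tie-breaking.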
Acknowledgments – This work was supported by the research grant SPOL_PRIN2022DM104.23_01 – Bando 2022 Prot. 20225WC9YA, titled "Artificial intelligence model predicting pathological response in patients with locally advanced rectal cancer after neoadjuvant treatment", coordinated by Prof. Gaya Spolverato (CUP C53D23006290006). The authors report no potential conflict of interest.
References
[1] Ronneberger et al., 2015, arXiv:1505.04597
[2] Isensee et al., 2021, Nat. Methods
[3] Ma et al., 2024, arXiv:2401.04722
[4] Liu et al., 2024, MICCAI
[5] Hatamizadeh et al., 2021, MICCAI BrainLesion Workshop
[6] Yan et al., 2020, Radiology: Artificial Intelligence