Text Embedded Swin-UMamba for DeepLesion Segmentation

AI in healthcare
Published on arXiv: 2508.06453v1
Authors

Ruida Cheng, Tejas Sudharshan Mathai, Pritam Mukherjee, Benjamin Hou, Qingqing Zhu, Zhiyong Lu, Matthew McAuliffe, Ronald M. Summers

Abstract

Segmentation of lesions on CT enables automatic measurement for clinical assessment of chronic diseases (e.g., lymphoma). Integrating large language models (LLMs) into the lesion segmentation workflow offers the potential to combine imaging features with descriptions of lesion characteristics from the radiology reports. In this study, we investigate the feasibility of integrating text into the Swin-UMamba architecture for the task of lesion segmentation. The publicly available ULS23 DeepLesion dataset was used along with short-form descriptions of the findings from the reports. On the test dataset, a high Dice score of 82% and a low Hausdorff distance of 6.58 pixels were obtained for lesion segmentation. The proposed Text-Swin-UMamba model outperformed prior approaches: a 37% improvement over the LLM-driven LanGuideMedSeg model (p < 0.001), and it surpassed the purely image-based xLSTM-UNet and nnUNet models by 1.74% and 0.22%, respectively. The dataset and code can be accessed at https://github.com/ruida/LLM-Swin-UMamba

Paper Summary

Key Innovation
This research integrates large language models (LLMs) into the lesion segmentation workflow by embedding text descriptions of lesion characteristics from radiology reports into a neural network architecture called Swin-UMamba. This integration enables the model to combine imaging features with descriptive information about lesions, leading to improved segmentation results.
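The paper does not reproduce its implementation here, but the idea of conditioning image features on report text can be sketched with a simple cross-attention fusion. This is an illustrative assumption, not the authors' actual Text-Swin-UMamba code: image patch features attend over report-token embeddings, and the attended text is added back as a residual.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_text_into_features(img_feats, text_emb):
    """Illustrative cross-attention fusion (hypothetical, not the paper's code).

    img_feats: (N, C) flattened image patch features
    text_emb:  (T, C) token embeddings of a short report description
    Returns fused features of shape (N, C).
    """
    C = img_feats.shape[1]
    # Each image patch attends to the report tokens.
    attn = softmax(img_feats @ text_emb.T / np.sqrt(C), axis=-1)  # (N, T)
    # Residual fusion: add the text context back onto the image features.
    return img_feats + attn @ text_emb

rng = np.random.default_rng(0)
fused = fuse_text_into_features(rng.standard_normal((16, 8)),
                                rng.standard_normal((4, 8)))
print(fused.shape)  # (16, 8)
```

The residual form keeps the image pathway intact when the text adds little information, which is one common design choice for multimodal fusion blocks.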
Practical Impact
The proposed Text-Swin-UMamba model outperforms previous approaches in lesion segmentation, achieving a Dice score of 82% and a low Hausdorff distance of 6.58 pixels on the test dataset. In practice, more accurate lesion segmentation supports more reliable measurement, diagnosis, and treatment planning for patients with chronic diseases.
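The Dice score cited above measures overlap between a predicted and a ground-truth binary mask. A minimal reference computation (not taken from the paper's repository) looks like this:

```python
import numpy as np

def dice_score(pred, gt):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return 2.0 * np.logical_and(pred, gt).sum() / denom

pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
gt   = np.array([[1, 0, 0],
                 [0, 1, 1]])
print(dice_score(pred, gt))  # 2*2 / (3+3) ≈ 0.667
```

A Dice score of 1.0 means the predicted lesion mask matches the ground truth exactly, while the Hausdorff distance complements it by penalizing boundary outliers that overlap metrics can miss.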
Analogy / Intuitive Explanation
Imagine trying to describe a complex medical image without any context or information about what you're looking at. It's like trying to paint a picture without knowing what colors, shapes, or textures are involved. By incorporating text descriptions of lesion characteristics into the segmentation process, this research provides a vital "context" that helps the model better understand the imaging features and produce more accurate results. In summary, this innovative approach combines the power of language models with computer vision to improve the accuracy of lesion segmentation on CT scans, leading to better patient outcomes.
Paper Information
Categories: cs.CV, cs.AI
Published Date:
arXiv ID: 2508.06453v1
