REMSA: An LLM Agent for Foundation Model Selection in Remote Sensing

Generative AI & LLMs
Published: arXiv: 2511.17442v1
Authors

Binger Chen, Tacettin Emre Bök, Behnood Rasti, Volker Markl, Begüm Demir

Abstract

Foundation Models (FMs) are increasingly used in remote sensing (RS) for tasks such as environmental monitoring, disaster assessment, and land-use mapping. These models include unimodal vision encoders trained on a single data modality and multimodal architectures trained on combinations of SAR, multispectral, hyperspectral, and image-text data. They support diverse RS tasks including semantic segmentation, image classification, change detection, and visual question answering. However, selecting an appropriate remote sensing foundation model (RSFM) remains difficult due to scattered documentation, heterogeneous formats, and varied deployment constraints. We introduce the RSFM Database (RS-FMD), a structured resource covering over 150 RSFMs spanning multiple data modalities, resolutions, and learning paradigms. Built on RS-FMD, we present REMSA, the first LLM-based agent for automated RSFM selection from natural language queries. REMSA interprets user requirements, resolves missing constraints, ranks candidate models using in-context learning, and provides transparent justifications. We also propose a benchmark of 75 expert-verified RS query scenarios, producing 900 configurations under an expert-centered evaluation protocol. REMSA outperforms several baselines, including naive agents, dense retrieval, and unstructured RAG-based LLMs. It operates entirely on publicly available metadata and does not access private or sensitive data.

Paper Summary

Problem
The paper addresses the difficulty of selecting a suitable foundation model (FM) for a specific remote sensing (RS) task. As RS data and applications grow, practitioners need models that generalize across data modalities with different spatial, spectral, and temporal resolutions. Choosing among them is hard because documentation is scattered, formats are heterogeneous, and deployment constraints are complex.
Key Innovation
The key innovation is REMSA (Remote Sensing Model Selection Agent), a large language model (LLM) agent that combines structured metadata grounding, dense retrieval, in-context ranking, clarification, explanation, memory augmentation, and task-aware orchestration to support complex FM selection in real RS settings. The authors present REMSA as the first LLM agent designed for FM selection in RS. It operates entirely on publicly available metadata of open-source RSFMs and does not access private or sensitive data.
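The flow described above (ask for missing constraints, filter on structured metadata, rank, explain) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the record fields, the catalog entries (`model_a`, `model_b`, `model_c`), and every function name are hypothetical and are not taken from RS-FMD or REMSA. The ranking step here is a plain sort by model size, standing in for the LLM-based in-context ranking the paper describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical metadata record; the fields are illustrative, not the RS-FMD schema."""
    name: str
    modalities: frozenset
    tasks: frozenset
    params_millions: float


@dataclass
class Query:
    """Structured form of a user request; None means the user did not say."""
    task: Optional[str] = None
    modality: Optional[str] = None
    max_params_millions: Optional[float] = None


def missing_constraints(query):
    """Clarification step: list required fields the user has not specified."""
    return [f for f in ("task", "modality") if getattr(query, f) is None]


def filter_candidates(query, records):
    """Keep only records that satisfy every hard constraint in the query."""
    kept = []
    for r in records:
        if query.task not in r.tasks or query.modality not in r.modalities:
            continue
        if (query.max_params_millions is not None
                and r.params_millions > query.max_params_millions):
            continue
        kept.append(r)
    return kept


def rank(query, candidates):
    """Stand-in ranking (smallest model first) with a short justification each."""
    ordered = sorted(candidates, key=lambda r: (r.params_millions, r.name))
    return [
        (r.name,
         f"supports {query.task} on {query.modality}; {r.params_millions:g}M parameters")
        for r in ordered
    ]


def select(query, records):
    """Ask for clarification if needed, otherwise filter and rank."""
    missing = missing_constraints(query)
    if missing:
        return {"status": "needs_clarification", "ask_for": missing}
    return {"status": "ok", "ranking": rank(query, filter_candidates(query, records))}


# Made-up catalog entries, for illustration only.
CATALOG = [
    ModelRecord("model_a", frozenset({"multispectral"}),
                frozenset({"semantic segmentation", "image classification"}), 300.0),
    ModelRecord("model_b", frozenset({"sar", "multispectral"}),
                frozenset({"change detection", "semantic segmentation"}), 90.0),
    ModelRecord("model_c", frozenset({"hyperspectral"}),
                frozenset({"image classification"}), 25.0),
]
```

Calling `select(Query(task="semantic segmentation"), CATALOG)` returns a clarification request for the modality rather than a guess. Once the modality is supplied, the same call returns a ranked list with one justification string per model.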
Practical Impact
REMSA aims to make FM selection for RS applications personalized, reproducible, and efficient across a range of RS tasks and data modalities, which makes it useful to both researchers and practitioners. The paper also introduces RS-FMD, described as the first structured, schema-guided database of over 150 RSFMs, which the authors plan to release as a community resource with continuous maintenance and updates.
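To give a rough sense of what "schema-guided" could mean in practice, the sketch below checks a metadata record against a small schema before it would be admitted to a database. The field names and the validation rules are assumptions made for illustration and do not reproduce the actual RS-FMD schema. The modality vocabulary is limited to the four data types named in the abstract.

```python
# Hypothetical controlled vocabulary, limited to modalities named in the abstract.
ALLOWED_MODALITIES = {"sar", "multispectral", "hyperspectral", "image-text"}

# Hypothetical required fields and their expected Python types.
REQUIRED_FIELDS = {"name": str, "modalities": list, "tasks": list}


def validate_record(record):
    """Return a list of schema violations; an empty list means the record passes."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    modalities = record.get("modalities")
    if isinstance(modalities, list):
        for m in modalities:
            if m not in ALLOWED_MODALITIES:
                errors.append(f"unknown modality: {m}")
    return errors
```

A record that passes such a check can be compared field by field with every other record, which is the property that scattered, free-form documentation lacks.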
Analogy / Intuitive Explanation
Imagine a doctor treating a patient with a rare disease. Many tests and treatments are available, but only some suit this patient's condition. REMSA is like a medical assistant that reviews the patient's symptoms, medical history, and test results, proposes a shortlist of treatments with their strengths and weaknesses, and explains each recommendation. REMSA does the same for RS: it analyzes the task requirements, data modalities, and FM characteristics, then recommends suitable FMs and justifies its choices.
Paper Information
Categories:
cs.CV cs.AI
Published Date:

arXiv ID:

2511.17442v1
