Automated alternatives

Best SARLO-80: Worldwide Slant SAR Language Optic Dataset 80cm alternatives.

Live, source-backed alternatives to SARLO-80: Worldwide Slant SAR Language Optic Dataset 80cm in the vision-language task category. Alternatives are drawn from that same category and refresh whenever the best-of index rebuilds.

Alternatives: same task category

Sources: distinct URLs

Modules: indexable

Updated: Jun 26, 2026 (from radar data)

Reference option

SARLO-80: Worldwide Slant SAR Language Optic Dataset 80cm

Multimodal foundation models have advanced rapidly thanks to large optical benchmarks, but comparable resources for synthetic aperture radar (SAR) remain limited. Existing SAR-optical datasets largely rely on low-resolution, intensity-only Ground Range Detected (GRD) products and do not preserve complex-valued SAR measurements or native acquisition geometry, which restricts physically grounded multimodal learning. In particular, large-scale public datasets combining very-high-resolution (VHR) SAR SLC, aligned optical imagery, and natural-language descriptions are still lacking. We present a VHR SAR-optical-text dataset built from open-access Umbra spotlight acquisitions distributed as Sensor Independent Complex Data (SICD). From around 2,500 worldwide scenes (VV/HH, 20 cm to 2 m native resolution), we standardize all SAR data to an 80 cm slant-range grid via band-limited FFT resampling and tile the imagery into 1024 × 1024 patches. For each SAR patch, we retrieve a high-resolution optical tile and warp it into the SAR grid using local coordinate correspondences for local pixel-level alignment. We further generate three caption variants (SHORT/MID/LONG) per sample to support vision-language training and evaluation. Our dataset contains 119,566 triplets (complex and amplitude slant-range SAR patch, aligned optical patch, natural-language description) covering 257 locations across 72 countries and a broad range of land types and infrastructures. We release fixed train/validation/test splits and the full preprocessing and baseline code to enable reproducible benchmarks for multimodal alignment on cross-modal retrieval and conditional generation in native SAR geometry. The dataset is publicly available on the Hugging Face Hub at https://huggingface.co/datasets/ONERA/SARLO-80.

Research signal collected from arXiv metadata.

Categories: cs.CV, cs.AI, cs.DB | Tags: benchmark, eval, evaluation

RDR75 | Research-only | arxiv-ai
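
The SARLO-80 abstract above mentions two preprocessing steps: band-limited FFT resampling to a common 80 cm grid, and tiling into 1024 × 1024 patches. The following is a minimal NumPy sketch of that general technique, not the authors' released preprocessing code. The function names (`fft_resample`, `resample_to_spacing`, `tile_patches`), the single isotropic pixel spacing, and the choice to drop incomplete edge tiles are all assumptions made for illustration.

```python
import numpy as np


def fft_resample(img, out_shape):
    """Resample a complex 2-D array by cropping or zero-padding its centered spectrum.

    Cropping the spectrum downsamples (low-pass, band-limited); zero-padding upsamples.
    """
    in_rows, in_cols = img.shape
    out_rows, out_cols = out_shape

    # Centered spectrum: after fftshift, the DC term sits at index N // 2 on each axis.
    spec = np.fft.fftshift(np.fft.fft2(img))
    out_spec = np.zeros(out_shape, dtype=np.complex128)

    # Size of the spectral block shared by input and output.
    r = min(in_rows, out_rows)
    c = min(in_cols, out_cols)

    # Start indices chosen so the DC terms of both spectra stay aligned.
    ri, ci = in_rows // 2 - r // 2, in_cols // 2 - c // 2
    ro, co = out_rows // 2 - r // 2, out_cols // 2 - c // 2
    out_spec[ro:ro + r, co:co + c] = spec[ri:ri + r, ci:ci + c]

    # Rescale so that amplitudes are preserved across the size change.
    scale = (out_rows * out_cols) / (in_rows * in_cols)
    return np.fft.ifft2(np.fft.ifftshift(out_spec)) * scale


def resample_to_spacing(img, native_spacing_m, target_spacing_m=0.8):
    """Resample to a target pixel spacing (assumes the same spacing on both axes)."""
    ratio = native_spacing_m / target_spacing_m
    out_shape = tuple(max(1, round(n * ratio)) for n in img.shape)
    return fft_resample(img, out_shape)


def tile_patches(img, size=1024):
    """Split an image into non-overlapping size x size patches, dropping partial edges."""
    rows, cols = img.shape
    return [
        img[r:r + size, c:c + size]
        for r in range(0, rows - size + 1, size)
        for c in range(0, cols - size + 1, size)
    ]
```

For example, `tile_patches(resample_to_spacing(scene, 0.4))` would halve the sample count of a hypothetical 40 cm scene along each axis and then cut it into 1024 × 1024 patches. Real SICD products generally carry separate range and azimuth spacings, which this sketch does not model.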

Alternative

NVIDIA NIM Model Catalog

Matched vision-language, vision language, multimodal; 3 source links; official inference catalog signal; access model: Free endpoint

RDR84 | Free endpoint

Alternative

Hugging Face Inference Providers

Matched vision-language, vision language, multimodal; 2 source links; official inference catalog signal; access model: Paid API

RDR81 | Paid API

#	Alternative	Kind	Access	Fit	Why it appears	Source
01	NVIDIA NIM Model Catalog	service	Free endpoint	RDR84	Matched vision-language, vision language, multimodal; 3 source links; official inference catalog signal; access model: Free endpoint	build.nvidia.com
02	Hugging Face Inference Providers	service	Paid API	RDR81	Matched vision-language, vision language, multimodal; 2 source links; official inference catalog signal; access model: Paid API	huggingface.co
03	A Unified Framework for Efficient Remote Sensing Visual Question Answering: Adapting Dual, Hybrid, and Encoder-Decoder Architectures	paper	Research-only	RDR80	Matched vision-language, vision language, vlm; 1 source link; access model: Research-only	arxiv.org
04	Fireworks AI Serverless Models	service	Paid API	RDR80	Matched vision-language, vision language, multimodal; 2 source links; official inference catalog signal; access model: Paid API	docs.fireworks.ai
05	Together AI Serverless Models	service	Paid API	RDR80	Matched vision-language, vision language, multimodal; 2 source links; official inference catalog signal; access model: Paid API	docs.together.ai
06	Paying More Attention to Visual Tokens in Self-Evolving Large Multimodal Models	paper	Research-only	RDR75	Matched vision-language, vision language, multimodal; 1 source link; access model: Research-only; freshly updated	arxiv.org
07	RSICCLLM: A Multimodal Large Language Model for Remote Sensing Image Change Captioning	paper	Research-only	RDR75	Matched vision-language, vision language, multimodal; 2 source links; access model: Research-only; freshly updated	arxiv.org
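
The table above is tab-separated, so it can be loaded with Python's standard `csv` module and filtered by access model or fit. The sketch below is illustrative only: the `Alternative` dataclass and `parse_alternatives` helper are hypothetical names, and reading the digits after `RDR` as a numeric score where higher means a better fit is an assumption, since the page does not define the `RDR` label.

```python
import csv
import io
import re
from dataclasses import dataclass


@dataclass
class Alternative:
    rank: int
    name: str
    kind: str
    access: str
    fit: int  # digits after "RDR"; treating this as a score is an assumption
    why: str
    source: str


def parse_alternatives(tsv_text):
    """Parse the tab-separated alternatives table into Alternative records."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = []
    for rec in reader:
        match = re.fullmatch(r"RDR(\d+)", rec["Fit"].strip())
        if match is None:
            continue  # skip rows whose Fit column is not in the expected form
        rows.append(
            Alternative(
                rank=int(rec["#"]),
                name=rec["Alternative"],
                kind=rec["Kind"],
                access=rec["Access"],
                fit=int(match.group(1)),
                why=rec["Why it appears"],
                source=rec["Source"],
            )
        )
    return rows


# Header plus the first two rows of the table, copied from the listing.
SAMPLE = (
    "#\tAlternative\tKind\tAccess\tFit\tWhy it appears\tSource\n"
    "01\tNVIDIA NIM Model Catalog\tservice\tFree endpoint\tRDR84\t"
    "Matched vision-language, vision language, multimodal; 3 source links; "
    "official inference catalog signal; access model: Free endpoint\t"
    "build.nvidia.com\n"
    "02\tHugging Face Inference Providers\tservice\tPaid API\tRDR81\t"
    "Matched vision-language, vision language, multimodal; 2 source links; "
    "official inference catalog signal; access model: Paid API\t"
    "huggingface.co\n"
)

alternatives = parse_alternatives(SAMPLE)
free_options = [a.name for a in alternatives if a.access == "Free endpoint"]
```

Filtering on `access` or `kind` this way separates hosted inference services from research papers, which is useful because only the papers are topically close to a SAR dataset.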