Automated alternatives

Best SARLO-80: Worldwide Slant SAR Language Optic Dataset 80cm alternatives.

Live source-backed alternatives to SARLO-80: Worldwide Slant SAR Language Optic Dataset 80cm for Vision-language. Alternatives are selected from the same task category and update whenever the best-of index rebuilds.

Alternatives: 7 (same task category)
Sources: 15 (distinct URLs)
Modules: 6 (indexable)
Updated: Jun 26, 2026 (from radar data)
Reference option

SARLO-80: Worldwide Slant SAR Language Optic Dataset 80cm

Multimodal foundation models have advanced rapidly thanks to large optical benchmarks, but comparable resources for synthetic aperture radar (SAR) remain limited. Existing SAR–optical datasets largely rely on low-resolution, intensity-only Ground Range Detected (GRD) products and do not preserve complex-valued SAR measurements or native acquisition geometry, which restricts physically grounded multimodal learning. In particular, large-scale public datasets combining very-high-resolution (VHR) SAR SLC, aligned optical imagery, and natural-language descriptions are still lacking. We present a VHR SAR–optical–text dataset built from open-access Umbra spotlight acquisitions distributed as Sensor Independent Complex Data (SICD). From around 2,500 worldwide scenes (VV/HH, 20cm–2m native resolution), we standardize all SAR data to an 80cm slant-range grid via band-limited FFT resampling and tile the imagery into 1024 by 1024 patches. For each SAR patch, we retrieve a high-resolution optical tile and warp it into the SAR grid using local coordinate correspondences for local pixel-level alignment. We further generate three caption variants (SHORT/MID/LONG) per sample to support vision–language training and evaluation. Our dataset contains 119,566 triplets (complex and amplitude slant-range SAR patch, aligned optical patch, natural-language description) covering 257 locations across 72 countries and a broad range of land types and infrastructures. We release fixed train/validation/test splits and the full preprocessing and baseline code to enable reproducible benchmarks for multimodal alignment on cross-modal retrieval and conditional generation in native SAR geometry. The dataset is publicly available on the Hugging Face Hub at https://huggingface.co/datasets/ONERA/SARLO-80.

Research signal collected from arXiv metadata; Gemini enrichment can add a clearer summary.

cs.CV cs.AI cs.DB benchmark eval evaluation
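The abstract above mentions band-limited FFT resampling followed by tiling into fixed-size patches. The following is a minimal sketch of that general idea, not the authors' released preprocessing code. The function names `resample_axis` and `tile_patches` are hypothetical, the Nyquist bin is not treated specially for even lengths, and partial tiles at the image edges are simply dropped, which may differ from what the dataset actually does.

```python
import numpy as np

def resample_axis(img, new_len, axis):
    """Resample a (complex) array along one axis by cropping or
    zero-padding its centered spectrum (hypothetical helper)."""
    old_len = img.shape[axis]
    # Move the zero-frequency bin to index old_len // 2.
    spec = np.fft.fftshift(np.fft.fft(img, axis=axis), axes=axis)
    if new_len < old_len:
        # Downsampling: keep only the central new_len bins (low-pass).
        start = old_len // 2 - new_len // 2
        spec = np.take(spec, np.arange(start, start + new_len), axis=axis)
    elif new_len > old_len:
        # Upsampling: zero-pad so the zero-frequency bin lands at new_len // 2.
        before = new_len // 2 - old_len // 2
        widths = [(0, 0)] * img.ndim
        widths[axis] = (before, new_len - old_len - before)
        spec = np.pad(spec, widths)
    out = np.fft.ifft(np.fft.ifftshift(spec, axes=axis), axis=axis)
    # Rescale so sample amplitudes are preserved after the length change.
    return out * (new_len / old_len)

def tile_patches(img, size=1024):
    """Cut a 2D array into non-overlapping size x size tiles,
    dropping incomplete tiles at the right and bottom edges."""
    rows, cols = img.shape[0] // size, img.shape[1] // size
    return [
        img[r * size:(r + 1) * size, c * size:(c + 1) * size]
        for r in range(rows)
        for c in range(cols)
    ]
```

Under these assumptions, a scene with native sample spacing `d` (in metres) along an axis of length `n` would be resampled to roughly `round(n * d / 0.8)` samples on that axis before tiling, and an amplitude patch can be derived from a complex patch with `np.abs(patch)`.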

RDR75 | Research-only | arxiv-ai
Alternative

NVIDIA NIM Model Catalog

Matched vision-language, vision language, multimodal; 3 source links; official inference catalog signal; access model: Free endpoint

RDR84 | Free endpoint
Alternative

Hugging Face Inference Providers

Matched vision-language, vision language, multimodal; 2 source links; official inference catalog signal; access model: Paid API

RDR81 | Paid API

| # | Alternative | Kind | Access | Fit | Why it appears | Source |
|---|---|---|---|---|---|---|
| 01 | NVIDIA NIM Model Catalog | service | Free endpoint | RDR84 | Matched vision-language, vision language, multimodal; 3 source links; official inference catalog signal; access model: Free endpoint | build.nvidia.com |
| 02 | Hugging Face Inference Providers | service | Paid API | RDR81 | Matched vision-language, vision language, multimodal; 2 source links; official inference catalog signal; access model: Paid API | huggingface.co |
| 03 | A Unified Framework for Efficient Remote Sensing Visual Question Answering: Adapting Dual, Hybrid, and Encoder-Decoder Architectures | paper | Research-only | RDR80 | Matched vision-language, vision language, vlm; 1 source link; access model: Research-only | arxiv.org |
| 04 | Fireworks AI Serverless Models | service | Paid API | RDR80 | Matched vision-language, vision language, multimodal; 2 source links; official inference catalog signal; access model: Paid API | docs.fireworks.ai |
| 05 | Together AI Serverless Models | service | Paid API | RDR80 | Matched vision-language, vision language, multimodal; 2 source links; official inference catalog signal; access model: Paid API | docs.together.ai |
| 06 | Paying More Attention to Visual Tokens in Self-Evolving Large Multimodal Models | paper | Research-only | RDR75 | Matched vision-language, vision language, multimodal; 1 source link; access model: Research-only; freshly updated | arxiv.org |
| 07 | RSICCLLM: A Multimodal Large Language Model for Remote Sensing Image Change Captioning | paper | Research-only | RDR75 | Matched vision-language, vision language, multimodal; 2 source links; access model: Research-only; freshly updated | arxiv.org |