Best Hugging Face Inference Providers alternatives.
Live, source-backed alternatives to Hugging Face Inference Providers for vision-language tasks. Alternatives are selected from the same task category and are updated whenever the best-of index is rebuilt.
Hugging Face Inference Providers
Official Hugging Face Inference Providers catalog for running model API and serverless inference workflows across text, vision, image generation, speech, embedding, and multimodal model tasks.
| # | Alternative | Kind | Access | Fit | Why it appears | Source |
|---|---|---|---|---|---|---|
| 01 | NVIDIA NIM Model Catalog | service | Free endpoint | RDR84 | Matched vision-language, vision language, multimodal; 3 source links; official inference catalog signal; access model: Free endpoint | build.nvidia.com |
| 02 | A Unified Framework for Efficient Remote Sensing Visual Question Answering: Adapting Dual, Hybrid, and Encoder-Decoder Architectures | paper | Research-only | RDR80 | Matched vision-language, vision language, vlm; 1 source link; access model: Research-only | arxiv.org |
| 03 | Fireworks AI Serverless Models | service | Paid API | RDR80 | Matched vision-language, vision language, multimodal; 2 source links; official inference catalog signal; access model: Paid API | docs.fireworks.ai |
| 04 | Together AI Serverless Models | service | Paid API | RDR80 | Matched vision-language, vision language, multimodal; 2 source links; official inference catalog signal; access model: Paid API | docs.together.ai |
| 05 | Paying More Attention to Visual Tokens in Self-Evolving Large Multimodal Models | paper | Research-only | RDR75 | Matched vision-language, vision language, multimodal; 1 source link; access model: Research-only; freshly updated | arxiv.org |
| 06 | SARLO-80: Worldwide Slant SAR Language Optic Dataset 80cm | paper | Research-only | RDR75 | Matched vision-language, vision language, multimodal; 2 source links; access model: Research-only | arxiv.org |
| 07 | RSICCLLM: A Multimodal Large Language Model for Remote Sensing Image Change Captioning | paper | Research-only | RDR75 | Matched vision-language, vision language, multimodal; 2 source links; access model: Research-only; freshly updated | arxiv.org |
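The page states that alternatives are drawn from the same task category as Hugging Face Inference Providers and refreshed whenever the index rebuilds. The Python sketch below is a hypothetical illustration of that filter-then-rank step, not the index's actual implementation. The `Alternative` record, `fit_score`, and `rank_alternatives` are invented names, and the sketch assumes that the digits in a Fit label such as `RDR84` are a score where higher means a closer match.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternative:
    """Hypothetical record mirroring one row of the comparison table."""
    name: str
    kind: str                   # "service" or "paper"
    access: str                 # e.g. "Free endpoint", "Paid API", "Research-only"
    fit: str                    # raw Fit label as shown, e.g. "RDR84"
    categories: frozenset       # matched task categories, e.g. {"vision-language"}


def fit_score(label: str) -> int:
    """Return the trailing integer of a Fit label such as 'RDR84'.

    Assumption: the digits are the fit score and higher means a closer fit.
    """
    match = re.search(r"(\d+)$", label)
    if match is None:
        raise ValueError(f"unrecognised fit label: {label!r}")
    return int(match.group(1))


def rank_alternatives(candidates, category, limit=None):
    """Keep candidates from the same task category, best fit first.

    sorted() is stable even with reverse=True, so equal scores keep input order.
    """
    same_category = [c for c in candidates if category in c.categories]
    ranked = sorted(same_category, key=lambda c: fit_score(c.fit), reverse=True)
    return ranked if limit is None else ranked[:limit]


# Three rows copied from the table (deliberately out of order) plus one
# hypothetical entry from a different task category that must be filtered out.
catalog = [
    Alternative("Together AI Serverless Models", "service", "Paid API", "RDR80",
                frozenset({"vision-language", "multimodal"})),
    Alternative("NVIDIA NIM Model Catalog", "service", "Free endpoint", "RDR84",
                frozenset({"vision-language", "multimodal"})),
    Alternative("Example speech API (hypothetical)", "service", "Paid API", "RDR90",
                frozenset({"speech-to-text"})),
    Alternative("Fireworks AI Serverless Models", "service", "Paid API", "RDR80",
                frozenset({"vision-language", "multimodal"})),
]

if __name__ == "__main__":
    for alt in rank_alternatives(catalog, "vision-language"):
        print(fit_score(alt.fit), alt.name)
```

The hypothetical speech entry scores highest but is dropped because it belongs to another task category, and the two `RDR80` services keep their input order because the sort is stable. A real index might break ties differently, for example by the number of source links.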