Automated alternatives

Best DiffusionBench: On Holistic Evaluation of Diffusion Transformers alternatives.

Live, source-backed alternatives to DiffusionBench: On Holistic Evaluation of Diffusion Transformers for image generation. Alternatives are drawn from the same task category and refresh whenever the best-of index rebuilds.

Alternatives: 7 (same task category)
Sources: 14 (distinct URLs)
Modules: 6 (indexable)
Updated: Jun 29, 2026 (from radar data)
Reference option

DiffusionBench: On Holistic Evaluation of Diffusion Transformers

Diffusion transformer (DiT) research on image generation has converged to a single evaluation setup: class-conditional generation on ImageNet. While methods improve the FID and related metrics, it is increasingly unclear whether they reflect real progress in generative modeling. The natural alternative, i.e., text-to-image (T2I) generation, is perceived as too costly or inconvenient to train and evaluate and is often skipped. We argue that this perception no longer holds. We introduce NanoGen, a unified DiT training and evaluation framework. NanoGen matches state-of-the-art DiT baselines on ImageNet and, with 12 lines of configuration change, also trains competitive text-to-image models. It currently supports RAE, VAE, pixel-space, and MeanFlow diffusion methods under both ImageNet and T2I setups. Under NanoGen, training T2I requires comparable compute to ImageNet. After training 21 latent diffusion models with NanoGen, we observe that method ranking shows no strong correlation between ImageNet and T2I generation: Pearson correlation is between -0.377 and -0.580 across three metrics. This suggests that a method which improves class-conditional ImageNet FID may show no corresponding improvement on T2I, clearly indicating the necessity of evaluating DiTs on both tasks. To this end, we summarize ImageNet and text-to-image results, which yields DiffusionBench, a holistic benchmark for DiT research. We recommend reporting DiffusionBench in place of ImageNet alone: methods that improve DiffusionBench are more likely to reflect broader progress.

Research signal collected from arXiv metadata; Gemini enrichment can add a clearer summary.

Tags: cs.CV, benchmark, eval, evaluation
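The abstract's central claim rests on a Pearson correlation between how methods score on ImageNet and how they score on T2I. A minimal sketch of that calculation follows. The function name `pearson` and the score lists are hypothetical and purely illustrative; they are not values from the paper, and this is not NanoGen's evaluation code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up per-method scores (assumption: same metric on both setups,
# one entry per method, listed in the same method order).
imagenet_scores = [2.1, 2.4, 1.9, 2.8, 2.2]
t2i_scores = [9.5, 8.7, 10.1, 8.9, 9.0]

r = pearson(imagenet_scores, t2i_scores)
print(f"Pearson r between ImageNet and T2I scores: {r:.3f}")
```

A coefficient near +1 would mean that methods scoring well on one setup also score well on the other; the negative values reported in the abstract indicate that improvements on ImageNet did not carry over to T2I in the same direction.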

RDR 79 | Research-only | arxiv-ai
Alternative

Replicate Official Models

Matched image generation, text-to-image, text to image; 3 source links; official model catalog signal; access model: Paid API

RDR 89 | Paid API
Alternative

fal Model APIs

Matched image generation, text-to-image, text to image; 3 source links; official model catalog signal; access model: Paid API

RDR 88 | Paid API
# | Alternative | Kind | Access | Fit | Why it appears | Source
01 | Replicate Official Models | service | Paid API | RDR 89 | Matched image generation, text-to-image, text to image; 3 source links; official model catalog signal; access model: Paid API | replicate.com
02 | fal Model APIs | service | Paid API | RDR 88 | Matched image generation, text-to-image, text to image; 3 source links; official model catalog signal; access model: Paid API | fal.ai
03 | Runware Model API | service | Paid API | RDR 87 | Matched image generation, text-to-image, text to image; 3 source links; official model catalog signal; access model: Paid API | runware.ai
04 | artokun/comfyui-mcp | repo | Open source | RDR 78 | Matched image generation, text-to-image, text to image; 1 source link; access model: Open source; freshly updated | github.com
05 | Intermediate Text Representation Guided Text-to-Image Generation for Enhancing One-and-Only Alignment | paper | Research-only | RDR 77 | Matched image generation, text-to-image, text to image; 1 source link; access model: Research-only; freshly updated | arxiv.org
06 | aqm857886159/Nomi | repo | Open source | RDR 77 | Matched image generation, text-to-image, text to image; 1 source link; access model: Open source; freshly updated | github.com
07 | SeFi-Image: A Text-to-Image Foundation Model with Semantic-First Diffusion | paper | Research-only | RDR 75 | Matched image generation, text-to-image, text to image; 1 source link; access model: Research-only | arxiv.org