Awesome Diffusion LLM
| Entity Passport | |
|---|---|
| Registry ID | gh-tool--aidaslab--awesome-diffusion-llm |
| Provider | github |
Cite this model
Academic & Research Attribution
@misc{gh_tool__aidaslab__awesome_diffusion_llm,
author = {AIDASLab},
title = {Awesome Diffusion Llm Model},
year = {2026},
howpublished = {\url{https://github.com/AIDASLab/Awesome-Diffusion-LLM}},
note = {Accessed via Free2AITools Knowledge Fortress}
}
Quick Commands
git clone https://github.com/AIDASLab/Awesome-Diffusion-LLM
Free2AITools Nexus Index V2.0
Index Insight
FNI V2.0 for Awesome Diffusion Llm: Semantic (S:50), Authority (A:0), Popularity (P:44), Recency (R:98), Quality (Q:70).
Technical Deep Dive
Awesome-Large-Language-Diffusion-Models
A comprehensive and structured list of research papers about Large-Language-Diffusion-Models (dLLMs).
Last major update: April 2026. Added the latest 2026 arXiv works, NeurIPS 2025 spotlights, ICLR 2026 papers, frontier-scale dLLMs (LLaDA2.0/2.1, Dream-VLA, etc.), and a new section on agentic / tool-use behavior.
Framework (Taxonomy)
- Surveys & Useful Resources
- Core Methodologies
- Reasoning & Policy Optimization
- Token Ordering & Generation Strategies
- System Efficiency & Acceleration
- Multi-modal & Physical AI
- Agentic & Tool-Use dLLMs
- Theory, Guidance & Applications
- Seminal Diffusion Papers
1. Surveys & Useful Resources
Blogs & Reports
- Gemini Diffusion
- Mercury (Inception Labs)
- Dream-7B
- DreamOn
- LLaDA2.X (InclusionAI / Ant Group)
- W1-4B-dLLM (Whaletech AI) (Demo)
- What are Diffusion Language Models? (Lilian Weng)
- Generative Modeling by Estimating Gradients (Yang Song)
Survey & Perspective Papers
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Diffusion Models for Non-autoregressive Text Generation: A Survey | 2023.03 | IJCAI | Early NAR-text survey |
| A Survey of Diffusion Models in NLP | 2023.05 | Arxiv | Early NLP survey |
| Discrete Diffusion in Large Language and Multimodal Models: A Survey | 2025.06 | Arxiv | dLLM + dMLLM |
| Diffusion-based Large Language Models Survey | 2025.08 | TechRxiv | - |
| A Survey on Parallel Text Generation: From Parallel Decoding to Diffusion Language Models | 2025.08 | Arxiv | - |
| A Survey on Diffusion Language Models | 2025.08 | Arxiv | VILA-Lab; comprehensive |
| Efficient Diffusion Language Models: A Comprehensive Survey | 2026.01 | - | Efficiency-focused |
| Top 10 Open Challenges Steering the Future of Diffusion Language Model and Its Variants | 2026.01 | Arxiv | Perspective / roadmap |
2. Core Methodologies
2.1 Discrete & Masked Diffusion
2.2 Continuous & Latent Space Diffusion
2.3 AR-to-Diffusion Adaptation
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Scaling Diffusion Language Models via Adaptation from Autoregressive Models (DiffuGPT / DiffuLLaMA) | 2024.10 | ICLR | >7B, GPT2/LLaMA2 Adaptation |
| Large Language Models to Diffusion Finetuning | 2025.01 | ICML | >7B |
| TESS 2: A Large-Scale Generalist Diffusion Language Model | 2025.02 | ACL | >7B, Adapted from Mistral |
| SDAR: A Synergistic Diffusion-AutoRegression Paradigm | 2025.10 | Arxiv | >7B, Qwen3-based BD |
| From Next-Token to Next-Block: Principled Adaptation Path | 2025.11 | Arxiv | >7B, Adaptation Path |
| Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed | 2025.12 | Arxiv | >7B |
| LLaDA2.0: Scaling Up Diffusion Language Models to 100B | 2025.12 | Arxiv | >7B, AR→dLLM at 100B |
2.4 Hybrid AR-Diffusion (Block / Forcing)
A new section: hybrids that interleave block-level AR with intra-block diffusion, or "forcing" approaches that retain causal masks for KV-cache reuse.
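The block-level pattern described above can be illustrated with a small attention-mask sketch. It assumes the simplest reading of "block-level AR with intra-block diffusion": tokens attend bidirectionally inside their own block and causally to all earlier blocks. The function name `block_causal_mask` and the chosen block size are illustrative and are not taken from any paper in this list.

```python
from typing import List


def block_causal_mask(seq_len: int, block_size: int) -> List[List[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Assumption for illustration: attention is bidirectional within a block
    and causal across blocks (a block sees itself and every earlier block).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    mask = []
    for i in range(seq_len):
        query_block = i // block_size
        # Position j is visible when its block is not later than the query's block.
        mask.append([(j // block_size) <= query_block for j in range(seq_len)])
    return mask


if __name__ == "__main__":
    # Visualize a 6-token sequence split into blocks of 2.
    for row in block_causal_mask(seq_len=6, block_size=2):
        print("".join("x" if visible else "." for visible in row))
```

Under this mask no earlier block ever attends to a later one, so the keys and values of a finished block do not change when later blocks are generated. That property is what makes KV-cache reuse possible in such hybrids.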
3. Reasoning & Policy Optimization
3.1 Reasoning & Planning
3.2 Alignment & Reinforcement Learning
4. Token Ordering & Generation Strategies
5. System Efficiency & Acceleration
5.1 Caching & Memory Strategy
5.2 Decoding & Sampling
5.3 Distillation, Quantization & Sparsity
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Beyond Autoregression: Fast LLMs via Self-Distillation Through Time | 2024.10 | ICLR | <7B, Distillation |
| Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction | 2025.08 | Arxiv | >7B, Sparsity |
| DLLMQuant: Quantizing Diffusion-based Large Language Models | 2025.08 | Arxiv | >7B, Quantization |
| Quantization Meets dLLMs: Post-training Quantization Study | 2025.08 | Arxiv | >7B, Quantization |
| FS-DFM: Few-Step Diffusion Language Model | 2025.09 | Arxiv | >7B |
| SparseD: Sparse Attention for Diffusion Language Models | 2025.09 | Arxiv | >7B, Sparsity |
| LLaDA-MoE: A Sparse MoE Diffusion Language Model | 2025.09 | Arxiv | >7B, MoE |
| Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct | 2025.10 | Arxiv | >7B, Distillation |
| CDLM: Consistency Diffusion Language Models For Faster Sampling | 2025.11 | Arxiv | >7B, Consistency |
5.4 Inference Frameworks & Systems
New section: production-grade frameworks and runtime engineering for dLLMs.
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| dlmserve | 2026.05 | Repo | First OSS serving engine for diffusion LMs (LLaDA family); OpenAI-compatible HTTP, step-level batching (2.5x HF), LocalLeap (1.8x). MIT. |
| FOCUS: DLLMs Know How to Tame Their Compute Bound | 2026.01 | ICML | Training-free inference system; token eviction for higher throughput |
| dInfer: An Efficient Inference Framework for Diffusion Language Models | 2025.10 | Arxiv | Modular framework, >1100 TPS |
| JetEngine (SDAR) | 2025.10 | Repo | Lightweight engine for SDAR (3700+ TPS on H200) |
| Mercury: Ultra-Fast Language Models Based on Diffusion | 2025.06 | Arxiv | Inception Labs commercial dLLM |
| Seed Diffusion: Large-Scale dLLM with High-Speed Inference | 2025.08 | Arxiv | ByteDance code-focused dLLM |
6. Multi-modal & Physical AI
6.1 Multi-modal dLLMs
6.2 Vision-Language-Action (VLA)
Scope note: this section covers VLA models that use a diffusion/masked-diffusion language model as the backbone (dVLM-based VLA) or apply discrete diffusion as the action-decoding mechanism (not continuous diffusion action heads grafted onto an AR VLM). Pure continuous-diffusion-policy VLAs such as DiVLA (Wen et al., 2024), HybridVLA, and ProgressVLA are intentionally excluded because their language model is autoregressive; only the action head is diffusion-based.
(a) dVLM-backbone VLA: the language backbone itself is a diffusion language model.
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| LLaDA-VLA: Vision Language Diffusion Action Models | 2025.06 | Arxiv | First LLaDA(d-VLM)-based VLA |
| dVLA: Diffusion VLA with Multimodal Chain-of-Thought | 2025.09 | Arxiv | dLLM backbone + multimodal CoT |
| Dream-VLA: Open Vision-Language-Action Model with Diffusion Backbone | 2025.12 | Arxiv | dVLA from Dream-7B; first dLLM pretrained VLA |
| MMaDA-VLA: Large Diffusion VLA with Unified Multi-Modal Instruction and Generation | 2026.03 | Arxiv | Native discrete-diffusion VLA from MMaDA |
(b) Discrete-diffusion action decoding: the language backbone may still be an AR VLM, but action chunks are decoded via discrete diffusion. This line of work is closely tied to the dLLM literature on inference techniques.
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Discrete Diffusion VLA: Action Decoding in VLA Policies | 2025.08 | Arxiv | Unified-transformer + discrete-diffusion actions |
| E0: Enhancing Generalization and Fine-Grained Control in VLA Models via Tweedie Discrete Diffusion | 2025.11 | Arxiv | AR-VLM backbone + Tweedie discrete diffusion on action tokens |
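
To make "action chunks are decoded via discrete diffusion" concrete, here is a minimal sketch of one common decoding style for masked discrete diffusion: start from a fully masked chunk and commit the most confident predictions over a fixed number of steps. This is an assumption-laden illustration, not the procedure of any paper listed above. `parallel_unmask`, `toy_predict`, and the confidence schedule are hypothetical names and choices, and `toy_predict` stands in for a real denoising network.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A prediction is (token_id, confidence) for one masked position.
Prediction = Tuple[int, float]
Predictor = Callable[[List[Optional[int]]], Dict[int, Prediction]]


def parallel_unmask(length: int, predict: Predictor, steps: int) -> List[Optional[int]]:
    """Fill a fully masked chunk (None = masked) in at most `steps` rounds."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    tokens: List[Optional[int]] = [None] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(tokens) if tok is None]
        if not masked:
            break
        preds = predict(tokens)
        # Spread the remaining masked positions evenly over the remaining steps
        # (ceiling division), so the last step always commits everything left.
        k = -(-len(masked) // (steps - step))
        most_confident = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in most_confident:
            tokens[i] = preds[i][0]
    return tokens


def toy_predict(tokens: List[Optional[int]]) -> Dict[int, Prediction]:
    """Hypothetical stand-in for a denoiser: earlier positions get higher confidence."""
    return {i: (i * 10, 1.0 / (1 + i)) for i, tok in enumerate(tokens) if tok is None}


if __name__ == "__main__":
    print(parallel_unmask(length=4, predict=toy_predict, steps=2))
```

With fewer steps than tokens, several action tokens are committed per forward pass, which is the source of the speed-up over token-by-token autoregressive decoding.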
6.3 Autonomous Driving / World Models
Scope note: works that apply discrete diffusion / masked-diffusion language modeling to driving trajectories, action codebooks, or tokenized world states. Continuous trajectory-diffusion planners (e.g., classical Diffusion Policy applied to driving) are out of scope.
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion | 2023.11 | ICLR | Discrete diffusion on tokenized point-cloud world model |
| ReflectDrive: Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving | 2025.09 | Arxiv | dLLM finetuned on discretized 2D driving space |
| Efficient and Explainable End-to-End Autonomous Driving via Masked Vision-Language-Action Diffusion | 2026.02 | Arxiv | Discrete action codebook + masked diffusion |
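
The scope note for this subsection refers to tokenized trajectories and masked-diffusion modeling over them. A minimal sketch of that idea follows, assuming a uniform 2D grid as the tokenizer and random masking as the corruption step. The grid range, bin count, and function names are hypothetical and do not come from the listed papers.

```python
import random
from typing import List, Optional


def waypoint_to_token(x: float, y: float, lo: float = -50.0,
                      hi: float = 50.0, bins: int = 100) -> int:
    """Map an (x, y) waypoint to one token id on a bins x bins grid.

    Assumed for illustration: coordinates are clipped to [lo, hi] on both axes.
    """
    def bucket(value: float) -> int:
        clipped = min(max(value, lo), hi)
        index = int((clipped - lo) / (hi - lo) * bins)
        return min(index, bins - 1)  # keep value == hi inside the last bin

    return bucket(y) * bins + bucket(x)


def mask_tokens(tokens: List[int], mask_ratio: float,
                rng: random.Random) -> List[Optional[int]]:
    """Replace a random fraction of tokens with None (the mask symbol)."""
    n_mask = round(mask_ratio * len(tokens))
    masked_positions = set(rng.sample(range(len(tokens)), n_mask))
    return [None if i in masked_positions else tok for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    trajectory = [(0.0, 0.0), (1.5, 0.2), (3.1, 0.9), (4.8, 2.0)]
    tokens = [waypoint_to_token(x, y) for x, y in trajectory]
    print(tokens)
    print(mask_tokens(tokens, mask_ratio=0.5, rng=random.Random(0)))
```

A masked-diffusion model would be trained to recover the hidden tokens from the visible ones, with the mask ratio playing the role of the noise level.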
7. Agentic & Tool-Use dLLMs
New section covering an emerging line of work: how dLLMs behave as agents (planning, multi-turn interaction, tool calling). This is critical for connecting dLLMs to robotics and physical-AI agent stacks.
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| The Bitter Lesson of Diffusion Language Models for Agentic Workflows: A Comprehensive Reality Check | 2026.01 | Arxiv | Embodied + tool-call eval |
| Agents of Diffusion: Enhancing Diffusion Language Models with Multi-Agent Reinforcement Learning for Structured Data Generation | 2026.01 | Arxiv | Multi-agent RL |
| DLLM Agent: See Farther, Run Faster | 2026.02 | Arxiv | dLLM-as-agent comparison |
8. Theory, Guidance & Applications
8.1 Theory & Analysis
8.2 Guidance & Downstream Applications
9. Seminal Diffusion Papers
| Paper Title | Year | Venue | Remark |
|---|---|---|---|
| Deep Unsupervised Learning using Nonequilibrium Thermodynamics | 2015.03 | ICML | Formulation |
| Denoising Diffusion Probabilistic Models (DDPM) | 2020.06 | NeurIPS | - |
| Denoising Diffusion Implicit Models (DDIM) | 2020.10 | ICLR | - |
| Score-Based Generative Modeling through SDEs | 2020.11 | ICLR | - |
| Diffusion Models Beat GANs on Image Synthesis | 2021.05 | NeurIPS | CG |
| Structured Denoising Diffusion in Discrete State-Spaces (D3PM) | 2021.07 | NeurIPS | Discrete |
| Vector Quantized Diffusion Model (VQ-Diffusion) | 2021.11 | CVPR | VQ |
| High-Resolution Image Synthesis with Latent Diffusion (LDM) | 2021.12 | CVPR | - |
| Progressive Distillation for Fast Sampling | 2022.02 | ICLR | Distillation |
| DPM-Solver: Fast ODE Solver for Sampling | 2022.06 | NeurIPS | - |
| Classifier-Free Diffusion Guidance | 2022.07 | NeurIPS | CFG |
| Analog Bits: Generating Discrete Data using Diffusion | 2022.08 | ICLR | Self-conditioning |
| Scalable Diffusion Models with Transformers (DiT) | 2022.12 | ICCV | Scalable focus |
| Consistency Models | 2023.03 | ICML | - |
Contact
- Maintainers: [email protected] / [email protected]
- Contributions via Pull Requests are welcome!
Model Transparency Report
Technical metadata sourced from upstream repositories.
Identity & Source
- id: gh-tool--aidaslab--awesome-diffusion-llm
- slug: aidaslab--awesome-diffusion-llm
- source: github
- author: AIDASLab
- license: not listed
- tags: diffusion, diffusion-language-models, diffusion-model, diffusion-models, large-language-model, large-language-models
Technical Specs
- architecture: null
- params billions: null
- context length: null
- pipeline tag: other
Engagement & Metrics
- downloads: 0
- stars: 0
- forks: 13