ZHZisZZ / dllm
Train diffusion large language models with 🤗 Transformers Trainer
★50 · Updated last week
Alternatives and similar repositories for dllm
Users who are interested in dllm compare it to the libraries listed below.
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues ★81 · Updated last week
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model" ★50 · Updated 7 months ago
- Official code implementation for the work Preference Alignment with Flow Matching (NeurIPS 2024) ★58 · Updated 10 months ago
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback" ★29 · Updated 7 months ago
- A project for tri-modal LLM benchmarking and instruction tuning. ★48 · Updated 6 months ago
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction. ★95 · Updated 2 weeks ago
- Official codebase for "Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis" (https://arxiv.org/abs/2312.03491). ★127 · Updated last year
- (NIPS 2025) OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Align… ★100 · Updated 2 weeks ago
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer ★36 · Updated 9 months ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ★110 · Updated 4 months ago
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ★71 · Updated last year
- Small audio language model for reasoning ★75 · Updated 5 months ago
- OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model ★83 · Updated 2 months ago
- The code and weight for LoVA. LoVA is a novel model for Long-form Video-to-Audio generation. Based on the Diffusion Transformer (DiT) arc… ★16 · Updated 7 months ago
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ★119 · Updated 9 months ago
- A Foundation Model for Industrial Signal Comprehensive Representation ★43 · Updated last month
- We introduce the LLAMA1 Test Set, a comprehensive open-domain world knowledge QA dataset for evaluating question-answering systems. We pr… ★21 · Updated last year
- Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation". ★63 · Updated last year
- A curated list of Video to Audio Generation ★74 · Updated 3 months ago
- AudioBERT 📢: Audio Knowledge Augmented Language Model (ICASSP 2025) ★41 · Updated 8 months ago
- Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042) ★71 · Updated 6 months ago