VARGPT-family / VARGPT-v1.1
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
☆25 · Updated this week
Alternatives and similar repositories for VARGPT-v1.1:
Users who are interested in VARGPT-v1.1 are comparing it to the repositories listed below.
- The dataset and baseline code for Text-to-Audio Grounding (TAG) ☆42 · Updated 2 months ago
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback" ☆23 · Updated last month
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆67 · Updated 7 months ago
- Pytorch implementation for “V2C: Visual Voice Cloning” ☆32 · Updated 2 years ago
- ☆30 · Updated last year
- ☆11 · Updated last year
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her ☆32 · Updated 2 weeks ago
- ☆21 · Updated 6 months ago
- Source code for the paper 'Audio Captioning Transformer' ☆54 · Updated 3 years ago
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea… ☆40 · Updated 2 weeks ago
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer ☆52 · Updated 5 months ago
- ☆11 · Updated 8 months ago
- ☆22 · Updated last year
- Official source codes for the paper: EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing. ☆12 · Updated 2 months ago
- small audio language model for reasoning ☆50 · Updated last week
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆94 · Updated 5 months ago
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions. ☆113 · Updated 3 months ago
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24) ☆32 · Updated 4 months ago
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling ☆67 · Updated 4 months ago
- Implementation of Multi-Source Music Generation with Latent Diffusion. ☆23 · Updated 6 months ago
- ☆35 · Updated 11 months ago
- ☆54 · Updated last week
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆21 · Updated 7 months ago
- BLSP-Emo: Towards Empathetic Large Speech-Language Models ☆43 · Updated 9 months ago
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) ☆19 · Updated 3 months ago
- ☆27 · Updated 6 months ago
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial… ☆40 · Updated 2 months ago
- Official release of StyleTalk dataset. ☆62 · Updated 9 months ago
- This repo contains a script to download the MUSIC dataset from YouTube ☆8 · Updated last year
- A spoken version of the textual story cloze benchmark ☆15 · Updated last year