Lliar-liar / Daily-Omni
This is the official repository of Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
☆28 Updated 5 months ago
Alternatives and similar repositories for Daily-Omni
Users who are interested in Daily-Omni are comparing it to the repositories listed below.
- A project for tri-modal LLM benchmarking and instruction tuning. ☆53 Updated 9 months ago
- ☆76 Updated 3 months ago
- Code for Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction (ACL 2024) ☆48 Updated last year
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction. ☆131 Updated 3 months ago
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues ☆85 Updated 3 months ago
- Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation". ☆63 Updated last year
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs ☆42 Updated 3 months ago
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback" ☆32 Updated 9 months ago
- ☆127 Updated 3 months ago
- ☆31 Updated 2 years ago
- Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis ☆40 Updated 2 years ago
- ☆50 Updated 3 weeks ago
- ☆47 Updated 8 months ago
- MIO: A Foundation Model on Multimodal Tokens ☆32 Updated last year
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆22 Updated last year
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial… ☆40 Updated 11 months ago
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image … ☆87 Updated last year
- An official implementation of Style-Talker for Spoken Dialogue Generation ☆23 Updated 11 months ago
- Code and pretrained models for "DUB: Discrete Unit Back-translation for Speech Translation" (ACL 2023 Findings) ☆28 Updated 2 years ago
- LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models ☆25 Updated last year
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆73 Updated last year
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ☆124 Updated last year
- Demo page of TAVGBench: Benchmarking Text to Audible-Video Generation ☆14 Updated 8 months ago
- ☆78 Updated 7 months ago
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer ☆67 Updated last year
- The code and weight for LoVA. LoVA is a novel model for Long-form Video-to-Audio generation. Based on the Diffusion Transformer (DiT) arc… ☆15 Updated 10 months ago
- SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022 ☆118 Updated 3 years ago
- ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021. ☆62 Updated 4 years ago
- Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos ☆25 Updated last year
- ☆19 Updated last year