0nutation / SpeechAgents
SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems
☆74 · Updated 8 months ago
Related projects:
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2 ☆92 · Updated 3 weeks ago
- A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset (ACL 2024) ☆12 · Updated last month
- Unofficial implementation of AlpaGasus ☆83 · Updated 11 months ago
- ☆12 · Updated last month
- FuseAI Project ☆75 · Updated 3 weeks ago
- Llama3.1 learns to Listen ☆134 · Updated this week
- ☆65 · Updated last year
- flow mirror models from JZX AI Labs ☆33 · Updated this week
- LlamaVoice is a llama-based large voice generation model, providing inference and training ability. ☆169 · Updated 3 weeks ago
- Copy the MLP of Llama 3 eight times as 8 experts, create a router with random initialization, and add a load-balancing loss to construct an 8x8B Mo… ☆22 · Updated 2 months ago
- ☆34 · Updated last week
- ☆61 · Updated last month
- Official code for the paper: InCharacter: Evaluating Personality Fidelity in Role-Playing Agents through Psychological Interviews (previo… ☆54 · Updated 3 months ago
- Awesome TTS ☆48 · Updated 3 years ago
- ☆70 · Updated 6 months ago
- Awesome Colab Projects Collection ☆24 · Updated 8 months ago
- ☆79 · Updated this week
- Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation". ☆55 · Updated last month
- ☆17 · Updated 11 months ago
- ☆35 · Updated last year
- This is the official repository for Inheritune. ☆89 · Updated 4 months ago
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM. ☆35 · Updated last week
- MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents ☆19 · Updated 3 weeks ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆36 · Updated 2 months ago
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning ☆64 · Updated 9 months ago
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech ☆22 · Updated 3 weeks ago
- ☆34 · Updated 2 months ago
- A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, qwen-vl, phi3-v … ☆123 · Updated last week
- PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion ☆45 · Updated 6 months ago
- Official GitHub repo for E-Eval, a Chinese K12 education evaluation benchmark for LLMs. ☆14 · Updated 7 months ago