whn09 / VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
☆11Updated 3 weeks ago
Alternatives and similar repositories for VITA
Users that are interested in VITA are comparing it to the libraries listed below
- A collection of optimized utilities for text-to-audio processing, enhancing both training and inference workflows. This repository contai…☆24Updated 2 months ago
- ☆70Updated last month
- An open source chat bot architecture for voice/vision (and multimodal) assistants that runs locally (CPU/GPU bound) and remotely (I/O bound).☆53Updated this week
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech☆22Updated 9 months ago
- flow mirror models from JZX AI Labs☆44Updated 8 months ago
- ☆13Updated last year
- This is a repository that collects common audio noise reduction models, using Gradio to demonstrate the use of each model, which is very …☆37Updated 6 months ago
- The official GitHub Page for MiniMax☆38Updated last week
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems☆82Updated last year
- The official repo for paper "Spatial Speech Translation: Translating Across Space With Binaural Hearables"☆62Updated last month
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (like fish/cosy/chattts).☆75Updated this week
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.☆46Updated 8 months ago
- Implementation of SoundStream from the paper: "SoundStream: An End-to-End Neural Audio Codec"☆12Updated 4 months ago
- Our 2nd-gen LMM☆33Updated last year
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a Vision Encoder. Unifying image understanding and generation.☆37Updated 11 months ago
- ☆13Updated last year
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.☆35Updated 8 months ago
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM.☆37Updated 8 months ago
- Music production for silent film clips.☆25Updated last month
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora☆41Updated last year
- SenseVoice-python: An enterprise-grade open source multi-language ASR system from funasr opensource with onnxruntime☆94Updated 8 months ago
- ☆32Updated 4 months ago
- Awesome Colab Projects Collection☆26Updated last year
- PodAgent: A Comprehensive Framework for Podcast Generation☆87Updated 3 weeks ago
- Incredibly descriptive audiovisual summaries for videos☆41Updated 10 months ago
- ☆59Updated 10 months ago
- ☆16Updated last month
- ☆24Updated 5 months ago
- Daily tracking of awesome aigc papers, including video generation, video editing, animation.☆22Updated last week
- Fine-tune of Florence-2 for shot categorization.☆24Updated 3 months ago