Tencent / Freeze-Omni
The official implementation of Freeze-Omni.
☆13Updated 7 months ago
Alternatives and similar repositories for Freeze-Omni
Users who are interested in Freeze-Omni are comparing it to the repositories listed below.
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension☆103Updated 6 months ago
- ☆163Updated 4 months ago
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her☆42Updated 2 months ago
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; it supports both image and video…☆34Updated last year
- ☆48Updated last month
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction☆198Updated 3 months ago
- This repo is an exploratory experiment to enable frozen pretrained RWKV language models to accept speech modality input. We followed the …☆52Updated 6 months ago
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (like fish/cosy/chattts).☆77Updated this week
- Reproduction of the llama-omni training code☆65Updated 5 months ago
- flow mirror models from JZX AI Labs☆44Updated 8 months ago
- The official implementation of VITA, VITA15, LongVITA, and VITA-Audio.☆32Updated last month
- Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?".☆24Updated 11 months ago
- A Survey of Spoken Dialogue Models (60 pages)☆304Updated 7 months ago
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM☆322Updated last month
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea…☆62Updated 3 weeks ago
- ☆120Updated last month
- ☆84Updated 3 weeks ago
- A project for tri-modal LLM benchmarking and instruction tuning.☆38Updated 3 months ago
- ☆72Updated last month
- trying to reproduce suno v3☆33Updated 4 months ago
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues☆68Updated 2 weeks ago
- Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible…☆50Updated this week
- ☆13Updated last year
- Open-Pandora: On-the-fly Control Video Generation☆34Updated 6 months ago
- ☆38Updated 10 months ago
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊☆267Updated 5 months ago
- Official release of StyleTalk dataset.☆67Updated 11 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifies image understanding and generation.☆37Updated last year
- An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs☆115Updated 3 weeks ago
- 🤗 R1-AQA Model: mispeech/r1-aqa☆271Updated 2 months ago