Tencent / Freeze-Omni
The official implementation of Freeze-Omni.
☆13 Updated last month
Alternatives and similar repositories for Freeze-Omni
Users who are interested in Freeze-Omni are comparing it to the libraries listed below.
- ☆170 Updated 6 months ago
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM ☆269 Updated 7 months ago
- LSLM implements full duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances … ☆75 Updated 2 months ago
- A Foundation Model for Industrial Signal Comprehensive Representation ☆36 Updated 3 weeks ago
- A collection of optimized utilities for text-to-audio processing, enhancing both training and inference workflows. This repository contai… ☆39 Updated 5 months ago
- OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model ☆83 Updated last month
- ☆15 Updated 8 months ago
- The official implementation of VITA, VITA15, LongVITA, and VITA-Audio. ☆34 Updated last month
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction ☆208 Updated 6 months ago
- An easy-to-use, fast, and easily integrable tool for evaluating audio LLM ☆135 Updated last month
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ☆115 Updated 8 months ago
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her ☆54 Updated 4 months ago
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems ☆83 Updated last year
- 🤗 R1-AQA Model: mispeech/r1-aqa ☆296 Updated 5 months ago
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea… ☆96 Updated 2 months ago
- ✨✨ Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM ☆338 Updated 3 months ago
- The WorldRWKV project aims to implement training and inference across various modalities using the RWKV7 architecture. By leveraging diff… ☆54 Updated 3 weeks ago
- ☆106 Updated 3 weeks ago
- ☆55 Updated 2 months ago
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. ☆37 Updated 11 months ago
- Keras implementation of Finite Scalar Quantization ☆81 Updated last year
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video… ☆35 Updated last year
- A project for tri-modal LLM benchmarking and instruction tuning. ☆43 Updated 5 months ago
- ☆13 Updated last year
- Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?". ☆25 Updated last year
- flow mirror models from JZX AI Labs ☆44 Updated 11 months ago
- ☆40 Updated last year
- trying to reproduce suno v3 ☆34 Updated 7 months ago
- Code for paper "Patch-Level Training for Large Language Models" ☆86 Updated 9 months ago
- A fast speech-to-speech & speech-to-text translation model that supports simultaneous decoding and offers 28× speedup. ☆76 Updated 10 months ago