ictnlp / LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
☆891 · Updated last week
Related projects:
- Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling" ☆742 · Updated 3 weeks ago
- Whisper with Medusa heads ☆774 · Updated last week
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,069 · Updated last month
- LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs ☆1,016 · Updated 2 weeks ago
- Open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆2,425 · Updated this week
- WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI. ☆1,509 · Updated last month
- A fast multimodal LLM for real-time voice ☆847 · Updated this week
- Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas" ☆769 · Updated 2 months ago
- The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,390 · Updated 2 months ago
- Reaching LLaMA2 Performance with 0.1M Dollars ☆955 · Updated last month
- A lightweight framework for building LLM-based agents ☆1,744 · Updated 3 weeks ago
- Build real-time multimodal AI applications 🤖🎙️📹 ☆1,053 · Updated this week
- ReFT: Representation Finetuning for Language Models ☆1,076 · Updated 2 weeks ago
- Convert Compute And Books Into Instruct-Tuning Datasets (or classifiers)! ☆816 · Updated this week
- To speed up long-context LLMs' inference, approximately and dynamically computes sparse attention, which reduces inference latency by up t… ☆696 · Updated last week
- ✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM ☆751 · Updated this week
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆1,756 · Updated last month
- Implementation of the training framework proposed in Self-Rewarding Language Models, from Meta AI ☆1,309 · Updated 5 months ago
- SpeechGPT Series: Speech Large Language Models ☆1,219 · Updated last month
- HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge across externa… ☆1,237 · Updated last month
- Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation ☆644 · Updated last month
- Mora: More like Sora for Generalist Video Generation ☆1,474 · Updated 2 months ago
- VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and… ☆1,786 · Updated last week
- A lightweight library for generating synthetic instruction tuning datasets for your data without GPT. ☆655 · Updated last week
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding ☆1,318 · Updated last week
- Local SRT/LLM/TTS Voicechat ☆471 · Updated last month