MiniMax-AI / MiniMax-AI.github.io

The official GitHub Page for MiniMax

☆62

Alternatives and similar repositories for MiniMax-AI.github.io

Users who are interested in MiniMax-AI.github.io are comparing it to the repositories listed below.


kyutai-labs / moshivis
Kyutai with an "eye"
☆235 Updated 10 months ago
TencentARC / AudioStory
AudioStory: Generating Long-Form Narrative Audio with Large Language Models
☆301 Updated 4 months ago
chentuochao / Spatial-Speech-Translation
The official repo for paper "Spatial Speech Translation: Translating Across Space With Binaural Hearables"
☆71 Updated 5 months ago
Liquid4All / liquid-audio
Liquid Audio - Speech-to-Speech audio models by Liquid AI
☆388 Updated 2 weeks ago
google-deepmind / videoprism
Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024)
☆348 Updated 3 weeks ago
NVlabs / OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
☆631 Updated 3 months ago
playht / PlayDiffusion
☆537 Updated 4 months ago
zai-org / RealVideo
A real-time streaming conversational video system that transforms text interactions into continuous, high-fidelity video responses using …
☆293 Updated last month
EvolvingLMMs-Lab / Aero-1
☆77 Updated 9 months ago
jasonppy / VoiceStar
VoiceStar: Robust, Duration-controllable TTS that can Extrapolate
☆307 Updated 8 months ago
GAIR-NLP / LiveTalk
☆247 Updated last month
kszpxxzmc / ViSAudio
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
☆114 Updated last month
showlab / livecc
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)
☆407 Updated 3 months ago
Doriandarko / MLX-GRPO
A pure MLX-based training pipeline for fine-tuning LLMs using GRPO on Apple Silicon.
☆226 Updated 3 months ago
camenduru / Wan2.1-jupyter
☆45 Updated 5 months ago
ZihanWang314 / coeCheck
☆19 Updated 11 months ago
Marvis-Labs / marvis-tts
☆346 Updated 5 months ago
Vyvo-Labs / VyvoTTS
☆245 Updated last month
TechxGenus / CursorCore
CursorCore: Assist Programming through Aligning Anything
☆133 Updated 11 months ago
AIGeeksGroup / PresentAgent
[EMNLP 2025 Demo] PresentAgent: Multimodal Agent for Presentation Video Generation
☆128 Updated 2 months ago
ysharma3501 / LinaCodec
A highly compressive and high-quality neural audio codec for speech models.
☆246 Updated 2 weeks ago
AK391 / gemini-gradio
☆94 Updated last year
showlab / whisperVideo
Find out who said what in the video.
☆129 Updated 2 weeks ago
yannqi / Draw-an-Audio-Code
Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.
☆45 Updated last year
univa-agent / univa
Official Code Repo for UniVA: Universal Video Agents
☆332 Updated last week
SkyworkAI / skyreels-a3.github.io
project for skyreels-a3
☆78 Updated 5 months ago
ByteDance-Seed / Stable-DiffCoder
Stable-DiffCoder is a family of lightweight open-source code DLLMs (diffusion large language models) comprising base and instruct models, …
☆65 Updated 2 weeks ago
bigai-nlco / IMTalker
☆147 Updated last month
HumanMLLM / HumanOmniV2
☆147 Updated 6 months ago
Yuan-ManX / ai-multimodal-timeline
Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D…
☆37 Updated last year