vikhyat / moondream
tiny vision language model
☆8,397 · Updated 3 weeks ago
Alternatives and similar repositories for moondream
Users who are interested in moondream are comparing it to the libraries listed below.
- Inference and training library for high-quality TTS models. ☆5,411 · Updated 9 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,370 · Updated 5 months ago
- A fast multimodal LLM for real-time voice ☆4,181 · Updated last week
- Foundational model for human-like, expressive TTS ☆4,160 · Updated last year
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆8,897 · Updated this week
- Blazingly fast LLM inference. ☆6,074 · Updated last week
- streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ☆2,629 · Updated this week
- Your image is almost there! ☆7,661 · Updated last year
- Fast and accurate automatic speech recognition (ASR) for edge devices ☆2,864 · Updated last week
- Speech To Speech: an effort for an open-sourced and modular GPT4-o ☆4,169 · Updated 4 months ago
- Letta is the platform for building stateful agents: open AI with advanced memory that can learn and self-improve over time. ☆18,356 · Updated this week
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ☆12,748 · Updated this week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆18,509 · Updated this week
- Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" ☆3,314 · Updated last year
- a state-of-the-art-level open visual language model | 多模态预训练模型 ☆6,659 · Updated last year
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. ☆23,538 · Updated last year
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications. ☆3,091 · Updated 2 weeks ago
- Local AI API Platform ☆2,761 · Updated 2 months ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆3,947 · Updated 8 months ago
- Large Action Model framework to develop AI Web Agents ☆6,162 · Updated 7 months ago
- Local realtime voice AI ☆2,362 · Updated 6 months ago
- Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM ☆1,447 · Updated 5 months ago
- A language model programming library. ☆5,845 · Updated 3 months ago
- Large World Model -- Modeling Text and Video with Millions Context ☆7,333 · Updated 10 months ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,528 · Updated last month
- ☆3,012 · Updated last year
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆5,957 · Updated last year
- High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean. ☆6,760 · Updated 8 months ago
- Examples in the MLX framework ☆7,813 · Updated last week
- Open Source framework for voice and multimodal conversational AI ☆8,021 · Updated this week