vikhyat / moondream
tiny vision language model
☆9,191 Updated last month
Alternatives and similar repositories for moondream
Users who are interested in moondream are comparing it to the libraries listed below.
- Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ☆2,652 Updated 2 weeks ago
- Inference and training library for high-quality TTS models. ☆5,505 Updated last year
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,450 Updated 9 months ago
- A fast multimodal LLM for real-time voice ☆4,309 Updated 3 weeks ago
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆19,089 Updated 2 months ago
- ☆8,675 Updated last year
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ☆13,087 Updated this week
- A self-organizing file system with llama 3 ☆5,703 Updated 5 months ago
- Everything about the SmolLM and SmolVLM family of models ☆3,539 Updated last month
- Retrieval Augmented Generation (RAG) chatbot powered by Weaviate ☆7,504 Updated 5 months ago
- Fast and accurate automatic speech recognition (ASR) for edge devices ☆3,065 Updated last month
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆9,250 Updated last month
- Speech To Speech: an effort for an open-sourced and modular GPT4-o ☆4,266 Updated 8 months ago
- High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean. ☆7,125 Updated last year
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,622 Updated 4 months ago
- Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜 ☆1,826 Updated 2 months ago
- Examples in the MLX framework ☆8,120 Updated 3 weeks ago
- ☆3,056 Updated last month
- Foundational model for human-like, expressive TTS ☆4,196 Updated last year
- The #1 open-source voice interface for desktop, mobile, and ESP32 chips. ☆5,105 Updated last year
- Instant voice cloning by MIT and MyShell. Audio foundation model. ☆35,755 Updated 8 months ago
- Go ahead and axolotl questions ☆11,050 Updated this week
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆6,110 Updated last year
- CoreNet: A library for training deep neural networks ☆7,022 Updated 3 months ago
- Official inference library for Mistral models ☆10,612 Updated last month
- LLocalSearch is a completely locally running search aggregator using LLM Agents. The user can ask a question and the system will use a ch… ☆5,965 Updated last month
- Local realtime voice AI ☆2,426 Updated last month
- Local AI API Platform ☆2,763 Updated 6 months ago
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆9,609 Updated 8 months ago
- Perplexity Inspired Answer Engine ☆5,013 Updated 6 months ago