vikhyat / moondream
tiny vision language model
☆9,278 · Updated 2 months ago
Alternatives and similar repositories for moondream
Users who are interested in moondream are comparing it to the libraries listed below.
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆9,463 · Updated last week
- Fast and accurate automatic speech recognition (ASR) for edge devices ☆3,115 · Updated 2 months ago
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ☆13,126 · Updated last week
- Inference and training library for high-quality TTS models. ☆5,513 · Updated last year
- Clarity AI | AI Image Upscaler & Enhancer - free and open-source Magnific Alternative ☆4,969 · Updated 10 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,458 · Updated 10 months ago
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆19,192 · Updated this week
- An Open Source text-to-speech system built by inverting Whisper. ☆4,551 · Updated last month
- LLocalSearch is a completely locally running search aggregator using LLM Agents. The user can ask a question and the system will use a ch… ☆5,964 · Updated last month
- ☆8,785 · Updated 3 months ago
- Everything about the SmolLM and SmolVLM family of models ☆3,579 · Updated 2 weeks ago
- Large Action Model framework to develop AI Web Agents ☆6,284 · Updated last year
- Foundational model for human-like, expressive TTS ☆4,192 · Updated last year
- A fast multimodal LLM for real-time voice ☆4,334 · Updated last month
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,624 · Updated 4 months ago
- Local AI API Platform ☆2,761 · Updated 6 months ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆3,118 · Updated 8 months ago
- Retrieval Augmented Generation (RAG) chatbot powered by Weaviate ☆7,540 · Updated 6 months ago
- ☆3,070 · Updated 2 months ago
- Speech To Speech: an effort for an open-sourced and modular GPT4-o ☆4,278 · Updated 9 months ago
- streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ☆2,657 · Updated this week
- Open Source framework for voice and multimodal conversational AI ☆10,078 · Updated this week
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆6,144 · Updated last year
- Local realtime voice AI ☆2,424 · Updated 2 months ago
- Fast, flexible LLM inference ☆6,449 · Updated this week
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications. ☆3,154 · Updated this week
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,734 · Updated 2 months ago
- Go ahead and axolotl questions ☆11,171 · Updated this week
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆4,030 · Updated last year
- High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean. ☆7,160 · Updated last year