vikhyat / moondream
Tiny vision language model
☆9,303 · Updated 2 months ago
Alternatives and similar repositories for moondream
Users who are interested in moondream are comparing it to the libraries listed below.
- Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ☆2,659 · Updated 2 weeks ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆9,557 · Updated 3 weeks ago
- A state-of-the-art-level open visual language model | multimodal pretrained model ☆6,724 · Updated last year
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ☆13,137 · Updated this week
- Everything about the SmolLM and SmolVLM family of models ☆3,602 · Updated 3 weeks ago
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ☆8,891 · Updated last year
- A fast multimodal LLM for real-time voice ☆4,349 · Updated last month
- Inference and training library for high-quality TTS models. ☆5,528 · Updated last year
- Fast, flexible LLM inference ☆6,508 · Updated this week
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. ☆24,426 · Updated last year
- A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone ☆23,054 · Updated this week
- Official inference repo for FLUX.1 models ☆25,187 · Updated 6 months ago
- Local AI API Platform ☆2,762 · Updated 7 months ago
- Foundational model for human-like, expressive TTS ☆4,191 · Updated last year
- A vector search SQLite extension that runs anywhere! ☆6,858 · Updated last year
- [EMNLP 2024 🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection ☆3,448 · Updated last year
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆18,477 · Updated last year
- Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++ ☆5,372 · Updated this week
- llama3 implementation one matrix multiplication at a time ☆15,241 · Updated last year
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,624 · Updated 5 months ago
- Go ahead and axolotl questions ☆11,251 · Updated this week
- High-speed Large Language Model Serving for Local Deployment ☆8,635 · Updated 2 weeks ago
- [CVPR 2024] Real-Time Open-Vocabulary Object Detection ☆6,198 · Updated 11 months ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆3,119 · Updated 8 months ago
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,440 · Updated 2 months ago
- ☆8,809 · Updated 3 months ago
- Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" ☆3,334 · Updated last year
- PyTorch code and models for V-JEPA self-supervised learning from video. ☆3,499 · Updated 11 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,461 · Updated 10 months ago
- CoTracker is a model for tracking any point (pixel) on a video. ☆4,820 · Updated last year