roboflow / supervision
We write your reusable computer vision tools. 💜
⭐ 26,605 · Updated last week
Alternatives and similar repositories for supervision
Users who are interested in supervision are comparing it to the libraries listed below.
- OCR, layout analysis, reading order, table recognition in 90+ languages · ⭐ 17,413 · Updated this week
- Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥 · ⭐ 38,856 · Updated this week
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. · ⭐ 22,540 · Updated 9 months ago
- 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API. · ⭐ 38,358 · Updated this week
- tiny vision language model · ⭐ 7,952 · Updated this week
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… · ⭐ 15,508 · Updated 4 months ago
- Perplexica is an AI-powered search engine. It is an Open source alternative to Perplexity AI · ⭐ 21,943 · Updated last week
- Draw a mockup and generate html for it · ⭐ 13,491 · Updated 10 months ago
- Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work! · ⭐ 38,130 · Updated this week
- 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN · ⭐ 43,381 · Updated this week
- Official inference framework for 1-bit LLMs · ⭐ 19,676 · Updated this week
- Build AI Agents, Visually · ⭐ 38,349 · Updated this week
- 🙌 OpenHands: Code Less, Make More · ⭐ 54,349 · Updated this week
- User-friendly AI Interface (Supports Ollama, OpenAI API, ...) · ⭐ 95,083 · Updated this week
- Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models. · ⭐ 141,065 · Updated this week
- Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor… · ⭐ 21,987 · Updated 2 months ago
- Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue) · ⭐ 69,958 · Updated 3 weeks ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… · ⭐ 8,203 · Updated 2 weeks ago
- Images to inference with no labeling (use foundation models to train supervised models). · ⭐ 2,258 · Updated last week
- ⭐ 8,385 · Updated 11 months ago
- fabric is an open-source framework for augmenting humans using AI. It provides a modular framework for solving specific problems using a … · ⭐ 31,147 · Updated this week
- Instant voice cloning by MIT and MyShell. Audio foundation model. · ⭐ 32,250 · Updated last month
- State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server! · ⭐ 13,610 · Updated last week
- A natural language interface for computers · ⭐ 59,431 · Updated 3 weeks ago
- Turn any computer or edge device into a command center for your computer vision projects. · ⭐ 1,679 · Updated this week
- Self-hosted AI coding assistant · ⭐ 31,149 · Updated this week
- Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚ · ⭐ 28,144 · Updated 2 months ago
- Convert PDF to markdown + JSON quickly with high accuracy · ⭐ 25,089 · Updated this week
- The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface. · ⭐ 77,172 · Updated this week
- Automate browser-based workflows with LLMs and Computer Vision · ⭐ 13,368 · Updated this week