video-db / ocr-benchmark
Benchmarking Vision-Language Models on OCR tasks in Dynamic Video Environments
☆43Updated 5 months ago
Alternatives and similar repositories for ocr-benchmark
Users who are interested in ocr-benchmark are comparing it to the repositories listed below.
- Retrieve the source code for any model made available on replicate.com!☆34Updated last year
- Useful resources for LLM-based Diarization and Transcription.☆55Updated 9 months ago
- Build AI Agents with Your Existing Python Code!☆63Updated 9 months ago
- ☆55Updated 3 weeks ago
- Cerule - A Tiny Mighty Vision Model☆66Updated 11 months ago
- ☆47Updated last year
- Gradio UI for a Cog API☆69Updated last year
- A streamlined implementation of Grounding DINO and SAM for advanced image segmentation. This lightweight solution simplifies the integrat…☆64Updated 10 months ago
- ☆22Updated 2 months ago
- ☆102Updated last month
- Simple program to manually caption your images (or any other file types) so you can use them for AI training☆37Updated 2 years ago
- ☆116Updated 7 months ago
- Build Web Datasets with Ease☆33Updated last year
- Extract information, summarize, ask questions, and search videos using OpenAI's Vision API 🚀🎦☆63Updated last year
- auto fine tune of models with synthetic data☆76Updated last year
- The next evolution of Agents☆48Updated 2 weeks ago
- An automated tool for discovering insights from research paper corpora☆138Updated last year
- Turn text from websites into spoken audio with edge-tts, F5, etc. and save as mp3 files☆47Updated last month
- ☆30Updated 8 months ago
- Use the Moondream 2 model to detect faces and their gaze directions in videos.☆44Updated 6 months ago
- Using modal.com to process FineWeb-edu data☆20Updated 4 months ago
- ☆50Updated last year
- A powerful AI agent for browser-based interactions powered by Fireworks AI models. Navigate the web, extract content, analyze websites, a…☆36Updated 2 months ago
- How to use bounding boxes with the Gemini API☆104Updated last year
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch☆101Updated 7 months ago
- MLX port for xjdr's entropix sampler (mimics jax implementation)☆62Updated 9 months ago
- A curated list of amazingly awesome Modal applications, demos, and shiny things. Inspired by awesome-php.☆150Updated last month
- This repository is an implementation of converting sketches into lively videos using Google's Veo 3 model.☆48Updated last month
- Arxflix turns your boring Arxiv research paper into a captivating video.☆52Updated this week
- Phi-3.5 for Mac: Locally-run Vision and Language Models for Apple Silicon☆271Updated 11 months ago