video-db / ocr-benchmark
Benchmarking Vision-Language Models on OCR tasks in Dynamic Video Environments
☆47 · Updated 11 months ago
Alternatives and similar repositories for ocr-benchmark
Users who are interested in ocr-benchmark are comparing it to the libraries listed below.
- Simple program to manually caption your images (or any other file types) so you can use them for AI training ☆36 · Updated 2 years ago
- Useful resources for LLM-based Diarization and Transcription. ☆55 · Updated last year
- ☆107 · Updated 3 months ago
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆61 · Updated last year
- Build AI Agents with Your Existing Python Code! ☆69 · Updated last year
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch ☆103 · Updated last year
- Turn text from websites into spoken audio with edge-tts, F5, etc. and save as mp3 files ☆46 · Updated 7 months ago
- Extract information, summarize, ask questions, and search videos using OpenAI's Vision API 🚀🎦 ☆62 · Updated 2 years ago
- ☆55 · Updated 5 months ago
- ☆74 · Updated last year
- Gradio UI for a Cog API ☆70 · Updated last year
- ☆90 · Updated 3 months ago
- ☆47 · Updated last year
- Using the moondream VLM with optical flow for promptable object tracking ☆73 · Updated 11 months ago
- GRDN.AI app for garden optimization ☆69 · Updated 2 months ago
- Modify Entropy Based Sampling to work with Mac Silicon via MLX ☆49 · Updated last year
- ☆119 · Updated last year
- Retrieve the source code for any model made available on replicate.com! ☆36 · Updated 2 years ago
- A streamlined implementation of Grounding DINO and SAM for advanced image segmentation. This lightweight solution simplifies the integrat… ☆66 · Updated last year
- Cerule - A Tiny Mighty Vision Model ☆68 · Updated 3 months ago
- klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs ☆86 · Updated last year
- Lego for GRPO ☆30 · Updated 8 months ago
- Using modal.com to process FineWeb-edu data ☆20 · Updated 10 months ago
- Gradio based tool to run opensource LLM models directly from Huggingface ☆97 · Updated last year
- Phi-3.5 for Mac: Locally-run Vision and Language Models for Apple Silicon ☆273 · Updated 3 months ago
- ☆21 · Updated 8 months ago
- Use the Moondream 2 model to detect faces and their gaze directions in videos. ☆46 · Updated last year
- look how they massacred my boy ☆63 · Updated last year
- The next evolution of Agents ☆48 · Updated last week
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio… ☆85 · Updated last year