video-db / ocr-benchmark
Benchmarking Vision-Language Models on OCR tasks in Dynamic Video Environments
☆40Updated 3 months ago
Alternatives and similar repositories for ocr-benchmark
Users who are interested in ocr-benchmark are comparing it to the repositories listed below.
- Gradio UI for a Cog API☆66Updated last year
- Useful resources for LLM-based Diarization and Transcription.☆55Updated 7 months ago
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch☆95Updated 5 months ago
- ☆17Updated 5 months ago
- Using modal.com to process FineWeb-edu data☆20Updated 2 months ago
- Use the Moondream 2 model to detect faces and their gaze directions in videos.☆40Updated 4 months ago
- Embed anything.☆28Updated last year
- A streamlined implementation of Grounding DINO and SAM for advanced image segmentation. This lightweight solution simplifies the integrat…☆64Updated 8 months ago
- ☆47Updated last year
- ☆28Updated 6 months ago
- Turn text from websites into spoken audio with edge-tts, F5, etc. and save as mp3 files☆47Updated this week
- ☆75Updated 2 weeks ago
- ☆21Updated last year
- Lego for GRPO☆28Updated last week
- Modify Entropy Based Sampling to work with Mac Silicon via MLX☆50Updated 7 months ago
- ☆20Updated last year
- converts url content into JSON with a simple prefix☆68Updated last year
- MLX port for xjdr's entropix sampler (mimics jax implementation)☆64Updated 7 months ago
- klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs☆75Updated 8 months ago
- Verbosity control for AI agents☆63Updated last year
- A couple scripts to grab stats from email☆42Updated 8 months ago
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio…☆80Updated last year
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models☆22Updated 6 months ago
- Pivotal Token Search☆97Updated 3 weeks ago
- A clone of OpenAI's Tokenizer page for HuggingFace Models☆45Updated last year
- The original BabyAGI, updated with LiteLLM and no vector database reliance (csv instead)☆21Updated 8 months ago
- Turns an Airtable base into a WebGL knowledge graph leveraging relational columns☆33Updated last year
- Alternative way of calculating self-attention☆18Updated last year
- Gradio based tool to run opensource LLM models directly from Huggingface☆91Updated 11 months ago
- ☆22Updated 7 months ago