video-db / ocr-benchmark

Benchmarking Vision-Language Models on OCR tasks in Dynamic Video Environments

☆43

Alternatives and similar repositories for ocr-benchmark

Users who are interested in ocr-benchmark are comparing it to the repositories listed below

naklecha / replicate-local
Retrieve the source code for any model made available on replicate.com!
☆34 Updated last year
SouthBridgeAI / llm-transcription-study
Useful resources for LLM-based Diarization and Transcription.
☆55 Updated 9 months ago
Extensible-AI / DAGent
Build AI Agents with Your Existing Python Code!
☆63 Updated 9 months ago
impel-intelligence / dippy-bittensor-subnet
☆55 Updated 3 weeks ago
tensoic / Cerule
Cerule - A Tiny Mighty Vision Model
☆66 Updated 11 months ago
swyxio / openlangmem
☆47 Updated last year
multimodalart / grog
Gradio UI for a Cog API
☆69 Updated last year
okaris / grounded-segmentation
A streamlined implementation of Grounding DINO and SAM for advanced image segmentation. This lightweight solution simplifies the integrat…
☆64 Updated 10 months ago
QuixiAI / bridge-protocol
☆22 Updated 2 months ago
QuixiAI / dolphin-logger
☆102 Updated last month
ANTONIOPSD / CaptionIMG
Simple program to manually caption your images (or any other file types) so you can use them for AI training
☆37 Updated 2 years ago
teknium1 / ShareGPT-Builder
☆116 Updated 7 months ago
omkaark / agenata
Build Web Datasets with Ease
☆33 Updated last year
nateraw / openai-vision-api-for-videos
Extract information, summarize, ask questions, and search videos using OpenAI's Vision API 🚀🎦
☆63 Updated last year
yoheinakajima / autofinetune
auto fine tune of models with synthetic data
☆76 Updated last year
kyegomez / NeoSapiens
The next evolution of Agents
☆48 Updated 2 weeks ago
ai8hyf / OpenResearchAssistant
An automated tool for discovering insights from research paper corpora
☆138 Updated last year
WismutHansen / READ2ME
Turn text from websites into spoken audio with edge-tts, F5, etc. and save as mp3 files
☆47 Updated last month
mshumer / portal
☆30 Updated 8 months ago
parsakhaz / gaze-detection-video
Use the Moondream 2 model to detect faces and their gaze directions in videos.
☆44 Updated 6 months ago
enjalot / latent-data-modal
Using modal.com to process FineWeb-edu data
☆20 Updated 4 months ago
huggingface / discord-bots
☆50 Updated last year
shubcodes / fireworksai-browseruse
A powerful AI agent for browser-based interactions powered by Fireworks AI models. Navigate the web, extract content, analyze websites, a…
☆36 Updated 2 months ago
GrantCuster / gemini-spatial-example
How to use bounding boxes with the Gemini API
☆104 Updated last year
ritabratamaiti / AnyModal
AnyModal is a Flexible Multimodal Language Model Framework for PyTorch
☆101 Updated 7 months ago
smolorg / smoltropix
MLX port for xjdr's entropix sampler (mimics jax implementation)
☆62 Updated 9 months ago
modal-labs / awesome-modal
A curated list of amazingly awesome Modal applications, demos, and shiny things. Inspired by awesome-php.
☆150 Updated last month
NSTiwari / Sketch2Vid
This repository is an implementation of converting sketches into lively videos using Google's Veo 3 model.
☆48 Updated last month
julien-blanchon / arxflix
Arxflix turns your boring Arxiv research paper into a captivating video.
☆52 Updated this week
JosefAlbers / Phi-3-Vision-MLX
Phi-3.5 for Mac: Locally-run Vision and Language Models for Apple Silicon
☆271 Updated 11 months ago