roboflow / trackers
A unified library for object tracking featuring clean-room re-implementations of leading multi-object tracking algorithms
☆1,774Updated this week
Alternatives and similar repositories for trackers
Users who are interested in trackers are comparing it to the libraries listed below
- RF-DETR is a real-time object detection model architecture developed by Roboflow, SOTA on COCO & designed for fine-tuning.☆2,269Updated 2 weeks ago
- Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"☆6,847Updated 3 months ago
- YOLOE: Real-Time Seeing Anything☆1,364Updated last month
- This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025☆4,233Updated last month
- Implementation for Describe Anything: Detailed Localized Image and Video Captioning☆1,179Updated last month
- SpatialLM: Training Large Language Models for Structured Indoor Modeling☆3,395Updated 2 weeks ago
- Create your custom OpenCV algorithms using a user-friendly node editor interface, inspired by Blender and Unreal Engine blueprints! Quic…☆369Updated last month
- This repository is a curated collection of the most exciting and influential CVPR 2025 papers. 🔥 [Paper + Code + Demo]☆609Updated last week
- streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL☆2,578Updated this week
- computer vision and sports☆4,316Updated last month
- Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders (CVPR 2025, Highlight)☆719Updated 2 months ago
- Thera: Aliasing-Free Arbitrary-Scale Super-Resolution with Neural Heat Fields☆807Updated last month
- Real-time webcam demo with SmolVLM and llama.cpp server☆3,969Updated last month
- Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything☆1,305Updated last month
- This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models.☆1,092Updated 5 months ago
- DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding☆1,096Updated last week
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents☆1,725Updated 3 weeks ago
- Model Activity Visualiser☆506Updated 2 months ago
- D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement [ICLR 2025 Spotlight]☆2,464Updated 2 months ago
- Efficient Track Anything☆571Updated 5 months ago
- ☆403Updated last month
- Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits f…☆1,125Updated 2 months ago
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!☆1,294Updated 3 weeks ago
- Official Repo for "TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding" [ACL 2025]☆1,309Updated this week
- Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2☆2,347Updated last month
- Real Time Speech Transcription with FastRTC ⚡️and Local Whisper 🤗☆661Updated last week
- Transform PDFs into AI podcasts for engaging on-the-go audio content.☆671Updated 3 weeks ago
- Whereabouts Ascertainment for Low-lying Detectable Objects. The SOTA in FOSS AI for drones!☆1,612Updated 5 months ago
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)☆1,220Updated last week
- The simplest, fastest repository for training/finetuning small-sized VLMs.☆3,558Updated this week