OpenVFM / V-SWIFT
V-SWIFT: Training a Small VideoMAE Model on a Single Machine in a Day
☆28Updated 6 months ago
Alternatives and similar repositories for V-SWIFT
Users who are interested in V-SWIFT are comparing it to the repositories listed below
- Video Benchmark Suite: Rapid Evaluation of Video Foundation Models☆15Updated 7 months ago
- Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion☆359Updated 4 months ago
- Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning☆102Updated last month
- Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface"☆213Updated last month
- MLCD-Seg is a zero-shot segmentation model from DeepGlint.☆17Updated last month
- YOLO-UniOW: Efficient Universal Open-World Object Detection☆149Updated 6 months ago
- Official repo of Griffon series including v1(ECCV 2024), v2, and G☆228Updated 2 months ago
- ☆21Updated last year
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding☆198Updated 6 months ago
- Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning☆236Updated 3 weeks ago
- A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space☆88Updated 6 months ago
- [CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception☆579Updated last year
- (CVPR 2025 highlight✨) Official repository of paper "LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of La…☆337Updated 2 weeks ago
- [CVPR2025] Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model".☆160Updated 7 months ago
- The official implementation of "VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning"☆241Updated last week
- [COLM 2025] Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources☆246Updated 2 months ago
- ☆20Updated 5 months ago
- [CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection☆177Updated 4 months ago
- This is the third party implementation of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detectio…☆666Updated 2 weeks ago
- Fine-tuning Grounding DINO☆127Updated last week
- A curated list of papers, datasets and resources pertaining to open vocabulary object detection.☆343Updated 2 months ago
- NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing☆559Updated 9 months ago
- General Vision Benchmark, GV-B, a project from OpenGVLab☆189Updated 3 years ago
- ☆95Updated last year
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed☆96Updated 9 months ago
- MM-Eureka V0, also called R1-Multimodal-Journey; the latest version is in MM-Eureka☆313Updated last month
- [ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"☆247Updated 7 months ago
- Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"☆480Updated last week
- Multimodal (MM) + Chat collection☆274Updated 2 months ago
- New generation of CLIP with fine grained discrimination capability, ICML2025☆259Updated last week