nfsrules / qwen2.5VL-R1
QWEN 2.5VL-R1: Multimodal reasoning model for action recognition in videos (Experimental GRPO with LoRA support)
☆21 · Oct 9, 2025 · Updated 4 months ago
Alternatives and similar repositories for qwen2.5VL-R1
Users who are interested in qwen2.5VL-R1 are comparing it to the repositories listed below.
- PyTorch implementation of (2+1)D spatiotemporal convolutions ☆12 · Sep 13, 2018 · Updated 7 years ago
- [ICCV2023] PyTorch implementation of "Spatial-Aware Token for Weakly Supervised Object Localization". ☆23 · Oct 24, 2023 · Updated 2 years ago
- [NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding ☆73 · Dec 14, 2025 · Updated 2 months ago
- This is the official implementation for our NeurIPS 2023 paper "Focus on Query: Adversarial Mining Transformer for Few-Shot Segmentation"… ☆22 · Mar 26, 2024 · Updated last year
- [CVPR'24] Code for Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models ☆18 · Jul 22, 2024 · Updated last year
- The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanc… ☆12 · Oct 14, 2024 · Updated last year
- DisTime: Distribution-based Time Representation for Video Large Language Models. ☆18 · Jul 10, 2025 · Updated 7 months ago
- ☆14 · Aug 10, 2025 · Updated 6 months ago
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking ☆13 · Apr 12, 2023 · Updated 2 years ago
- The repository of VG-Refiner paper ☆17 · Dec 9, 2025 · Updated 2 months ago
- ☆42 · May 24, 2024 · Updated last year
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences ☆42 · Mar 11, 2025 · Updated 11 months ago
- CLIP-based Adaptive Graph Attention Network for Large-Scale Unsupervised Multi-modal Hashing Retrieval ☆10 · Mar 18, 2024 · Updated last year
- ☆10 · Apr 7, 2025 · Updated 10 months ago
- Part of a research scholarship. I built a basic 2D driving sim with simulated lidar data to train a Deep Q Neural Network. So far after abo… ☆11 · Feb 15, 2017 · Updated 8 years ago
- [CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation ☆13 · Jun 17, 2024 · Updated last year
- NLP on Korean news articles. Automatic topic extraction through dynamic clustering. ☆12 · Sep 15, 2017 · Updated 8 years ago
- ☆11 · Jan 18, 2025 · Updated last year
- Getting Started in Imitation Learning ☆13 · Mar 3, 2025 · Updated 11 months ago
- Deploying a stripped-down version of YOLOv5 on HiSilicon devices ☆13 · Nov 22, 2021 · Updated 4 years ago
- A Kivy tutorial for PyOhio 2013 ☆14 · Apr 30, 2014 · Updated 11 years ago
- This is a tool that makes it easy to run Intel OpenVINO demos and samples. ☆11 · Jan 31, 2023 · Updated 3 years ago
- The ONNX Model Zoo is a collection of pre-trained, state-of-the-art deep learning models, available in the ONNX format ☆39 · Jul 27, 2018 · Updated 7 years ago
- Face detection model zoo ☆42 · Apr 23, 2018 · Updated 7 years ago
- A highly commented TensorFlow implementation of DCGAN and WGAN for images. ☆10 · Dec 22, 2017 · Updated 8 years ago
- ☆24 · Nov 27, 2025 · Updated 2 months ago
- Using a .mlmodel in Xcode; the .mlmodel is converted from Keras / TensorFlow output. Please check https://github.com/ashislaha/CarDete… ☆11 · Oct 16, 2017 · Updated 8 years ago
- Deep Q-Networks in TensorFlow ☆10 · Apr 4, 2017 · Updated 8 years ago
- Face recognition using Siamese Networks ☆12 · Nov 29, 2017 · Updated 8 years ago
- Fixed version of https://github.com/tomguluson92/PRNet_PyTorch ☆10 · Mar 30, 2020 · Updated 5 years ago
- Mandelbulb rendered as a point cloud for iOS, using Swift and Metal ☆13 · May 31, 2021 · Updated 4 years ago
- Official repository of paper "LOVE-R1: Advancing Long Video Understanding with Adaptive Zoom-in Mechanism via Multi-Step Reasoning" ☆20 · Nov 1, 2025 · Updated 3 months ago
- Helmet Detector based on the CenterNet. ☆11 · Jan 30, 2022 · Updated 4 years ago
- Your virtual companion/waifu powered by ChatGPT and other state-of-the-art AI models ☆11 · Sep 11, 2023 · Updated 2 years ago
- Official Implementation for ACM MM2024 paper "VrdONE: One-stage Video Visual Relation Detection". ☆11 · Nov 13, 2024 · Updated last year
- ☆10 · Feb 26, 2020 · Updated 5 years ago
- Code for the paper: Graph Jigsaw Learning for Cartoon Face Recognition ☆10 · Jul 1, 2022 · Updated 3 years ago
- ROS package for SOTA Computer Vision Models including SAM, Cutie, GroundingDINO, YOLO-World, VLPart, DEVA and MaskDINO. ☆51 · Aug 4, 2024 · Updated last year
- Create 3D point clouds from depth images captured with the lens blur feature of the Google Camera app for Android. ☆19 · Apr 26, 2014 · Updated 11 years ago