nfsrules / qwen2.5VL-R1

QWEN 2.5VL-R1: Multimodal reasoning model for action recognition in videos (Experimental GRPO with LoRA support)

☆21

Alternatives and similar repositories for qwen2.5VL-R1

Users who are interested in qwen2.5VL-R1 are comparing it to the repositories listed below.


proceduralia / pytorch-conv2_1d
PyTorch implementation of (2+1)D spatiotemporal convolutions
☆12 · Sep 13, 2018 · Updated 7 years ago
wpy1999 / SAT
[ICCV2023] PyTorch implementation of "Spatial-Aware Token for Weakly Supervised Object Localization".
☆23 · Oct 24, 2023 · Updated 2 years ago
xiaomi-research / time-r1
[NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding
☆73 · Dec 14, 2025 · Updated 2 months ago
Wyxdm / AMNet
This is the official implementation for our NeurIPS 2023 paper "Focus on Query: Adversarial Mining Transformer for Few-Shot Segmentation"…
☆22 · Mar 26, 2024 · Updated last year
letitiabanana / PnP-OVSS
[CVPR'24] Code for Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
☆18 · Jul 22, 2024 · Updated last year
Zhao-Jianing-SUDA / Hawkeye
The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanc…
☆12 · Oct 14, 2024 · Updated last year
josephzpng / DisTime
DisTime: Distribution-based Time Representation for Video Large Language Models.
☆18 · Jul 10, 2025 · Updated 7 months ago
Eterwait / Echo
☆14 · Aug 10, 2025 · Updated 6 months ago
gabfstr / DiffusionTrack
Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking
☆13 · Apr 12, 2023 · Updated 2 years ago
VoyageWang / VG-Refiner
The repository of VG-Refiner paper
☆17 · Dec 9, 2025 · Updated 2 months ago
NMS05 / DinoV2-SigLIP-Phi3-LoRA-VLM
☆42 · May 24, 2024 · Updated last year
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆42 · Mar 11, 2025 · Updated 11 months ago
AwakerLee / CAGAN
CLIP-based Adaptive Graph Attention Network for Large-Scale Unsupervised Multi-modal Hashing Retrieval
☆10 · Mar 18, 2024 · Updated last year
mbzuai-oryx / TrackingMeetsLMM
☆10 · Apr 7, 2025 · Updated 10 months ago
tall-josh / CarSimRL
Part of a research scholarship. I built a basic 2d driving sim with simulated lidar data to train Deep Q Neural Network. So far after abo…
☆11 · Feb 15, 2017 · Updated 8 years ago
LinfengYuan1997 / LoSh
[CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
☆13 · Jun 17, 2024 · Updated last year
inhorha5 / Korean-NLP-Project
NLP on Korean news articles. Automatic topic extraction through dynamic clustering.
☆12 · Sep 15, 2017 · Updated 8 years ago
LinglingCai0314 / FreeMask
☆11 · Jan 18, 2025 · Updated last year
AssistiveRoboticsUNH / bc_tutorial
Getting Started in Imitation Learning
☆13 · Mar 3, 2025 · Updated 11 months ago
Hsdxm / hisi-yolov5
Deploying a stripped-down version of YOLOv5 on HiSilicon devices
☆13 · Nov 22, 2021 · Updated 4 years ago
brousch / pyohio-kivy-tutorial
A Kivy tutorial for PyOhio 2013
☆14 · Apr 30, 2014 · Updated 11 years ago
henry1758f / OpenVINO_Demo_Kit
A tool that makes it easy to run Intel OpenVINO demos and samples.
☆11 · Jan 31, 2023 · Updated 3 years ago
lupesko / model-zoo-old
The ONNX Model Zoo is a collection of pre-trained, state-of-the-art deep learning models, available in the ONNX format
☆39 · Jul 27, 2018 · Updated 7 years ago
the-house-of-black-and-white / hall-of-faces
Face detection model zoo
☆42 · Apr 23, 2018 · Updated 7 years ago
ReidWilliams / GANs
A highly commented TensorFlow implementation of DCGAN and WGAN for images.
☆10 · Dec 22, 2017 · Updated 8 years ago
rtous / lester
☆24 · Nov 27, 2025 · Updated 2 months ago
ashislaha / CarDetection-iOS
Using the .mlmodel in Xcode, that .mlmodel is converted from Keras / TensorFlow output. Please check https://github.com/ashislaha/CarDete…
☆11 · Oct 16, 2017 · Updated 8 years ago
pikinder / DQN
Deep Q-Networks in TensorFlow
☆10 · Apr 4, 2017 · Updated 8 years ago
Ashutosh18 / Pytorch-Face-Recognition
Face recognition using Siamese Networks
☆12 · Nov 29, 2017 · Updated 8 years ago
Smorodov / PRNet_PyTorch_v2
Fixed version of https://github.com/tomguluson92/PRNet_PyTorch
☆10 · Mar 30, 2020 · Updated 5 years ago
Kosalos / MandelBulbQuad
MandelBulb rendered as a point cloud for iOS, using Swift and Metal
☆13 · May 31, 2021 · Updated 4 years ago
HumanMLLM / LOVE-R1
Official repository of paper "LOVE-R1: Advancing Long Video Understanding with Adaptive Zoom-in Mechanism via Multi-Step Reasoning"
☆20 · Nov 1, 2025 · Updated 3 months ago
tyui592 / Real_Time_Helmet_Detection
Helmet Detector based on the CenterNet.
☆11 · Jan 30, 2022 · Updated 4 years ago
SynodicMonth / ChatWaifu
Your virtual companion/waifu powered by ChatGPT and other state-of-the-art AI models
☆11 · Sep 11, 2023 · Updated 2 years ago
lucaspk512 / vrdone
Official Implementation for ACM MM2024 paper "VrdONE: One-stage Video Visual Relation Detection".
☆11 · Nov 13, 2024 · Updated last year
wuchaodzxx / tensorrt_retinaface
☆10 · Feb 26, 2020 · Updated 5 years ago
mysee1989 / GraphJigsaw
Code for the paper: Graph Jigsaw Learning for Cartoon Face Recognition
☆10 · Jul 1, 2022 · Updated 3 years ago
ojh6404 / deep_vision_ros
ROS package for SOTA Computer Vision Models including SAM, Cutie, GroundingDINO, YOLO-World, VLPart, DEVA and MaskDINO.
☆51 · Aug 4, 2024 · Updated last year
0xtob / gcam2ply
Create 3D point clouds from depth images captured with the lens blur feature of the Google Camera app for Android.
☆19 · Apr 26, 2014 · Updated 11 years ago
