hhnqqq / py_hfd

A Python script for downloading Hugging Face datasets and models.

☆19

Alternatives and similar repositories for py_hfd

Users who are interested in py_hfd are comparing it to the repositories listed below.

yyyybq / Awesome-Spatial-Reasoning
A paper list for spatial reasoning
☆127 Updated last month
Video-R1 / Awesome-Multimodal-Reasoning
Collections of Papers and Projects for Multimodal Reasoning.
☆105 Updated 3 months ago
Purshow / Awesome-Unified-Multimodal
📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models.
☆268 Updated last week
arctanxarc / UniCTokens
A framework for unified personalized model, achieving mutual enhancement between personalized understanding and generation. Demonstrating…
☆109 Updated last month
yaolinli / TimeChat-Online
[ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
☆65 Updated 3 weeks ago
WayneJin0918 / SOTA-paper-rating.io
A tiny paper rating web
☆39 Updated 4 months ago
arctanxarc / MC-LLaVA
Official implementation of MC-LLaVA.
☆130 Updated 2 months ago
PzySeere / MetaSpatial
MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …
☆162 Updated 3 months ago
shiyi-zh0408 / NAE_CVPR2024
Accepted by CVPR 2024
☆37 Updated last year
ZJU-REAL / ViewSpatial-Bench
ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models
☆53 Updated 2 months ago
Wang-Xiaodong1899 / CVPR25-MLLM-Paper-List
🔥CVPR 2025 Multimodal Large Language Models Paper List
☆149 Updated 4 months ago
HKUST-LongGroup / Awesome-MLLM-Benchmarks
☆132 Updated 5 months ago
Gabesarch / grounded-rl
☆69 Updated 2 weeks ago
www-Ye / Time-R1
R1-like Video-LLM for Temporal Grounding
☆109 Updated last month
Cooperx521 / ScaleCap
Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'
☆52 Updated last month
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences
☆568 Updated this week
yu-rp / VisualPerceptionToken
☆93 Updated 4 months ago
Gumpest / SparseVLMs
[ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference".
☆135 Updated 2 months ago
cokeshao / Awesome-Multimodal-Token-Compression
Survey: https://arxiv.org/pdf/2507.20198
☆46 Updated last week
dvlab-research / VisionZip
Official repository for VisionZip (CVPR 2025)
☆329 Updated 2 weeks ago
jianke0604 / MTLlib
[CVPR’25] PIVRG & ConsMTL
☆12 Updated last month
hmxiong / StreamChat
Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" (ICLR 2025)
☆64 Updated 4 months ago
ML-GSAI / LLaDA-V
☆188 Updated this week
xinyan-cxy / MINT-CoT
☆62 Updated last month
PKU-YuanGroup / Look-Back
This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning".
☆30 Updated 3 weeks ago
linkangheng / Video-UTR
[ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs
☆58 Updated 5 months ago
The-Martyr / Awesome-Multimodal-Reasoning
Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models
☆32 Updated this week
Wild-Cooperation-Hub / Awesome-MLLM-Reasoning-Benchmarks
A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.
☆68 Updated 4 months ago
Wang-Xiaodong1899 / Open-R1-Video
✨First Open-Source R1-like Video-LLM [2025/02/18]
☆352 Updated 5 months ago
ML-GSAI / Diffusion-LLM-Papers
A Collection of Papers on Diffusion Language Models
☆97 Updated last month