hhnqqq / py_hfd
A Python script for downloading Hugging Face datasets and models.
☆19 · Updated 3 months ago
Alternatives and similar repositories for py_hfd
Users who are interested in py_hfd are comparing it to the repositories listed below.
- A paper list for spatial reasoning ☆127 · Updated last month
- Collections of Papers and Projects for Multimodal Reasoning. ☆105 · Updated 3 months ago
- 📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models. ☆268 · Updated last week
- A framework for unified personalized model, achieving mutual enhancement between personalized understanding and generation. Demonstrating… ☆109 · Updated last month
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆65 · Updated 3 weeks ago
- A tiny paper rating web ☆39 · Updated 4 months ago
- Official implementation of MC-LLaVA. ☆130 · Updated 2 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆162 · Updated 3 months ago
- Accepted by CVPR 2024 ☆37 · Updated last year
- ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models ☆53 · Updated 2 months ago
- 🔥CVPR 2025 Multimodal Large Language Models Paper List ☆149 · Updated 4 months ago
- ☆132 · Updated 5 months ago
- ☆69 · Updated 2 weeks ago
- R1-like Video-LLM for Temporal Grounding ☆109 · Updated last month
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆52 · Updated last month
- Long-RL: Scaling RL to Long Sequences ☆568 · Updated this week
- ☆93 · Updated 4 months ago
- [ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ☆135 · Updated 2 months ago
- Survey: https://arxiv.org/pdf/2507.20198 ☆46 · Updated last week
- Official repository for VisionZip (CVPR 2025) ☆329 · Updated 2 weeks ago
- [CVPR'25] PIVRG & ConsMTL ☆12 · Updated last month
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆64 · Updated 4 months ago
- ☆188 · Updated this week
- ☆62 · Updated last month
- This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning". ☆30 · Updated 3 weeks ago
- [ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs ☆58 · Updated 5 months ago
- Latest Advances on (RL-based) Multimodal Reasoning and Generation in Multimodal Large Language Models ☆32 · Updated this week
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models. ☆68 · Updated 4 months ago
- ✨First Open-Source R1-like Video-LLM [2025/02/18] ☆352 · Updated 5 months ago
- A Collection of Papers on Diffusion Language Models ☆97 · Updated last month