hhnqqq / py_hfd
A Python script for downloading Hugging Face datasets and models.
☆19 · Updated last month
Alternatives and similar repositories for py_hfd
Users who are interested in py_hfd are comparing it to the libraries listed below.
- A tiny paper rating web ☆37 · Updated 2 months ago
- A paper list for spatial reasoning ☆73 · Updated last week
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models ☆48 · Updated this week
- Collections of Papers and Projects for Multimodal Reasoning. ☆105 · Updated last month
- ☆84 · Updated 2 months ago
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆105 · Updated 3 months ago
- TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆42 · Updated 2 weeks ago
- Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in… ☆126 · Updated this week
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models. ☆62 · Updated 2 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models. ☆218 · Updated this week
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆89 · Updated last month
- Official implementation of MIA-DPO ☆58 · Updated 4 months ago
- Official implementation of ECCV 2024 paper: Take A Step Back: Rethinking the Two Stages in Visual Reasoning ☆14 · Updated this week
- 🎉 [ACL 2025] The code repository for "Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning… ☆16 · Updated 3 weeks ago
- [CVPR’25] PIVRG & ConsMTL ☆11 · Updated 2 weeks ago
- ☆119 · Updated 3 months ago
- [ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ☆112 · Updated 2 weeks ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆101 · Updated last week
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆59 · Updated 2 months ago
- ☆39 · Updated last month
- Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models ☆25 · Updated 2 weeks ago
- Official repository for VisionZip (CVPR 2025) ☆285 · Updated last week
- [LLaVA-Video-R1]✨First Adaptation of R1 to LLaVA-Video (2025-03-18) ☆27 · Updated 3 weeks ago
- [EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models'' ☆84 · Updated last year
- ☆100 · Updated last month
- 🔥CVPR 2025 Multimodal Large Language Models Paper List ☆142 · Updated 2 months ago
- [Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ☆96 · Updated 10 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆63 · Updated 10 months ago
- Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More" ☆46 · Updated last month
- [Blog 1] Recording a bug of grpo_trainer in some R1 projects ☆19 · Updated 3 months ago