hhnqqq / py_hfd
A Python script for downloading Hugging Face datasets and models.
☆19 Updated last month
Alternatives and similar repositories for py_hfd:
Users who are interested in py_hfd are comparing it to the repositories listed below.
- Collections of Papers and Projects for Multimodal Reasoning. ☆104 Updated 2 weeks ago
- ☆117 Updated 2 months ago
- A paper list for spatial reasoning ☆58 Updated last month
- ☆83 Updated last month
- 📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models. ☆181 Updated last week
- [ICLR 2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs ☆51 Updated 2 months ago
- A tiny paper rating website ☆36 Updated last month
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆83 Updated last month
- R1-like Video-LLM for Temporal Grounding ☆85 Updated last month
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆94 Updated 2 months ago
- Accepted by CVPR 2024 ☆33 Updated 11 months ago
- DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models ☆43 Updated last month
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆56 Updated last month
- [NeurIPS'24] Efficient and accurate memory-saving method towards W4A4 large multi-modal models. ☆73 Updated 4 months ago
- [Blog 1] Recording a bug of grpo_trainer in some R1 projects ☆20 Updated 2 months ago
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models. ☆59 Updated last month
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆83 Updated last month
- [LLaVA-Video-R1] ✨ First Adaptation of R1 to LLaVA-Video (2025-03-18) ☆27 Updated this week
- ☆95 Updated 3 weeks ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆84 Updated 8 months ago
- [NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ☆97 Updated 9 months ago
- Official implementation of MIA-DPO ☆57 Updated 3 months ago
- A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models! ☆125 Updated last year
- ⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning. ☆132 Updated this week
- 【COLING 2025🔥】 Code for the paper "Is Parameter Collision Hindering Continual Learning in LLMs?". ☆33 Updated 5 months ago
- Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models ☆21 Updated this week
- Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents ☆72 Updated this week
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆114 Updated this week
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆127 Updated 11 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆119 Updated 3 months ago