hhnqqq / py_hfd
A Python script for downloading Hugging Face datasets and models.
★20 Updated 5 months ago
Alternatives and similar repositories for py_hfd
Users who are interested in py_hfd are comparing it to the libraries listed below.
- This is a repository for organizing papers, codes, and other resources related to unified multimodal models. ★288 Updated last week
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ★76 Updated this week
- Collections of Papers and Projects for Multimodal Reasoning. ★106 Updated 4 months ago
- A tiny paper rating web ★39 Updated 5 months ago
- 🔥 CVPR 2025 Multimodal Large Language Models Paper List ★153 Updated 6 months ago
- A paper list for spatial reasoning ★138 Updated 3 months ago
- Survey: https://arxiv.org/pdf/2507.20198 ★133 Updated this week
- [ICCV 2025] FonTS: Text Rendering with Typography and Style Controls ★26 Updated 3 weeks ago
- This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning". ★52 Updated 2 months ago
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI ★155 Updated last week
- R1-like Video-LLM for Temporal Grounding ★115 Updated 2 months ago
- Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models ★36 Updated 2 weeks ago
- ★83 Updated last month
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models. ★68 Updated 5 months ago
- ★139 Updated 7 months ago
- ★114 Updated 5 months ago
- ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models ★59 Updated 3 months ago
- ★104 Updated 2 months ago
- Official implementation of MC-LLaVA. ★139 Updated 3 weeks ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ★82 Updated last month
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ★56 Updated 2 months ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ★73 Updated 6 months ago
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ★122 Updated 6 months ago
- TStar is a unified temporal search framework for long-form video question answering ★67 Updated 2 weeks ago
- [Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ★370 Updated 8 months ago
- RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video. ★20 Updated 2 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ★100 Updated 3 weeks ago
- [ICLR2025] Official code implementation of Video-UTR: Unhackable Temporal Rewarding for Scalable Video MLLMs ★59 Updated 6 months ago
- ✨ First Open-Source R1-like Video-LLM [2025/02/18] ★362 Updated 6 months ago
- Official PyTorch Code of ReKV (ICLR'25) ★48 Updated 6 months ago