xcltql666 / DenseDiT
Code for "From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios"
☆26 · Updated 3 weeks ago
Alternatives and similar repositories for DenseDiT
Users who are interested in DenseDiT are comparing it to the repositories listed below
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework ☆67 · Updated 2 months ago
- official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆36 · Updated 6 months ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆72 · Updated 8 months ago
- Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆103 · Updated last week
- The open-source code of MetaStone-S1. ☆83 · Updated 3 weeks ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆102 · Updated last week
- ☆84 · Updated last week
- Scaling Computer-Use Grounding via UI Decomposition and Synthesis ☆91 · Updated last month
- ARM: Adaptive Reasoning Model ☆45 · Updated last week
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆46 · Updated 5 months ago
- SIFT: Grounding LLM Reasoning in Contexts via Stickers ☆57 · Updated 4 months ago
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆40 · Updated 3 weeks ago
- ZeroGUI: Automating Online GUI Learning at Zero Human Cost ☆80 · Updated 2 weeks ago
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆99 · Updated 2 months ago
- ☆32 · Updated 3 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆58 · Updated 9 months ago
- ☆50 · Updated last month
- A multimodal agent that can interact with its own PC in a multimodal manner. ☆30 · Updated last week
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆24 · Updated 10 months ago
- LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models ☆19 · Updated 4 months ago
- The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond ☆160 · Updated 3 weeks ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆131 · Updated last month
- Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model. ☆75 · Updated this week
- ☆82 · Updated 2 weeks ago
- Efficient Agent Training for Computer Use ☆120 · Updated last month
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆105 · Updated 2 months ago
- ☆87 · Updated last month
- [ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis ☆151 · Updated 3 weeks ago
- ☆77 · Updated 4 months ago
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs" ☆53 · Updated 9 months ago