xcltql666 / DenseDiT
Code for "From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios"
☆26 Updated last week
Alternatives and similar repositories for DenseDiT
Users who are interested in DenseDiT are comparing it to the repositories listed below.
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework ☆63 Updated last month
- Scaling Computer-Use Grounding via UI Decomposition and Synthesis ☆85 Updated 3 weeks ago
- official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆36 Updated 5 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆93 Updated this week
- ZeroGUI: Automating Online GUI Learning at Zero Human Cost ☆69 Updated last week
- Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model. ☆71 Updated this week
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆46 Updated 4 months ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆71 Updated 7 months ago
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆40 Updated 3 weeks ago
- The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond ☆150 Updated last week
- Official repository for "Web-Shepherd: Advancing PRMs for Reinforcing Web Agents" ☆36 Updated last month
- Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding ☆35 Updated this week
- ARM: Adaptive Reasoning Model ☆44 Updated 3 weeks ago
- ☆50 Updated last month
- SIFT: Grounding LLM Reasoning in Contexts via Stickers ☆55 Updated 4 months ago
- ☆48 Updated last month
- LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models ☆19 Updated 3 months ago
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR2025] ☆69 Updated last week
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆88 Updated last month
- A repo for open research on building large reasoning models ☆68 Updated this week
- Repo for "Z1: Efficient Test-time Scaling with Code" ☆63 Updated 3 months ago
- [ACL 2025] An inference-time decoding strategy with adaptive foresight sampling ☆99 Updated last month
- ☆30 Updated 2 months ago
- [IEEE VIS 2024] LLaVA-Chart: Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruc… ☆68 Updated 5 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆124 Updated last month
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆19 Updated 3 weeks ago
- Large Language Models Can Self-Improve in Long-context Reasoning ☆71 Updated 7 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆44 Updated last year
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" ☆25 Updated 2 months ago
- Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models ☆36 Updated 9 months ago