kagnlp / Xolver
☆29 Updated last month
Alternatives and similar repositories for Xolver
Users that are interested in Xolver are comparing it to the libraries listed below
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] ☆77 Updated 4 months ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆46 Updated 8 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆116 Updated 3 months ago
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks ☆31 Updated 2 months ago
- Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆131 Updated 3 months ago
- Resa: Transparent Reasoning Models via SAEs ☆44 Updated last month
- ☆72 Updated 3 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆50 Updated 3 months ago
- The official repository of "R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Integration" ☆119 Updated last month
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆27 Updated last year
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆73 Updated 11 months ago
- ☆30 Updated last month
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆45 Updated 4 months ago
- The official repo for “Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem” [EMNLP25] ☆32 Updated 2 months ago
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward ☆85 Updated 2 months ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆52 Updated 10 months ago
- ☆23 Updated 4 months ago
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM" ☆16 Updated 2 weeks ago
- ☆60 Updated last month
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs. ☆18 Updated last year
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework ☆71 Updated 5 months ago
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆104 Updated 2 months ago
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence" ☆48 Updated last week
- Official Repo for RuleReasoner. ☆28 Updated 4 months ago
- PyTorch implementation of "Sample- and Parameter-Efficient Auto-Regressive Image Models" from CVPR 2025 ☆13 Updated 7 months ago
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models. ☆19 Updated 10 months ago
- Multimodal RewardBench ☆54 Updated 8 months ago
- Official implementation of "Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs". ☆82 Updated this week
- ☆61 Updated 3 months ago
- ☆49 Updated 5 months ago