kagnlp / Xolver
☆32Updated 4 months ago
Alternatives and similar repositories for Xolver
Users who are interested in Xolver are comparing it to the libraries listed below
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models.☆73Updated last year
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]☆77Updated 7 months ago
- ☆42Updated 4 months ago
- ☆51Updated 8 months ago
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos☆48Updated 7 months ago
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks☆36Updated 2 months ago
- Dream-VL and Dream-VLA, a diffusion VLM and a diffusion VLA.☆101Updated 3 weeks ago
- Resa: Transparent Reasoning Models via SAEs☆47Updated 4 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆52Updated 6 months ago
- 🚀 LLM-I: Transform LLMs into natural interleaved multimodal creators! ✨ Tool-use framework supporting image search, generation, code ex…☆41Updated 3 months ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents☆48Updated 11 months ago
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models☆28Updated last year
- Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing"☆53Updated 7 months ago
- Spatial Aptitude Training for Multimodal Language Models☆24Updated 2 weeks ago
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models.☆19Updated last year
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM"☆17Updated 3 months ago
- ☆63Updated last month
- A paper list on world models☆29Updated 9 months ago
- ☆119Updated 3 weeks ago
- ☆24Updated 5 months ago
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding"☆53Updated last year
- This repository is a collection of research papers on World Models.☆43Updated 2 years ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆129Updated 6 months ago
- CoV: Chain-of-View Prompting for Spatial Reasoning☆50Updated 2 weeks ago
- Scaffold Prompting to promote LMMs☆46Updated last year
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward☆91Updated 6 months ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"☆19Updated 10 months ago
- Multimodal RewardBench☆61Updated 11 months ago
- PyTorch implementation of "Sample- and Parameter-Efficient Auto-Regressive Image Models" from CVPR 2025☆14Updated 2 months ago
- ☆24Updated 7 months ago