GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts
☆39Sep 30, 2025Updated 5 months ago
Alternatives and similar repositories for GSM8K-V
Users who are interested in GSM8K-V are comparing it to the repositories listed below.
- Benchmarking agent reasoning capabilities in physical interactions, tool usage, and multi-agent coordination.☆42Aug 10, 2025Updated 6 months ago
- [NeurIPS 2025] Mind the Gap: Bridging Thought Leap for Improved CoT Tuning https://arxiv.org/abs/2505.14684☆45Oct 20, 2025Updated 4 months ago
- This repository is the official implementation of TimeHC-RL (Distilabel (Data Generation) + TRL (SFT) + VeRL (GRPO)).☆48Jun 4, 2025Updated 9 months ago
- [ICLR 2026] InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models☆47Feb 12, 2026Updated 3 weeks ago
- ☆25Aug 19, 2025Updated 6 months ago
- ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models☆70Jan 8, 2026Updated 2 months ago
- [AAAI 2026] Test-Time Reinforcement Learning for GUI Grounding via Region Consistency https://arxiv.org/abs/2508.05615☆61Nov 8, 2025Updated 4 months ago
- A Unified Framework for High-Performance and Extensible LLM Steering☆179Updated this week
- PyTorch and NumPy implementations of NMS and Soft-NMS☆12Mar 22, 2021Updated 4 years ago
- Accepted at IJCAI-2022☆11Sep 3, 2022Updated 3 years ago
- [CVPR2023] This is an official implementation of paper "DETRs with Hybrid Matching".☆14Sep 1, 2022Updated 3 years ago
- Un-*** 50 billions multimodality dataset☆23Sep 14, 2022Updated 3 years ago
- Easy and Efficient dLLM Fine-Tuning☆229Mar 2, 2026Updated last week
- A curated collection of resources, tools, and frameworks for developing GUI Agents.☆311Mar 2, 2026Updated last week
- [CVPR2023] This is an official implementation of paper "DETRs with Hybrid Matching".☆28Sep 1, 2022Updated 3 years ago
- ☆28Jan 9, 2025Updated last year
- Tutorial on using Hugging Face's Vision Transformers for Image Classification☆10Sep 4, 2021Updated 4 years ago
- My daily arxiv reading note☆30Nov 10, 2021Updated 4 years ago
- Code for "A Trigger-Sense Memory Flow Framework for Joint Entity and Relation Extraction", accepted at WWW 2021.☆28Jun 5, 2021Updated 4 years ago
- ☆10Sep 7, 2019Updated 6 years ago
- Tutorial for Graph Neural Network at APBJC 2024.☆10Apr 21, 2025Updated 10 months ago
- Unofficial implementation of "Pix2seq: A Language Modeling Framework for Object Detection" on mmdetection☆33Apr 18, 2022Updated 3 years ago
- Repo for our work "Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence"☆19Jun 2, 2025Updated 9 months ago
- An Advanced Basic Math Reasoning and Overthinking Evaluation Framework for LLMs☆12Jul 8, 2025Updated 8 months ago
- How to use OpenAI API?☆12Nov 23, 2023Updated 2 years ago
- ☆15Sep 26, 2020Updated 5 years ago
- A chatbot web UI that supports various multimodal large language models☆11May 8, 2023Updated 2 years ago
- ☆12Oct 30, 2022Updated 3 years ago
- Code for progressive GSL☆12Jan 15, 2026Updated last month
- NLP with Transformers Study Group Materials & Resources☆11Jun 26, 2023Updated 2 years ago
- Reverse-engineering Douyin to fetch real-time live-stream comments (danmaku)☆10Apr 29, 2023Updated 2 years ago
- ☆26Jul 9, 2025Updated 8 months ago
- [AAAI 2026] GUI-G²: Gaussian Reward Modeling for GUI Grounding☆305Feb 2, 2026Updated last month
- [EMNLP 2024 Tutorial] Language Agents: Foundations, Prospects, and Risks☆10Nov 27, 2024Updated last year
- ☆20Oct 4, 2024Updated last year
- ☆12May 27, 2024Updated last year
- YoloTeeth is a GitHub repository dedicated to leveraging YOLOv8 for precise instance segmentation and object detection in teeth X-ray ima…☆13Nov 10, 2024Updated last year
- Official implementation of "Weakly-supervised positional contrastive learning: application to cirrhosis classification", MICCAI 2023☆11Dec 16, 2025Updated 2 months ago
- Repository for SPECTRA: Sparse Structured Text Rationalization, accepted at EMNLP 2021 main conference.☆10Feb 14, 2024Updated 2 years ago