tongjingqi/MathTrap

In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. Specifically, we construct a new dataset, MATHTRAP, by introducing carefully designed logical traps into the problem descriptions of MATH and GSM8K.

☆60

Alternatives and similar repositories for MathTrap

Users that are interested in MathTrap are comparing it to the libraries listed below.

tongjingqi / Game-RL
Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
☆157 · Jul 18, 2026 · Updated last week
llmeval / LLMEval-Fair
[ACL 2026] A large-scale longitudinal study on robust and fair evaluation of LLMs: 200K+ generative questions across 13 disciplines
☆40 · May 21, 2026 · Updated 2 months ago
tongjingqi / AI-Can-Learn-Scientific-Taste
We propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervis…
☆425 · Updated this week
maminian / uncmathbeamer
A minimal beamer theme for UNC Chapel Hill
☆11 · Sep 4, 2018 · Updated 7 years ago
RichardGanaye / Cox-Galois-Theory-Exercises
Exercises Galois theory D. Cox
☆13 · Jun 29, 2023 · Updated 3 years ago
WooooDyy / BAPO
Codes for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping…
☆94 · Jan 29, 2026 · Updated 5 months ago
ChengpengLi1003 / DotaMath
☆30 · Dec 27, 2024 · Updated last year
IlyaGusev / TaleStudio
Fork of RecurrentGPT with modifications
☆10 · Sep 18, 2024 · Updated last year
january-blue / OpenNovelty
☆135 · May 12, 2026 · Updated 2 months ago
Bios-Marcel / memoryalike
A memory alike game for your terminal
☆14 · Oct 8, 2020 · Updated 5 years ago
cs-holder / Reasoning-Self-Evolution-Survey
☆54 · Mar 6, 2025 · Updated last year
OpenBMB / OlympiadBench
[ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scie…
☆195 · Jun 8, 2025 · Updated last year
whoward69 / DLL-VMC
DLL - Various Mod Components
☆21 · Aug 17, 2024 · Updated last year
THUKElab / LatEval
☆10 · Mar 19, 2024 · Updated 2 years ago
yunfeixie233 / ViGaL
☆70 · Feb 4, 2026 · Updated 5 months ago
hsajjad / ConceptX
Analyzing Latent Concept in Pre-trained Transformer Models
☆12 · Jul 18, 2022 · Updated 4 years ago
hewei2001 / ReachQA
[EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs
☆61 · Aug 25, 2025 · Updated 11 months ago
zc277584121 / akcio
Akcio is a demonstration project for Retrieval Augmented Generation (RAG). It leverages the power of LLM to generate responses and uses v…
☆12 · Oct 30, 2023 · Updated 2 years ago
mengzaiqiao / awesome-natural-language-reasoning
A collection of research papers related to Natural Language Reasoning
☆10 · May 27, 2022 · Updated 4 years ago
bennettjustin / AirFlag
Detect nearby AirTags in disconnected or lost modes.
☆10 · Feb 3, 2022 · Updated 4 years ago
AppliedMathematicsANU / diamorse
Digital image analysis using discrete Morse theory and persistent homology
☆31 · Nov 9, 2025 · Updated 8 months ago
yanncam / multiduplicut
multiduplicut : optimize wordlists-based password cracking methods chaining
☆17 · Feb 25, 2022 · Updated 4 years ago
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆75 · Jul 13, 2025 · Updated last year
pyroscope / pyrobase
General Python Helpers and Utilities
☆16 · Nov 15, 2021 · Updated 4 years ago
sunfanyunn / FactorSim
Official Code for the NeurIPS 2024 paper "FactorSim: Generative Simulation via Factorized Representation"
☆14 · Sep 26, 2024 · Updated last year
weizhepei / ReadingList
A list of research resources that I've appreciated.
☆12 · Dec 10, 2019 · Updated 6 years ago
haowei-freesky / HERMES
Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" [ACL 2026]
☆92 · May 8, 2026 · Updated 2 months ago
feng-yufei / Neural-Natural-Logic
Implementation of the first neural natural logic paper on natural language inference
☆10 · Oct 31, 2022 · Updated 3 years ago
ssmisya / PRMBench
[ACL' 25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models.
☆94 · Feb 15, 2025 · Updated last year
ethanabrooks / computational-graph
Efficiently performs automatic differentiation on arbitrary functions. Basically a rudimentary version of Tensorflow.
☆12 · Feb 18, 2017 · Updated 9 years ago
mbelloiseau / network-latency
Track network latency with Telegraf, InfluxDB and Grafana
☆17 · Aug 9, 2023 · Updated 2 years ago
allenai / DrawEduMath
Can VLMs understand students' hand-drawn math work?
☆19 · Jan 20, 2026 · Updated 6 months ago
hanqi-qi / LLM_MetaReasoning
☆15 · Jul 29, 2025 · Updated 11 months ago
jonathandale / chat-ollama
UI for Ollama
☆14 · Aug 21, 2025 · Updated 11 months ago
EmbodiedForge / Inspire-cli
A tool for better use of Inspire platform (Beta: Codeberg version is more up-to-date)
☆28 · Apr 2, 2026 · Updated 3 months ago
ahoendgen / airtag-locator
Parse, store & visualize Apple's "Find My" App data with an easy to use web interface for device location
☆20 · May 31, 2023 · Updated 3 years ago
shirlyliu64 / ConvBench
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models
☆16 · Sep 27, 2024 · Updated last year
SEMERU-WM / ChangeScribe
ChangeScribe is an Eclipse plugin for generating commit messages (a.k.a., commit logs, commit notes) automatically. ChangeScribe uses as …
☆10 · Jan 26, 2023 · Updated 3 years ago
mathesong / kinfitr
kinfitr: PET Kinetic Modelling Using R
☆37 · Jul 2, 2026 · Updated 3 weeks ago