open-compass/GPassK


[ACL 2025] Are Your LLMs Capable of Stable Reasoning?

☆33

Alternatives and similar repositories for GPassK

Users that are interested in GPassK are comparing it to the libraries listed below.


open-compass / RePro
[ICLR 2026] Rectifying LLM Thought From Lens of Optimization
☆15 · Dec 5, 2025 · Updated 7 months ago
open-compass / ProSA
[EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
☆29 · May 22, 2025 · Updated last year
zhujinliang / chinesetokenization
chinesetokenization
☆13 · Jun 4, 2013 · Updated 13 years ago
open-compass / MathBench
[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset
☆115 · May 22, 2025 · Updated last year
GraphPKU / number_cookbook
Official repository for the paper Number Cookbook: Number Understanding of Language Models and How to Improve It.
☆21 · Mar 31, 2025 · Updated last year
open-compass / CIBench
Official Repo of "CIBench: Evaluation of LLMs as Code Interpreter"
☆15 · Jul 19, 2024 · Updated 2 years ago
linkedin / ControlLLM
Control LLM
☆23 · Apr 6, 2025 · Updated last year
QizhiPei / MathFusion
MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025)
☆37 · Jul 16, 2025 · Updated last year
declare-lab / VIP
Our EMNLP 2022 paper on VIP-Based Prompting for Parameter-Efficient Learning
☆10 · Oct 22, 2022 · Updated 3 years ago
jtonglet / Numerical-Hybrid-QA-Literature
A list of Numerical Multimodal reasoning papers and their implementation
☆11 · May 13, 2024 · Updated 2 years ago
NJUNLP / Hallu-PI
The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed …
☆11 · Sep 27, 2024 · Updated last year
qcznlp / uncertainty_attack
☆23 · Sep 2, 2025 · Updated 10 months ago
KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆94 · Dec 22, 2024 · Updated last year
jiaconghu / Model-LEGO
Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks
☆17 · Jan 15, 2025 · Updated last year
QwenLM / ProcessBench
Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"
☆189 · May 20, 2025 · Updated last year
Zhenwen-NLP / MathChat
Official code and data repository of MathChat: MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Inte…
☆22 · Jun 3, 2024 · Updated 2 years ago
X-LANCE / text2sql-multiturn-GPT
[NAACL 2024] CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions
☆13 · May 7, 2024 · Updated 2 years ago
Rainier-rq / verl-if
Official implementation of the paper "Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following"
☆40 · Jan 11, 2026 · Updated 6 months ago
rohinmanvi / Capability-Aware-and-Mid-Generation-Self-Evaluations
☆21 · Jul 25, 2025 · Updated 11 months ago
THU-KEG / PairJudgeRM
☆15 · Apr 14, 2025 · Updated last year
OpenEvaluation / VLMEvalKit
☆23 · Apr 11, 2026 · Updated 3 months ago
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆35 · Jun 12, 2024 · Updated 2 years ago
hccngu / DialCoT
DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models
☆13 · Nov 2, 2023 · Updated 2 years ago
MingLiiii / Layer_Gradient
[ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
☆75 · Jun 25, 2025 · Updated last year
domonic18 / ai-eval-system
A model evaluation system built on OpenCompass; it provides a front-end UI so that users can run evaluations on their own.
☆28 · Aug 25, 2025 · Updated 10 months ago
vickywu1022 / OntoProbe-PLMs
Repo for outstanding paper@ACL 2023 "Do PLMs Know and Understand Ontological Knowledge?"
☆33 · Oct 16, 2023 · Updated 2 years ago
Zcchill / Value-Residual-Learning
☆15 · Mar 20, 2025 · Updated last year
Linzwcs / AFT
☆13 · Jan 22, 2025 · Updated last year
dsam99 / QueRE
Code repository for the paper on "Predicting the Performance of Black-Box LLMs through Self-Queries".
☆12 · Jan 9, 2025 · Updated last year
LAMDA-NeSy / Self-Backtracking
☆52 · Feb 12, 2025 · Updated last year
byronBBL / Context-DPO
Official repository of paper "Context-DPO: Aligning Language Models for Context-Faithfulness"
☆23 · Feb 17, 2025 · Updated last year
WooooDyy / MathCritique
Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".
☆55 · Nov 29, 2024 · Updated last year
kilian-group / LMLM
☆35 · Oct 31, 2025 · Updated 8 months ago
rempsyc / starter-academic
My personal site, using Wowchemy
☆13 · Updated this week
thad0ctor / KrunchWrapper
☆18 · Jul 1, 2025 · Updated last year
gmongaras / Cottention_Transformer
Code for the paper "Cottention: Linear Transformers With Cosine Attention"
☆20 · Nov 15, 2025 · Updated 8 months ago
xiusic / MinPrompt
MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering
☆14 · May 3, 2024 · Updated 2 years ago
Sueqk / LMM-VQA
LMM for VQA, tcsvt version
☆10 · Jul 19, 2024 · Updated 2 years ago
mathllm / MathCoder2
☆71 · Oct 16, 2024 · Updated last year