[NeurIPS 2024] Code and Data Repo for Paper "Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning"
☆28May 28, 2024Updated last year
Alternatives and similar repositories for OOD-Math-Reasoning
Users that are interested in OOD-Math-Reasoning are comparing it to the libraries listed below
Sorting:
- [NeurIPS 2025 D&B Track] Evaluation Code Repo for Paper "PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts"☆40May 22, 2025Updated 9 months ago
- [ACL 2023] Code and Data Repo for Paper "Element-aware Summary and Summary Chain-of-Thought (SumCoT)"☆53Jan 21, 2024Updated 2 years ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning☆30Mar 5, 2024Updated last year
- Code of "Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model"☆23Jun 28, 2024Updated last year
- ☆13May 21, 2024Updated last year
- [ICML 2025] Official repository for paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models"☆23Mar 4, 2025Updated 11 months ago
- {DeepL, Google, WMT-Best, davinci-003, turbo, gpt-4} × {En-De, En-Cs, En-Ru, En-Zh, De-Fr, En-Ja, Uk-En, Uk-Cs, En-Hr, En-Ha, En-Is}☆14Jun 18, 2023Updated 2 years ago
- This the implementation of LeCo☆31Jan 20, 2025Updated last year
- A Data Source for Reasoning Embodied Agents☆19Sep 18, 2023Updated 2 years ago
- Official repository for Decentralized Arena via Collective LLM Intelligence☆17May 19, 2025Updated 9 months ago
- ☆27Aug 27, 2025Updated 6 months ago
- ☆20Nov 3, 2024Updated last year
- EMNLP'23 survey: a curation of awesome papers and resources on refreshing large language models (LLMs) without expensive retraining.☆136Dec 12, 2023Updated 2 years ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models☆17Jul 17, 2024Updated last year
- [ICML2024]Adaptive decoding balances the diversity and coherence of open-ended text generation.☆19Jun 2, 2024Updated last year
- The official implementation of the paper "Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation"☆20Dec 10, 2024Updated last year
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)☆99Jan 11, 2026Updated last month
- ☆30Jun 19, 2023Updated 2 years ago
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs.☆41Oct 31, 2025Updated 3 months ago
- ☆23Oct 11, 2024Updated last year
- The rule-based evaluation subset and code implementation of Omni-MATH☆26Dec 23, 2024Updated last year
- ☆23Nov 15, 2022Updated 3 years ago
- ☆30May 22, 2024Updated last year
- source code for NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection"☆66Apr 11, 2025Updated 10 months ago
- [AAAI 2024] SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research☆30Aug 6, 2024Updated last year
- ☆37Oct 15, 2024Updated last year
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"☆184May 20, 2025Updated 9 months ago
- NeurIPS 2024 "NoisyGL: A Comprehensive Benchmark for Graph Neural Networks under Label Noise"☆31Jun 4, 2025Updated 8 months ago
- ☆70Jun 18, 2025Updated 8 months ago
- The official repo for the paper "Teacher Forcing Recovers Reward Functions for Text Generation"☆31May 27, 2023Updated 2 years ago
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)☆67Mar 27, 2025Updated 11 months ago
- ☆26Apr 27, 2025Updated 10 months ago
- A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state.☆71May 22, 2025Updated 9 months ago
- ☆10Dec 21, 2022Updated 3 years ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆89Sep 26, 2024Updated last year
- Code and data for paper "A Semantic Invariant Robust Watermark for Large Language Models" accepted by ICLR 2024.☆37Nov 13, 2024Updated last year
- [NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?"☆39Jul 18, 2025Updated 7 months ago
- ☆88Jun 7, 2024Updated last year
- ☆12Jan 11, 2026Updated last month