A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆87 · Aug 10, 2024 · Updated last year
Alternatives and similar repositories for BIG-Bench-Mistake
Users that are interested in BIG-Bench-Mistake are comparing it to the libraries listed below.
- Open-source repository for the OOPSLA'24 paper "CYCLE: Learning to Self-Refine Code Generation" ☆10 · Mar 8, 2024 · Updated 2 years ago
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 ☆35 · Oct 3, 2024 · Updated last year
- EMNLP 2022: Analyzing and Evaluating Faithfulness in Dialogue Summarization ☆13 · Mar 20, 2025 · Updated last year
- FeedbackQA: Improving Question Answering Post-Deployment with Interactive Feedback ☆12 · Jul 13, 2022 · Updated 3 years ago
- A flexible & scalable MLLM-based AIGC detection pipeline ☆35 · Oct 27, 2025 · Updated 6 months ago
- An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors ☆27 · Mar 2, 2026 · Updated 2 months ago
- DSTC9 Submission ☆16 · Apr 12, 2021 · Updated 5 years ago
- ☆17 · Aug 1, 2024 · Updated last year
- [NAACL 2025] Representing Rule-based Chatbots with Transformers ☆23 · Feb 9, 2025 · Updated last year
- The Lean Theorem Proving Environment ☆15 · May 7, 2023 · Updated 3 years ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆169 · Oct 11, 2024 · Updated last year
- Code for Scaling Laws of RoPE-based Extrapolation ☆73 · Oct 16, 2023 · Updated 2 years ago
- ☆27 · Jun 11, 2025 · Updated 10 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆80 · Oct 9, 2025 · Updated 7 months ago
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24) ☆27 · Oct 3, 2025 · Updated 7 months ago
- ACL24 ☆11 · Jun 7, 2024 · Updated last year
- A dashboard for exploring timm learning rate schedulers ☆20 · Nov 22, 2024 · Updated last year
- ☆138 · May 8, 2025 · Updated last year
- LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively. ☆798 · Oct 4, 2024 · Updated last year
- 🤖ConvRe🤯: An Investigation of LLMs’ Inefficacy in Understanding Converse Relations (EMNLP 2023) ☆24 · Oct 10, 2023 · Updated 2 years ago
- PyTorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24) ☆63 · Apr 18, 2024 · Updated 2 years ago
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023] ☆14 · Jul 11, 2023 · Updated 2 years ago
- ACL 2022: Just Rank: Rethinking Evaluation with Word and Sentence Similarities ☆35 · Dec 14, 2022 · Updated 3 years ago
- A distributed, extensible, secure solution for evaluating machine-generated code with unit tests in multiple programming languages. ☆64 · Oct 21, 2024 · Updated last year
- ☆133 · Jul 8, 2024 · Updated last year
- Computationally Modelling Resisting Strategies in Persuasive Conversations ☆12 · Feb 6, 2022 · Updated 4 years ago
- ☆27 · Nov 25, 2025 · Updated 5 months ago
- Code for the paper: Long cOntext aliGnment via efficient preference Optimization ☆24 · Oct 10, 2025 · Updated 7 months ago
- Seamless Voice Interactions with LLMs ☆12 · Oct 28, 2023 · Updated 2 years ago
- Improving word mover’s distance by leveraging self-attention matrix (Published in EMNLP 2023 Findings) ☆10 · Mar 10, 2026 · Updated 2 months ago
- [ICLR 2025] Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist ☆34 · Oct 23, 2024 · Updated last year
- Source code and datasets for "How well do Large Language Models perform in Arithmetic tasks?" ☆57 · Apr 17, 2023 · Updated 3 years ago
- Superquadrics-based grasping ☆13 · Dec 4, 2018 · Updated 7 years ago
- Proof recording for Lean 3 ☆27 · Sep 30, 2021 · Updated 4 years ago
- ☆34 · Mar 21, 2026 · Updated last month
- Redundancy Undermines the Trustworthiness of Self-Interpretable GNNs, International Conference on Machine Learning (ICML), 2025 ☆15 · Jun 23, 2025 · Updated 10 months ago
- Scratchpad/Chain-of-Thought Prompts ☆12 · Jun 6, 2022 · Updated 3 years ago
- ☆13 · Sep 27, 2022 · Updated 3 years ago
- Dataflow-guided retrieval augmentation for repository-level code completion, ACL 2024 (main) ☆33 · Mar 24, 2025 · Updated last year