WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆85 · Aug 10, 2024 · Updated last year
Alternatives and similar repositories for BIG-Bench-Mistake
Users interested in BIG-Bench-Mistake are comparing it to the repositories listed below.
- Open-source repository for the OOPSLA'24 paper "CYCLE: Learning to Self-Refine Code Generation" ☆10 · Mar 8, 2024 · Updated last year
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 ☆35 · Oct 3, 2024 · Updated last year
- A dashboard for exploring timm learning rate schedulers ☆19 · Nov 22, 2024 · Updated last year
- [NAACL 2025] Representing Rule-based Chatbots with Transformers ☆23 · Feb 9, 2025 · Updated last year
- ☆132 · May 8, 2025 · Updated 9 months ago
- Official implementation of ECCV24 paper: POA ☆24 · Aug 8, 2024 · Updated last year
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆165 · Oct 11, 2024 · Updated last year
- 🤖ConvRe🤯: An Investigation of LLMs’ Inefficacy in Understanding Converse Relations (EMNLP 2023) ☆24 · Oct 10, 2023 · Updated 2 years ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆77 · Oct 9, 2025 · Updated 4 months ago
- Shaping Language Models with Cognitive Insights ☆15 · Feb 29, 2024 · Updated last year
- Evolutionary Search for expert-level performance on any task with environmental feedback ☆14 · Oct 12, 2025 · Updated 4 months ago
- EMNLP 2022: Analyzing and Evaluating Faithfulness in Dialogue Summarization ☆13 · Mar 20, 2025 · Updated 10 months ago
- [COLM 2025: 1st Workshop on the Application of LLM Explainability to Reasoning and Planning] Latent Chain-of-Thought? Decoding the Depth-… ☆17 · Oct 4, 2025 · Updated 4 months ago
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023] ☆14 · Jul 11, 2023 · Updated 2 years ago
- Multilingual Entity Linking model by BELA model ☆12 · Jul 20, 2023 · Updated 2 years ago
- FeedbackQA: Improving Question Answering Post-Deployment with Interactive Feedback ☆12 · Jul 13, 2022 · Updated 3 years ago
- ☆11 · Jan 3, 2024 · Updated 2 years ago
- Seamless Voice Interactions with LLMs ☆12 · Oct 28, 2023 · Updated 2 years ago
- ☆33 · Feb 2, 2026 · Updated last week
- ☆16 · Aug 1, 2024 · Updated last year
- [LREC-Coling 2024] PECC: Problem Extraction and Coding Challenges ☆14 · May 30, 2024 · Updated last year
- Computationally Modelling Resisting Strategies in Persuasive Conversations ☆12 · Feb 6, 2022 · Updated 4 years ago
- ☆13 · Jan 22, 2025 · Updated last year
- Synthetic data generation and normalization functions powered by LLMs ☆58 · Sep 19, 2024 · Updated last year
- ☆25 · Jun 11, 2025 · Updated 8 months ago
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given… ☆15 · Oct 16, 2023 · Updated 2 years ago
- Code for Scaling Laws of RoPE-based Extrapolation ☆73 · Oct 16, 2023 · Updated 2 years ago
- A distributed, extensible, secure solution for evaluating machine-generated code with unit tests in multiple programming languages. ☆62 · Oct 21, 2024 · Updated last year
- Repository having the code and models from the paper: data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student traini… ☆13 · Mar 18, 2024 · Updated last year
- An OpenAI GPT-3-based QnA agent for documents and links ☆12 · Jul 11, 2023 · Updated 2 years ago
- ACL 2022: Just Rank: Rethinking Evaluation with Word and Sentence Similarities ☆35 · Dec 14, 2022 · Updated 3 years ago
- [NAACL 2024] A Synthetic, Scalable and Systematic Evaluation Suite for Large Language Models ☆33 · Jun 10, 2024 · Updated last year
- ☆130 · Jul 8, 2024 · Updated last year
- ☆54 · Aug 25, 2023 · Updated 2 years ago
- Prompt Development Environment for GPT ☆14 · Jul 23, 2023 · Updated 2 years ago
- ☆16 · Nov 26, 2024 · Updated last year
- Echo Noise Channel for Exact Mutual Information Calculation ☆17 · Jul 17, 2020 · Updated 5 years ago
- Source code and datasets for "How well do Large Language Models perform in Arithmetic tasks?" ☆57 · Apr 17, 2023 · Updated 2 years ago
- ☆16 · Nov 15, 2024 · Updated last year