arnav-gudibande / koala-test-set
The test set for Koala
☆45 · Updated last year
Alternatives and similar repositories for koala-test-set:
Users who are interested in koala-test-set are comparing it to the repositories listed below:
- The data processing pipeline for the Koala chatbot language model ☆117 · Updated last year
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions." ☆63 · Updated last year
- [AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following ☆79 · Updated 4 months ago
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions ☆68 · Updated last year
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 · Updated last year
- [NAACL 2024] Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models ☆82 · Updated 10 months ago
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners ☆113 · Updated 4 months ago
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆59 · Updated 9 months ago
- Source code and data for The Magic of IF: Investigating Causal Reasoning Abilities in Large Language Models of Code (Findings of ACL 2023… ☆29 · Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆42 · Updated last year
- CodeUltraFeedback: aligning large language models to coding preferences ☆66 · Updated 6 months ago
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs. ☆79 · Updated 11 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆46 · Updated last year
- SILO Language Models code repository ☆81 · Updated 10 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆71 · Updated 11 months ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆90 · Updated last year
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆42 · Updated 2 months ago
- [ICLR 2024] COLLIE: Systematic Construction of Constrained Text Generation Tasks ☆52 · Updated last year
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆45 · Updated last year
- Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023. ☆63 · Updated last month
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆51 · Updated 9 months ago
- [EMNLP 2023, Findings] GRACE: Discriminator-Guided Chain-of-Thought Reasoning ☆46 · Updated 3 months ago
- Based on the tree of thoughts paper ☆46 · Updated last year
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 · Updated 11 months ago