MetaCopilot / dseval
☆15Updated 2 months ago
Related projects:
- ☆18Updated 3 months ago
- This repository contains some of the code used in the paper "Training Language Models with Language Feedback at Scale"☆26Updated last year
- [NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods.☆23Updated last year
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆33Updated 6 months ago
- Critique-out-Loud Reward Models☆17Updated 2 weeks ago
- NLPBench: Evaluating NLP-Related Problem-solving Ability in Large Language Models☆9Updated 10 months ago
- Repository for Skill Set Optimization☆12Updated last month
- the instructions and demonstrations for building a formal logical reasoning capable GLM☆49Updated 2 weeks ago
- ☆12Updated 6 months ago
- [ACL 2023 Findings] What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning☆21Updated last year
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆14Updated 6 months ago
- Source code for the paper "Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data"☆20Updated 6 months ago
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data☆28Updated last month
- Code for our paper: "GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models"☆50Updated last year
- ☆23Updated 3 weeks ago
- Benchmarking LLMs' Gaming Ability in Multi-Agent Environments☆33Updated this week
- LLM Dynamic Planner - Combining LLM with PDDL Planners to solve an embodied task☆33Updated last week
- ☆11Updated 2 weeks ago
- Byte-sized text games for code generation tasks on virtual environments☆17Updated 2 months ago
- m&ms: A Benchmark to Evaluate Tool-Use for multi-step multi-modal tasks☆30Updated 5 months ago
- Generating diverse counterfactual data for Natural Language Understanding tasks using Large Language Models (LLMs). The generator support…☆34Updated last year
- This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses".☆22Updated last month
- The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agen…☆20Updated 6 months ago
- Adding new tasks to T0 without catastrophic forgetting☆30Updated last year
- Code for paper 'Data-Efficient FineTuning'☆29Updated last year
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch☆27Updated this week
- CodeUltraFeedback: aligning large language models to coding preferences☆62Updated 2 months ago
- ☆14Updated last week
- Code and Data for the NAACL 24 paper: MacGyver: Are Large Language Models Creative Problem Solvers?☆21Updated 5 months ago
- Complexity Based Prompting for Multi-Step Reasoning☆14Updated last year