dair-iitd / jeebench
JEEBench, EMNLP 2023
☆36 Updated last year
Alternatives and similar repositories for jeebench:
Users who are interested in jeebench are comparing it to the repositories listed below:
- ☆17 Updated 5 months ago
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆71 Updated 7 months ago
- Repository for the code and dataset for the paper: "Have LLMs Advanced enough? Towards Harder Problem Solving Benchmarks For Large Langu… ☆39 Updated last year
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆36 Updated 2 months ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆83 Updated 7 months ago
- Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments. ☆26 Updated 2 weeks ago
- OpenPI dataset for tracking entities in open domain procedural text ☆22 Updated 7 months ago
- This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca… ☆59 Updated last year
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award … ☆39 Updated 4 months ago
- Resources for cultural NLP research ☆86 Updated 2 months ago
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆37 Updated last week
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Updated last year
- ☆45 Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆47 Updated last year
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆41 Updated 3 months ago
- Discovering Data-driven Hypotheses in the Wild ☆65 Updated 4 months ago
- Code for our ACL '23 paper titled "Grokking of Hierarchical Structure in Vanilla Transformers" ☆21 Updated last year
- ☆15 Updated last month
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆33 Updated last year
- ☆20 Updated 2 years ago
- Tasks for describing differences between text distributions. ☆16 Updated 7 months ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 Updated last year
- Code and Data for the NAACL 24 paper: MacGyver: Are Large Language Models Creative Problem Solvers? ☆26 Updated 11 months ago
- Dataset and benchmark for assessing LLMs in translating natural language descriptions of planning problems into PDDL ☆48 Updated 5 months ago
- ☆155 Updated 4 months ago
- ☆119 Updated 5 months ago
- Inspecting and Editing Knowledge Representations in Language Models ☆112 Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆42 Updated last year
- ☆25 Updated 7 months ago
- The data and the PyTorch implementation for the models and experiments in the paper "Language Model Decoding as Likelihood–Utility Alignm… ☆14 Updated last year