lil-lab / cb2
An NLP research and data collection platform.
☆17 Updated last year
Alternatives and similar repositories for cb2
Users who are interested in cb2 are comparing it to the repositories listed below.
- Text Adventure Learning Environment Suite - Benchmark to evaluate language models on interactive text environments. ☆18 Updated last month
- Code for Benchmarking Language Model Agents for Data-Driven Science ☆28 Updated 8 months ago
- Code/data for MARG (multi-agent review generation) ☆44 Updated 8 months ago
- Tasks for describing differences between text distributions. ☆16 Updated 11 months ago
- LLM Dynamic Planner - Combining LLM with PDDL Planners to solve an embodied task ☆44 Updated 6 months ago
- [ICML 2024] Language Models Represent Beliefs of Self and Others ☆33 Updated 9 months ago
- [NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods. ☆22 Updated last year
- The source code for running LLMs on the AAAR-1.0 benchmark. ☆16 Updated 3 months ago
- Byte-sized text games for code generation tasks on virtual environments ☆19 Updated last year
- Super fast implementations of common benchmark text world games ☆49 Updated 4 months ago
- DialOp: Decision-oriented dialogue environments for collaborative language agents ☆109 Updated 8 months ago
- ☆12 Updated 3 months ago
- A benchmark to evaluate situated inductive reasoning ☆16 Updated 6 months ago
- Codebase for LLM story generation; updated version of https://github.com/yangkevin2/doc-story-generation ☆83 Updated last year
- Evaluating the Moral Beliefs Encoded in LLMs ☆26 Updated 7 months ago
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data ☆41 Updated 5 months ago
- This is a repository for paper titled, PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Plann… ☆13 Updated last year
- this is for fun, ain't it grand! ☆20 Updated 2 months ago
- Code for experiments on self-prediction as a way to measure introspection in LLMs ☆15 Updated 7 months ago
- ZYN: Zero-Shot Reward Models with Yes-No Questions