lil-lab / cb2
An NLP research and data collection platform.
☆17Updated last year
Alternatives and similar repositories for cb2
Users who are interested in cb2 are comparing it to the repositories listed below
- Codebase for LLM story generation; updated version of https://github.com/yangkevin2/doc-story-generation☆85Updated last year
- Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)☆245Updated last week
- DialOp: Decision-oriented dialogue environments for collaborative language agents☆110Updated 10 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024)☆77Updated last year
- Machine Theory of Mind Reading List. Built upon EMNLP Findings 2023 Paper: Towards A Holistic Landscape of Situated Theory of Mind in Lar…☆141Updated 7 months ago
- ☆98Updated last year
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format☆27Updated 2 years ago
- ☆14Updated 5 months ago
- A collection of works that investigate social agents, simulations and their real-world impact in text, embodied, and robotics contexts.☆96Updated last year
- ☆44Updated last year
- Super fast implementations of common benchmark text world games☆51Updated last month
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A…☆47Updated last year
- Official code repository for the paper "ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind"☆19Updated 3 months ago
- Code for the paper "LASER: LLM Agent with State-Space Exploration for Web Navigation"☆33Updated 2 years ago
- SummScreen: A Dataset for Abstractive Screenplay Summarization (ACL 2022)☆37Updated 3 years ago
- Distilled Self-Critique refines the outputs of an LLM with only synthetic data☆11Updated last year
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.☆87Updated last year
- ☆210Updated 2 years ago
- PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion☆57Updated last year
- Byte-sized text games for code generation tasks on virtual environments☆20Updated last year
- SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. …☆140Updated last year
- Code/data for MARG (multi-agent review generation)☆49Updated 10 months ago
- This repo contains code for our NeurIPS 2023 spotlight paper: Evaluating and Inducing Personality in Pre-trained Language Models☆55Updated last year
- ☆14Updated 5 months ago
- This is the repository for TimelineQA, a benchmark for querying lifelogs.☆25Updated 2 years ago
- Code for our NeurIPS'24 Dataset and Benchmark paper: Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiatio…☆37Updated 10 months ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models"☆92Updated last year
- An Implementation of "Orca: Progressive Learning from Complex Explanation Traces of GPT-4"☆43Updated 11 months ago
- Code for arXiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback☆207Updated 2 years ago
- ☆49Updated 2 years ago