seacowx / OpenToM

The official repository of the OpenToM dataset

☆23

Alternatives and similar repositories for OpenToM

Users who are interested in OpenToM are comparing it to the repositories listed below.

LoryPack / LLM-LieDetector
Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
☆70 Updated 11 months ago
oriyor / assistantbench
Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"
☆56 Updated 6 months ago
dinobby / MAgICoRE
☆24 Updated 8 months ago
swarnaHub / ExplanationIntervention
[NeurIPS 2023] PyTorch code for "Can Language Models Teach? Teacher Explanations Improve Student Performance via Theory of Mind"
☆67 Updated last year
zbambergerNLP / strategic-debate-tot
A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments
☆81 Updated 8 months ago
allenai / clin
☆82 Updated last year
tianyang-x / SaySelf
Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales"
☆106 Updated 8 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆87 Updated 8 months ago
Alex-Gurung / ReasoningNCP
Official repo for "Learning to Reason for Long-Form Story Generation"
☆60 Updated last month
OSU-NLP-Group / Middleware
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)
☆36 Updated 5 months ago
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119 Updated 10 months ago
Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆89 Updated last week
orionw / promptriever
The first dense retrieval model that can be prompted like an LM
☆73 Updated last month
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆57 Updated 9 months ago
deshwalmahesh / PHUDGE
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…
☆49 Updated 10 months ago
SALT-NLP / demonstrated-feedback
☆121 Updated 8 months ago
yueqis / API-Based-Agent
☆50 Updated last week
zou-group / sirius
SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning
☆54 Updated 2 months ago
reasoning-machines / prompt-lib
A set of utilities for running few-shot prompting experiments on large language models
☆121 Updated last year
suzgunmirac / dynamic-cheatsheet
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
☆62 Updated 2 weeks ago
zjunlp / ModelKinship
Exploring Model Kinship for Merging Large Language Models
☆24 Updated last month
Tebmer / Rereading-LLM-Reasoning
EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for…
☆26 Updated 5 months ago
SALT-NLP / collaborative-gym
Framework and toolkits for building and evaluating collaborative agents that can work together with humans.
☆81 Updated 2 months ago
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆56 Updated last month
JacobPfau / fillerTokens
☆61 Updated last year
OpenMOSS / Lorsa
☆19 Updated this week
thomasgauthier / LLM-self-play
Minimal implementation of the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv:2401.01335)
☆29 Updated last year
allenai / infinigram-api
☆59 Updated this week
jiangjiechen / auction-arena
Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A…
☆44 Updated last year
Berkeley-NLP / Agent-Eval-Refine
Code for the paper "Autonomous Evaluation and Refinement of Digital Agents" [COLM 2024]
☆136 Updated 6 months ago