allenai / clin
☆86 · Updated last year
Alternatives and similar repositories for clin
Users that are interested in clin are comparing it to the libraries listed below
- Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models" ☆88 · Updated last year
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆63 · Updated 10 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆146 · Updated 11 months ago
- ☆24 · Updated last year
- ☆128 · Updated last year
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆128 · Updated last year
- WebLINX is a benchmark for building web navigation agents with conversational capabilities ☆155 · Updated 8 months ago
- A codebase for "Language Models can Solve Computer Tasks" ☆235 · Updated last year
- ☆122 · Updated 8 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆151 · Updated 8 months ago
- [NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking ☆269 · Updated last year
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆99 · Updated 2 years ago
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆37 · Updated 10 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆108 · Updated 3 months ago
- ☆144 · Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ☆109 · Updated 10 months ago
- ☆58 · Updated 4 months ago
- ☆41 · Updated last year
- ☆122 · Updated last year
- Scripts for generating synthetic finetuning data for reducing sycophancy. ☆117 · Updated 2 years ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆93 · Updated 5 months ago
- A benchmark for evaluating learning agents based on just language feedback ☆90 · Updated 4 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 · Updated 9 months ago
- ☆125 · Updated last year
- Functional Benchmarks and the Reasoning Gap ☆89 · Updated last year
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆132 · Updated last year
- Code for the paper 🌳 Tree Search for Language Model Agents ☆217 · Updated last year
- ☆47 · Updated last year
- ☆35 · Updated 5 months ago
- [NeurIPS 2023] PyTorch code for Can Language Models Teach? Teacher Explanations Improve Student Performance via Theory of Mind ☆66 · Updated last year