allenai / codenav
CodeNav is an LLM agent that navigates and leverages previously unseen code repositories to solve user queries.
☆65 · Updated last year
Alternatives and similar repositories for codenav
Users who are interested in codenav are comparing it to the repositories listed below.
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆69 · Updated last year
- Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs ☆51 · Updated last year
- ☆41 · Updated last year
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆96 · Updated 8 months ago
- Evaluating LLMs with fewer examples ☆169 · Updated last year
- ☆105 · Updated last year
- ☆87 · Updated 2 years ago
- Code accompanying "How I learned to start worrying about prompt formatting". ☆115 · Updated 8 months ago
- Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models ☆31 · Updated 10 months ago
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory ☆249 · Updated 8 months ago
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆56 · Updated 6 months ago
- ☆28 · Updated 3 months ago
- ☆123 · Updated 11 months ago
- ☆35 · Updated 8 months ago
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, … ☆43 · Updated 2 years ago
- Functional Benchmarks and the Reasoning Gap ☆89 · Updated last year
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions." ☆66 · Updated 2 years ago
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode… ☆63 · Updated 4 months ago
- ☆61 · Updated 7 months ago
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆83 · Updated last year
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆58 · Updated 10 months ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆132 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆61 · Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ☆112 · Updated last year
- ☆132 · Updated 8 months ago
- Official Implementation of InstructZero; the first framework to optimize bad prompts of ChatGPT(API LLMs) and finally obtain good prompts… ☆197 · Updated last year
- Small, simple agent task environments for training and evaluation ☆19 · Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆175 · Updated last year
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆61 · Updated 9 months ago
- Multi-Granularity LLM Debugger [ICSE2026] ☆96 · Updated 7 months ago