samkhur006 / awesome-llm-planning-reasoning
A curated collection of LLM reasoning and planning resources, including key papers, limitations, benchmarks, and additional learning materials.
☆148 · Updated 3 weeks ago
Related projects:
- Code for the paper 🌳 Tree Search for Language Model Agents ☆124 · Updated last month
- AWM: Agent Workflow Memory ☆121 · Updated last week
- Code for the paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆140 · Updated 3 months ago
- A simple unified framework for evaluating LLMs ☆121 · Updated this week
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆218 · Updated 5 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆107 · Updated 2 weeks ago
- Codebase accompanying the Summary of a Haystack paper. ☆65 · Updated 2 months ago
- Functional Benchmarks and the Reasoning Gap ☆74 · Updated last month
- ☆224 · Updated 3 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆182 · Updated last month
- ☆105 · Updated this week
- The official evaluation suite and dynamic data release for MixEval. ☆200 · Updated last week
- Evaluating LLMs with CommonGen-Lite ☆83 · Updated 6 months ago
- Codes and Data for ACL 2024 Paper "Faithful Logical Reasoning via Symbolic Chain-of-Thought". ☆143 · Updated 2 months ago
- Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs" ☆155 · Updated 2 months ago
- GPQA: A Graduate-Level Google-Proof Q&A Benchmark ☆139 · Updated 5 months ago
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆82 · Updated 2 months ago
- [Preprint] Learning to Filter Context for Retrieval-Augmented Generation ☆182 · Updated 5 months ago
- Official implementation for the paper "LongEmbed: Extending Embedding Models for Long Context Retrieval" ☆108 · Updated 4 months ago
- A benchmark that challenges language models to code solutions for scientific problems ☆69 · Updated this week
- awesome synthetic (text) datasets ☆213 · Updated last week
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆81 · Updated last month
- ☆77 · Updated last month
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆73 · Updated 2 months ago
- A pipeline for LLM knowledge distillation ☆68 · Updated last month
- Expert Specialized Fine-Tuning ☆129 · Updated last month
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆86 · Updated 3 months ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆117 · Updated this week
- ☆85 · Updated 7 months ago
- Just a bunch of benchmark logs for different LLMs ☆112 · Updated last month