microsoft / lost_in_conversation
Code that accompanies the public release of the paper Lost in Conversation (https://arxiv.org/abs/2505.06120)
☆125 · Updated 3 weeks ago
Alternatives and similar repositories for lost_in_conversation
Users who are interested in lost_in_conversation are comparing it to the repositories listed below.
- Complex Function Calling Benchmark. ☆112 · Updated 4 months ago
- Official Code Repository for the paper "Distilling LLM Agent into Small Models with Retrieval and Code Tools" ☆101 · Updated this week
- RetroLLM: Empowering LLMs to Retrieve Fine-grained Evidence within Generation [ACL 2025] ☆106 · Updated 4 months ago
- ☆83 · Updated 3 weeks ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆89 · Updated 3 weeks ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆91 · Updated 3 months ago
- Benchmarking Chat Assistants on Long-Term Interactive Memory (ICLR 2025) ☆107 · Updated last month
- How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ☆35 · Updated last month
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆155 · Updated this week
- Official repository for the paper "ReasonIR: Training Retrievers for Reasoning Tasks". ☆164 · Updated this week
- A method for steering LLMs to better follow instructions. ☆45 · Updated last week
- Verifiers for LLM Reinforcement Learning ☆56 · Updated last month
- ☆92 · Updated 2 months ago
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆208 · Updated last month
- ☆109 · Updated last month
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory ☆62 · Updated 2 weeks ago
- Large language models for document ranking. ☆54 · Updated 3 weeks ago
- ☆45 · Updated 2 weeks ago
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning ☆203 · Updated this week
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆136 · Updated 7 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆78 · Updated 8 months ago
- SWE Arena ☆33 · Updated last month
- ☆45 · Updated 9 months ago
- Scaling Data for SWE-agents ☆234 · Updated this week
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆182 · Updated 6 months ago
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆198 · Updated last month
- AWM: Agent Workflow Memory ☆271 · Updated 4 months ago
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆54 · Updated 3 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆67 · Updated 2 months ago
- Reproducible, flexible LLM evaluations ☆205 · Updated last month