VILA-Lab / ATLAS
A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxiv.org/abs/2312.16171
☆946 Updated 8 months ago
Alternatives and similar repositories for ATLAS:
Users who are interested in ATLAS are comparing it to the libraries listed below:
- ☆841 Updated 2 months ago
- [ICML 2024] Official repository for "Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models" ☆730 Updated 6 months ago
- ☆294 Updated 10 months ago
- A curated list of awesome LLM agents frameworks. ☆731 Updated this week
- Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan… ☆600 Updated 8 months ago
- A library for prompt engineering and optimization (SAMMO = Structure-aware Multi-Objective Metaprompt Optimization) ☆637 Updated 2 months ago
- The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey. ☆734 Updated 9 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆1,712 Updated 6 months ago
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆4,879 Updated 3 weeks ago
- [ICLR 2025] Automated Design of Agentic Systems ☆1,190 Updated 3 weeks ago
- WikiChat is an improved RAG. It stops the hallucination of large language models by retrieving data from a corpus. ☆1,376 Updated last month
- The Open Source Memory Layer For Autonomous Agents ☆2,000 Updated 4 months ago
- Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas" ☆1,018 Updated this week
- ☆835 Updated 3 months ago
- Superfast AI decision making and intelligent processing of multi-modal data. ☆2,396 Updated this week
- 🤠 Agent-as-a-Judge and DevAI dataset ☆322 Updated last month
- Streamlines and simplifies prompt design for both developers and non-technical users with a low code approach. ☆1,029 Updated last week
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆616 Updated last month
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ☆493 Updated 7 months ago
- [NeurIPS 2024 Spotlight] Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models ☆601 Updated last week
- Official implementation of the paper "AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation" [EMNLP '24] ☆451 Updated last month
- LangChain-powered web researcher chatbot. Searches for sources on the web and cites them in generated answers. ☆521 Updated 11 months ago
- Automated Evaluation of RAG Systems ☆546 Updated 3 months ago
- Automatically evaluate your LLMs in Google Colab ☆592 Updated 9 months ago
- [NeurIPS'24] HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge a… ☆1,602 Updated last month
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆3,635 Updated 6 months ago
- Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding ☆369 Updated last year
- ☆1,214 Updated 9 months ago
- Autonomous Agents (LLMs) research papers. Updated Daily. ☆672 Updated this week
- This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai,… ☆1,976 Updated 8 months ago