Beating the GAIA benchmark with Transformers Agents.
☆149 · Feb 19, 2025 · Updated last year
Alternatives and similar repositories for GAIA
Users that are interested in GAIA are comparing it to the libraries listed below
- Compare how Agent systems perform on several benchmarks. ☆103 · Aug 4, 2025 · Updated 7 months ago
- ☆15 · Jun 2, 2025 · Updated 9 months ago
- A framework for few-shot evaluation of autoregressive language models. ☆12 · Jul 14, 2025 · Updated 7 months ago
- ☆18 · Jul 3, 2025 · Updated 8 months ago
- ☆11 · Oct 11, 2023 · Updated 2 years ago
- ☆13 · Apr 30, 2024 · Updated last year
- A modular graph-based Retrieval-Augmented Generation (RAG) system ☆14 · Feb 28, 2026 · Updated last week
- [ICLR-2026] Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning". ☆32 · Feb 26, 2026 · Updated last week
- ☆15 · Jun 8, 2023 · Updated 2 years ago
- A Framework For Intelligence Farming ☆16 · Apr 3, 2025 · Updated 11 months ago
- Lens extension, which helps to debug pods ☆13 · Jul 25, 2023 · Updated 2 years ago
- Repository for the ICML 2021 paper: https://arxiv.org/abs/2103.04886 ☆13 · Jan 24, 2022 · Updated 4 years ago
- Dev and Test Data of LogicGame benchmark ☆19 · Mar 31, 2025 · Updated 11 months ago
- text2sql with modern LLMs (duckdb-nsql, SQLCoder etc ...) ☆18 · Apr 13, 2024 · Updated last year
- Fast, simple tool to concatenate Git repositories into single files for LLM analysis ☆20 · Mar 2, 2025 · Updated last year
- An open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through in… ☆803 · Oct 4, 2025 · Updated 5 months ago
- ☆21 · Sep 6, 2021 · Updated 4 years ago
- The Elasticsearch adapter for Microsoft Kernel Memory. ☆19 · Aug 1, 2024 · Updated last year
- Small, simple agent task environments for training and evaluation ☆19 · Nov 1, 2024 · Updated last year
- Dataset Viber is your chill repo for data collection, annotation and vibe checks. ☆47 · Sep 5, 2024 · Updated last year
- ☆19 · Nov 8, 2024 · Updated last year
- Code for explaining and evaluating late chunking (chunked pooling) ☆490 · Dec 23, 2024 · Updated last year
- II-Thought-RL is our initial attempt at developing a large-scale, multi-domain Reinforcement Learning (RL) dataset ☆31 · Apr 8, 2025 · Updated 10 months ago
- This repository demonstrates an application built with Java 21 + Spring Boot 3 + MyBatis, including CRUD operations, authentication, rout… ☆12 · Dec 1, 2024 · Updated last year
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions ☆25 · Aug 8, 2024 · Updated last year
- A family of compressed models obtained via pruning and knowledge distillation ☆368 · Nov 6, 2025 · Updated 4 months ago
- Learn to build real-world applications with Claude Agent SDK. This tutorial takes you from basics to advanced implementations, covering … ☆139 · Feb 19, 2026 · Updated 2 weeks ago
- AgentTuning: Enabling Generalized Agent Abilities for LLMs ☆1,479 · Oct 31, 2023 · Updated 2 years ago
- WebLINX is a benchmark for building web navigation agents with conversational capabilities ☆160 · Feb 11, 2025 · Updated last year
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆156 · Apr 7, 2025 · Updated 11 months ago
- Code for SaGe subword tokenizer (EACL 2023) ☆27 · Nov 30, 2024 · Updated last year
- ☆26 · May 18, 2024 · Updated last year
- ☆104 · Jul 17, 2024 · Updated last year
- a python package for loadimg and converting images ☆29 · Feb 18, 2026 · Updated 2 weeks ago
- C++ inference wrappers for running blazing fast embedding services on your favourite serverless like AWS Lambda. By Prithivi Da, PRs welc… ☆23 · Mar 4, 2024 · Updated 2 years ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆60 · Oct 11, 2024 · Updated last year
- ☆159 · Aug 27, 2024 · Updated last year
- ☆132 · May 8, 2025 · Updated 9 months ago
- Come hack with the Semantic Kernel v1.0 SDK! ☆25 · Dec 14, 2023 · Updated 2 years ago