Awesome AI Benchmarks
☆35 · Jan 16, 2026 · Updated 5 months ago
Alternatives and similar repositories for awesome-ai-benchmarks
Users that are interested in awesome-ai-benchmarks are comparing it to the libraries listed below.
- LLM caching proxy server that emulates popular LLMs with the ability to simulate failures ☆76 · Aug 4, 2025 · Updated 10 months ago
- Code repository for CISO agent as part of ITBench ☆20 · May 8, 2025 · Updated last year
- Experiments on using ChatGPT for failure mode classification ☆12 · Sep 20, 2023 · Updated 2 years ago
- Use an appropriate mix of LLMs based on https://nuenki.app/blog research to translate languages better than any one tool. ☆27 · Jun 23, 2025 · Updated 11 months ago
- A simple, easy-to-customize pipeline for local RAG evaluation. Starter prompts and metric definitions included. ☆24 · Jan 14, 2026 · Updated 5 months ago
- Federation over Text (FoT) is a federated-learning-like paradigm for multi-agent reasoning. ☆108 · May 21, 2026 · Updated 3 weeks ago
- Survey paper: From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents. ☆68 · Apr 3, 2026 · Updated 2 months ago
- Visually select, search, and copy your code into your clipboard for LLM context. ☆26 · May 18, 2025 · Updated last year
- CodexLeaks: Privacy Leaks from Code Generation Language Models in GitHub Copilot ☆11 · Jul 11, 2023 · Updated 2 years ago
- Code for PII detection and redaction in code datasets ☆15 · Jan 24, 2023 · Updated 3 years ago
- ☆26 · Feb 28, 2025 · Updated last year
- FailureSensorIQ, a dataset and benchmark to probe LLMs’ reasoning and comprehension of sensor–failure relationships in industrial systems… ☆44 · Jun 9, 2026 · Updated last week
- [AAAI'25] The implementation of paper "Federated Foundation Models on Heterogeneous Time Series" | The first work to explore time series … ☆24 · May 10, 2026 · Updated last month
- Analyze Reddit posts ☆32 · Jun 5, 2026 · Updated last week
- ICML'20: SIGUA: Forgetting May Make Learning with Noisy Labels More Robust ☆17 · Dec 14, 2020 · Updated 5 years ago
- Create text chunks which end at natural stopping points without using a tokenizer ☆26 · Nov 26, 2025 · Updated 6 months ago
- MLflow deployment plugin for IBM-cloud-watson-ml ☆15 · May 7, 2025 · Updated last year
- A tool for adding function calling to an LLM API, available as a service by following the link ☆22 · Aug 11, 2025 · Updated 10 months ago
- MI and Formal Verification of NNs on Algorithmic tasks! ☆18 · Mar 18, 2024 · Updated 2 years ago
- ACPBench: Reasoning about Action, Change, and Planning. A benchmark designed to evaluate the fundamental reasoning abilities in the dom… ☆33 · Feb 11, 2026 · Updated 4 months ago
- Efficient and readable change point detection package implemented in Python. (Singular Spectrum Transformation - SST, IKA-SST, ulSIF, RuL… ☆35 · Jun 10, 2026 · Updated last week
- ⚠️ ARCHIVED - All development moved to https://github.com/itbench-hub/ITBench-CISO-SRE-FinOps-Agent ☆21 · Sep 9, 2025 · Updated 9 months ago
- Lightning fast code searching made easy ☆18 · Jul 20, 2024 · Updated last year
- Emulating SAMSUNG HM641JI HDD firmware using Unicorn ☆11 · Sep 19, 2022 · Updated 3 years ago
- ML models often mispredict, and it is hard to tell when and why. We present a data mining based approach to discover whether there is a c… ☆17 · Jun 6, 2022 · Updated 4 years ago
- [KDD 2025] AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation ☆34 · Nov 18, 2025 · Updated 7 months ago
- A bit-array manipulation library in C ☆11 · Oct 29, 2021 · Updated 4 years ago
- This repository contains a PyTorch implementation of the paper "Hierarchical Graph Representation Learning for the Prediction of Drug-Tar… ☆12 · Jul 21, 2022 · Updated 3 years ago
- ChatGPT CSS style ☆14 · Apr 28, 2024 · Updated 2 years ago
- Source code for WWW 2021 paper "Lorentzian Graph Convolutional Networks" ☆14 · Jun 11, 2021 · Updated 5 years ago
- The accompanying backend for the PAI app ☆12 · Mar 24, 2025 · Updated last year
- PoC of injecting code into a running Linux process ☆22 · Sep 11, 2019 · Updated 6 years ago
- ☆12 · May 30, 2025 · Updated last year
- Code for Knowledge-Adaptation Priors based on the NeurIPS 2021 paper by Khan and Swaroop. ☆17 · Feb 1, 2022 · Updated 4 years ago
- A bot that provides YouTube video chapters on Twitter (a.k.a. X) ☆12 · Feb 5, 2025 · Updated last year
- [ICML25] CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale ☆25 · Jul 31, 2025 · Updated 10 months ago
- Wallaby create-react-app TypeScript ☆11 · Jan 3, 2023 · Updated 3 years ago
- In-Situ Evaluator: Real-Time Subsample Analysis ☆15 · Jan 25, 2026 · Updated 4 months ago
- Better Encrypted Datastore is a library for securely storing encrypted data inside Datastore. In addition, the library extends Datastore'… ☆13 · Mar 23, 2025 · Updated last year