Kaggle / kaggle-benchmarks
Kaggle Benchmarks Python library
☆64 Updated this week
Alternatives and similar repositories for kaggle-benchmarks
Users who are interested in kaggle-benchmarks are comparing it to the libraries listed below
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆261 Updated last week
- Official Repo for CRMArena and CRMArena-Pro ☆132 Updated this week
- Training setup for LangChain's Open Deep Research ☆75 Updated 5 months ago
- Enterprise-grade memory framework for LLMs featuring GPU-optimized inference, vector storage, and automated scaling. Enables hyper-person… ☆91 Updated 9 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆112 Updated 9 months ago
- A method for steering LLMs to better follow instructions ☆78 Updated 6 months ago
- you.com's framework for evaluating deep research systems. ☆67 Updated 8 months ago
- Fastest way to build and deploy reliable AI agents, MCP tools and agent-to-agent. Deploy in a production ready serverless environment. ☆147 Updated this week
- ☆19 Updated 6 months ago
- ☆80 Updated 4 months ago
- ☆43 Updated last year
- Optimizing Dynamic Knowledge Base Using AI Agent ☆87 Updated 5 months ago
- LLM reads a paper and produces a working prototype ☆60 Updated 9 months ago
- Recipes and resources for building, deploying, and fine-tuning generative AI with Fireworks. ☆134 Updated 2 weeks ago
- ☆181 Updated 11 months ago
- Streamline on-policy/off-policy distillation workflows in a few lines of code ☆95 Updated this week
- Ranking LLMs on agentic tasks ☆211 Updated 2 months ago
- ☆107 Updated 10 months ago
- EvalAssist is an open-source project that simplifies using large language models as evaluators (LLM-as-a-Judge) of the output of other la… ☆94 Updated 2 months ago
- Scaling Coding-Agent RL to 32x H100s. **Achieving 160% improvement** on Stanford's TerminalBench ☆92 Updated 3 months ago
- ☆39 Updated last year
- Public repository containing METR's DVC pipeline for eval data analysis ☆199 Updated last week
- Example code using the DSPy framework. ☆20 Updated last year
- Evals that meet you where you are. For AI that's grounded. ☆45 Updated this week
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆76 Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆80 Updated last year
- Analysis code for NeurIPS 2025 paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks" ☆56 Updated 6 months ago
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback. ☆41 Updated 10 months ago
- ☆36 Updated 9 months ago
- a1facts - the precision layer for AI agents ☆64 Updated 4 months ago