[COLM 2025] EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
☆31 Jul 11, 2025 Updated 7 months ago
Alternatives and similar repositories for EvalTree
Users interested in EvalTree are comparing it to the libraries listed below:
- Fluid Language Model Benchmarking ☆26 Sep 16, 2025 Updated 5 months ago
- Implementation of the paper: "Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning… ☆22 Nov 2, 2021 Updated 4 years ago
- Python package for serving a local search engine. One command to download and serve a datastore---that's it 😎. ☆25 Jun 6, 2025 Updated 9 months ago
- A comprehensive React Native starter template built with Expo. It includes reusable UI components, Poppins font setup, NativeWind, Fireba… ☆23 Updated this week
- ☆33 Updated this week
- A simple lightweight Model Context Protocol (MCP) server integration framework ☆17 Jan 23, 2026 Updated last month
- ☆29 Oct 24, 2025 Updated 4 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆34 Apr 20, 2025 Updated 10 months ago
- AuraMatrix is a personality analysis web app that uses an LLM for evaluation. I made this for Gyanotsav-2025 to show different ways to ut… ☆11 Dec 22, 2025 Updated 2 months ago
- ☆18 Jun 10, 2025 Updated 8 months ago
- Structured TRIZ prompt engineering for LLMs in an open, portable XML format – MIT licensed. ☆16 Nov 11, 2025 Updated 3 months ago
- ☆37 Jan 26, 2025 Updated last year
- ☆27 Jun 12, 2023 Updated 2 years ago
- VibEx (vx) is a developer-friendly CLI tool that streamlines the process of working with AI coding assistants. It helps developers prepar… ☆28 May 17, 2025 Updated 9 months ago
- MAGELLAN: Metacognitive predictions of learning progress guide autotelic LLM agents in large goal spaces ☆10 Mar 24, 2025 Updated 11 months ago
- CoachLint is your AI coding coach. It guides you through errors instead of just solving them for you. ☆23 Nov 20, 2025 Updated 3 months ago
- ☆29 Updated this week
- 🪝PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models ☆12 May 30, 2025 Updated 9 months ago
- Glitch Gremlin AI ☆15 Apr 5, 2025 Updated 11 months ago
- 💀 gigasmol: a lightweight wrapper for gigachat api model for seamless use with smolagents. ☆15 Oct 23, 2025 Updated 4 months ago
- "Open-source toolkit (Python Library, Registry API, CLI) for secure, decentralized AI agent interoperability using A2A/MCP." ☆14 May 10, 2025 Updated 9 months ago
- SYSTEM PROMPT TRANSPARENCY FOR ALL ☆12 May 22, 2025 Updated 9 months ago
- [NeurIPS 2025] Official Implementation of paper "Sherlock: Self-Correcting Reasoning in Vision-Language Models" ☆28 Sep 18, 2025 Updated 5 months ago
- ☆14 Apr 4, 2025 Updated 11 months ago
- ☆17 Dec 16, 2025 Updated 2 months ago
- 📱 A template for your next React Native project: Expo, TypeScript, ReStyle, Husky, react-navigation, react-query, react-hook-form, zusta… ☆16 Dec 15, 2025 Updated 2 months ago
- AI Tasks. An LLM-integrated agent orchestration tool for Liferay. ☆14 May 16, 2025 Updated 9 months ago
- Reference implementation of algorithms for reinforcement learning and Markov decision processes. ☆12 Jan 28, 2021 Updated 5 years ago
- OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents ☆21 Jan 6, 2026 Updated 2 months ago
- An open source deep research clone. AI Agent (Local LLM or Gemini) that reasons large amounts of web data extracted with SwiftSoup. ☆13 Feb 10, 2025 Updated last year
- IBM watsonx Code Assistant for Red Hat Ansible Lightspeed demystifies the process of Ansible Playbook creation through generative AI-powe… ☆19 Sep 18, 2025 Updated 5 months ago
- Emphasizes AI-based projects for various companies. ☆15 Apr 1, 2025 Updated 11 months ago
- Shakey OS Mobile AI Framework for React Native, allowing people to build React Native apps for iOS and Android with AI tooling and wallet … ☆28 Feb 3, 2025 Updated last year
- Pascal2 Harvest project QuEst ☆14 Sep 15, 2014 Updated 11 years ago
- Rationales for Sequential Predictions ☆40 Mar 10, 2022 Updated 3 years ago
- ☆40 May 2, 2021 Updated 4 years ago
- Code for Massive-scale Decoding for Text Generation using Lattices ☆44 Jul 29, 2022 Updated 3 years ago
- Code for "Tracing Knowledge in Language Models Back to the Training Data" ☆39 Dec 27, 2022 Updated 3 years ago
- A small Go harness that uses Ollama to orchestrate LLMs in a restricted process flow ☆16 Sep 10, 2024 Updated last year