devichand579 / HPT
Code for Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles
⭐23 · Updated 2 months ago
Alternatives and similar repositories for HPT
Users who are interested in HPT are comparing it to the libraries listed below.
- ⭐67 · Updated 6 months ago
- Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ⭐54 · Updated 3 months ago
- ⭐56 · Updated 3 months ago
- Open Implementations of LLM Analyses ⭐107 · Updated last year
- ⭐55 · Updated 11 months ago
- Official Repo for The Paper "Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems" ⭐57 · Updated 7 months ago
- Nexusflow function call, tool use, and agent benchmarks. ⭐29 · Updated 10 months ago
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ⭐27 · Updated 10 months ago
- KV Cache Steering for Inducing Reasoning in Small Language Models ⭐40 · Updated 2 months ago
- ⭐40 · Updated 9 months ago
- Data preparation code for CrystalCoder 7B LLM ⭐45 · Updated last year
- ⭐60 · Updated 10 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ⭐91 · Updated 8 months ago
- ⭐11 · Updated 11 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ⭐37 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ⭐60 · Updated last year
- ⭐50 · Updated last year
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval ⭐30 · Updated 2 months ago
- AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents ⭐32 · Updated this week
- ⭐30 · Updated last year
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ⭐87 · Updated last month
- Multi-Granularity LLM Debugger ⭐91 · Updated 3 months ago
- OneEdit: A Neural-Symbolic Collaboratively Knowledge Editing System. ⭐18 · Updated 11 months ago
- ⭐25 · Updated 4 months ago
- Small, simple agent task environments for training and evaluation ⭐18 · Updated 11 months ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ⭐93 · Updated 4 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ⭐96 · Updated this week
- ⭐28 · Updated 6 months ago
- ⭐31 · Updated last year
- Codebase accompanying the Summary of a Haystack paper. ⭐79 · Updated last year