princeton-pli / hal-harness
☆223 · Updated this week
Alternatives and similar repositories for hal-harness
Users interested in hal-harness are comparing it to the libraries listed below.
- ☆22 · Jan 27, 2026 · Updated 2 weeks ago
- A Qwen .5B reasoning model trained on OpenR1-Math-220k · ☆14 · Oct 11, 2025 · Updated 4 months ago
- Code for T-MARS data filtering · ☆35 · Aug 23, 2023 · Updated 2 years ago
- Training Proactive and Personalized LLM Agents · ☆100 · Jan 20, 2026 · Updated 3 weeks ago
- Inspect: A framework for large language model evaluations · ☆1,737 · Updated this week
- Evaluate gpt-4o on CLIcK (Korean NLP Dataset) · ☆20 · May 18, 2024 · Updated last year
- ☆25 · Jan 8, 2025 · Updated last year
- Collection of evals for Inspect AI · ☆361 · Updated this week
- This repository contains the code and data for the paper "Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents wit… · ☆54 · Feb 7, 2026 · Updated last week
- AlgoTune is a NeurIPS 2025 benchmark made up of 154 math, physics, and computer science problems. The goal is to write code that solves each… · ☆88 · Feb 3, 2026 · Updated last week
- 💀 gigasmol: a lightweight wrapper for gigachat api model for seamless use with smolagents. · ☆15 · Oct 23, 2025 · Updated 3 months ago
- Codebase from our first release. · ☆43 · Jan 6, 2026 · Updated last month
- ☆133 · Oct 16, 2025 · Updated 3 months ago
- A platform for building reliable AI agents · ☆89 · Updated this week
- Official repo for the paper "Make Some Noise: Reliable and Efficient Single-Step Adversarial Training" (https://arxiv.org/abs/2202.01181) · ☆25 · Oct 17, 2022 · Updated 3 years ago
- LLM as World Models using Bayesian inference · ☆16 · May 27, 2025 · Updated 8 months ago
- Implementation of the Pairformer model used in AlphaFold 3 · ☆14 · Updated this week
- The backup repository for FairytaleQA dataset and paper "Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset f… · ☆10 · May 30, 2023 · Updated 2 years ago
- Official code for the paper "Does CLIP's Generalization Performance Mainly Stem from High Train-Test Similarity?" (ICLR 2024) · ☆10 · Aug 26, 2024 · Updated last year
- ☆112 · Sep 25, 2024 · Updated last year
- ☆10 · Nov 6, 2024 · Updated last year
- Prompt-Guided Retrieval For Non-Knowledge-Intensive Tasks · ☆12 · Sep 1, 2023 · Updated 2 years ago
- The repository for the paper "Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs" · ☆14 · Dec 16, 2024 · Updated last year
- Public code release for the paper "Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training" · ☆11 · Oct 27, 2025 · Updated 3 months ago
- Fine-tuning llama2 using a chain-of-thought approach · ☆10 · Nov 18, 2023 · Updated 2 years ago
- A minimal yet unstoppable blueprint for multi-agent AI—anchored by the rare, far-reaching “Multi-Agent AI DAO” (2017 Prior Art)—empowerin… · ☆32 · Jan 11, 2025 · Updated last year
- ☆27 · Mar 21, 2024 · Updated last year
- ☆19 · Mar 31, 2024 · Updated last year
- ☆12 · Mar 3, 2022 · Updated 3 years ago
- Code for Generalization Guarantees for (Multi-Modal) Imitation Learning · ☆11 · Jul 14, 2022 · Updated 3 years ago
- Implementation for ACProp (Momentum centering and asynchronous update for adaptive gradient methods, NeurIPS 2021) · ☆16 · Oct 11, 2021 · Updated 4 years ago
- ☆12 · Oct 22, 2024 · Updated last year
- An ergonomic, opinionated memory interface for AI agents · ☆38 · Dec 18, 2025 · Updated last month
- An official implementation of Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards · ☆36 · Oct 3, 2025 · Updated 4 months ago
- A Model Context Protocol server for Python code analysis with Claude. Works again, but with a warning for now; something is still missing. · ☆12 · Nov 29, 2025 · Updated 2 months ago
- An Ultra-Long Output Reinforcement Learning Approach · ☆23 · Jul 31, 2025 · Updated 6 months ago
- 🌳 MCTS-inspired parallel beam search for conversation optimization. Explore multiple dialogue strategies simultaneously, stress-test a… · ☆35 · Jan 18, 2026 · Updated 3 weeks ago
- An eternal dialogue between AI models across versions. Started by Claude Opus 4 with 50 minutes to create a legacy. · ☆13 · Jun 2, 2025 · Updated 8 months ago
- ☆118 · Jan 19, 2026 · Updated 3 weeks ago