☆237Mar 5, 2026Updated this week
Alternatives and similar repositories for hal-harness
Users that are interested in hal-harness are comparing it to the libraries listed below
- ☆19Jul 31, 2025Updated 7 months ago
- ☆23Jan 27, 2026Updated last month
- A Qwen 0.5B reasoning model trained on OpenR1-Math-220k☆14Oct 11, 2025Updated 4 months ago
- Code for T-MARS data filtering☆35Aug 23, 2023Updated 2 years ago
- BenchBench is a Python package to evaluate multi-task benchmarks.☆18Oct 12, 2025Updated 4 months ago
- Training Proactive and Personalized LLM Agents☆102Jan 20, 2026Updated last month
- Inspect: A framework for large language model evaluations☆1,800Updated this week
- Evaluate gpt-4o on CLIcK (Korean NLP Dataset)☆20May 18, 2024Updated last year
- ☆25Jan 8, 2025Updated last year
- ☆80Updated this week
- Collection of evals for Inspect AI☆393Updated this week
- This repository contains the code and data for the paper "Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents wit…☆55Feb 7, 2026Updated last month
- 💀 gigasmol: a lightweight wrapper for gigachat api model for seamless use with smolagents.☆15Oct 23, 2025Updated 4 months ago
- AlgoTune is a NeurIPS 2025 benchmark made up of 154 math, physics, and computer science problems. The goal is to write code that solves each…☆92Feb 21, 2026Updated 2 weeks ago
- Codebase from our first release.☆45Feb 17, 2026Updated 2 weeks ago
- ☆133Oct 16, 2025Updated 4 months ago
- Official repo for the paper "Make Some Noise: Reliable and Efficient Single-Step Adversarial Training" (https://arxiv.org/abs/2202.01181)☆25Oct 17, 2022Updated 3 years ago
- ☆112Sep 25, 2024Updated last year
- The backup repository for FairytaleQA dataset and paper "Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset f…☆10May 30, 2023Updated 2 years ago
- Official code for the paper "Does CLIP's Generalization Performance Mainly Stem from High Train-Test Similarity?" (ICLR 2024)☆10Aug 26, 2024Updated last year
- Public code release for the paper "Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training"☆11Oct 27, 2025Updated 4 months ago
- The repository for the paper "Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs"☆14Dec 16, 2024Updated last year
- Fine-tuning llama2 using the chain-of-thought approach☆10Nov 18, 2023Updated 2 years ago
- Sequence Tagging for Biomedical Extractive Question Answering (Bioinformatics'2020)☆11Jul 3, 2023Updated 2 years ago
- Implementation of the Pairformer model used in AlphaFold 3☆14Mar 2, 2026Updated last week
- ☆10Nov 6, 2024Updated last year
- Prompt-Guided Retrieval For Non-Knowledge-Intensive Tasks☆12Sep 1, 2023Updated 2 years ago
- ☆18Jan 15, 2026Updated last month
- LLM as World Models using Bayesian inference☆16May 27, 2025Updated 9 months ago
- ☆27Mar 21, 2024Updated last year
- A minimal yet unstoppable blueprint for multi-agent AI—anchored by the rare, far-reaching “Multi-Agent AI DAO” (2017 Prior Art)—empowerin…☆32Jan 11, 2025Updated last year
- Async RL Training at Scale☆1,107Updated this week
- [ICLR 2025] 🚀 CodeMMLU Evaluator: A framework for evaluating LM models on CodeMMLU MCQs benchmark.☆29Apr 21, 2025Updated 10 months ago
- Implementation for ACProp (Momentum centering and asynchronous update for adaptive gradient methods, NeurIPS 2021)☆16Oct 11, 2021Updated 4 years ago
- 🌳 MCTS-inspired parallel beam search for conversation optimization. Explore multiple dialogue strategies simultaneously, stress-test a…☆35Jan 18, 2026Updated last month
- ☆10Oct 24, 2024Updated last year
- Code for Generalization Guarantees for (Multi-Modal) Imitation Learning☆11Jul 14, 2022Updated 3 years ago
- An Ultra-Long Output Reinforcement Learning Approach☆23Jul 31, 2025Updated 7 months ago
- ☆11Sep 19, 2025Updated 5 months ago