HammingHQ / bug-in-the-code-stack
A new benchmark for measuring LLMs' capability to detect bugs in large codebases.
☆30Updated last year
Alternatives and similar repositories for bug-in-the-code-stack
Users who are interested in bug-in-the-code-stack are comparing it to the libraries listed below
- Train your own SOTA deductive reasoning model☆93Updated 3 months ago
- Just a bunch of benchmark logs for different LLMs☆119Updated 10 months ago
- ☆59Updated 2 weeks ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)☆90Updated 4 months ago
- Synthetic Data for LLM Fine-Tuning☆118Updated last year
- ☆48Updated last year
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆53Updated 4 months ago
- look how they massacred my boy☆63Updated 7 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆67Updated 2 months ago
- an implementation of Self-Extend, to expand the context window via grouped attention☆119Updated last year
- ☆89Updated 8 months ago
- ☆49Updated 7 months ago
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)☆77Updated 3 months ago
- Comprehensive analysis of difference in performance of QLora, Lora, and Full Finetunes.☆82Updated last year
- j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models.☆74Updated last week
- ☆86Updated 8 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.☆38Updated last month
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte…☆70Updated 7 months ago
- ☆33Updated 3 months ago
- ☆66Updated last year
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you…☆72Updated 5 months ago
- A strongly typed Python DSL for developing message passing multi agent systems