Blkalkin / Optimal-TestTime

☆11

Alternatives and similar repositories for Optimal-TestTime:

Users who are interested in Optimal-TestTime are comparing it to the repositories listed below.

arcee-ai / DAM
☆48 Updated 4 months ago
salesforce / summary-of-a-haystack
Codebase accompanying the Summary of a Haystack paper.
☆76 Updated 6 months ago
kevinwu23 / StanfordFineTuneBench
☆27 Updated 4 months ago
axeld5 / pali_reason
Testing paligemma2 finetuning on reasoning dataset
☆18 Updated 3 months ago
davanstrien / data-for-fine-tuning-llms
☆76 Updated 9 months ago
AnswerDotAI / ModernBERT-Instruct-mini-cookbook
☆38 Updated last month
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆55 Updated 7 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆39 Updated last month
nahidalam / maya
Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya
☆107 Updated last month
vdlad / Remarkable-Robustness-of-LLMs
Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?"
☆17 Updated 9 months ago
SeunghyunSEO / optimized_hf_llama_class_for_training
☆47 Updated 7 months ago
orionw / promptriever
The first dense retrieval model that can be prompted like an LM
☆68 Updated 6 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆84 Updated 5 months ago
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59 Updated last month
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆85 Updated last week
SalesforceAIResearch / LaTRO
☆111 Updated last month
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆81 Updated 3 weeks ago
ZeroSumEval / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆32 Updated this week
KaiNylund / lm-weights-encode-time
☆67 Updated 7 months ago
JacobPfau / fillerTokens
☆60 Updated 11 months ago
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆56 Updated last week
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆166 Updated 3 weeks ago
withmartian / routerbench
The code for the paper "ROUTERBENCH: A Benchmark for Multi-LLM Routing System"
☆111 Updated 9 months ago
ivanleomk / kura
Kura is a simple reproduction of the CLIO paper which uses language models to label user behaviour before clustering them based on embedd…
☆93 Updated 2 months ago
RobertCsordas / moeut
☆74 Updated 7 months ago
benpry / why-think-step-by-step
Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience"
☆59 Updated last year
pgasawa / BARE
Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation
☆26 Updated last month
AlexCuadron / ThinkingAgent
Systematic evaluation framework that automatically rates overthinking behavior in large language models.
☆82 Updated this week
benediktstroebl / agent-evals
☆15 Updated 6 months ago
AgnostiqHQ / multi-agent-llm
Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT)
☆107 Updated last month