NousResearch / Open-Reasoning-Tasks

A comprehensive repository of reasoning tasks for LLMs (and beyond)

☆448

Alternatives and similar repositories for Open-Reasoning-Tasks

Users who are interested in Open-Reasoning-Tasks compare it to the repositories listed below.

NousResearch / atropos
Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse …
☆573 Updated this week
xjdr-alt / entropix-local
smol models are fun too
☆92 Updated 8 months ago
migtissera / Sensei
Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI
☆222 Updated last year
willccbb / mlx_parallm
Fast parallel LLM inference for MLX
☆204 Updated last year
aidanmclaughlin / AidanBench
Aidan Bench attempts to measure <big_model_smell> in LLMs.
☆307 Updated last month
PrimeIntellect-ai / genesys
☆130 Updated 4 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 Updated 6 months ago
magicproduct / hash-hop
Long context evaluation for large language models
☆220 Updated 5 months ago
teknium1 / Prompt-Engineering-Toolkit
☆412 Updated 11 months ago
haizelabs / verdict
Inference-time scaling for LLMs-as-a-judge.
☆267 Updated 3 weeks ago
haizelabs / Awesome-LLM-Judges
⚖️ Awesome LLM Judges ⚖️
☆108 Updated 3 months ago
jerber / lang-jepa
☆118 Updated 7 months ago
Mihaiii / llm_steer
Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto…
☆240 Updated 5 months ago
doomslide / hyperobject
Plotting (entropy, varentropy) for small LMs
☆98 Updated 2 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆103 Updated 5 months ago
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119 Updated last year
simple-bench / SimpleBench
☆138 Updated 7 months ago
anyscale / llm-router
Tutorial for building LLM router
☆221 Updated last year
EQ-bench / EQ-Bench
A benchmark for emotional intelligence in large language models
☆336 Updated last year
cohere-ai / cohere-terrarium
A simple Python sandbox for helpful LLM data agents
☆277 Updated last year
normal-computing / extended-mind-transformers
☆123 Updated last year
SinatrasC / entropix-smollm
smolLM with Entropix sampler on pytorch
☆150 Updated 9 months ago
Pints-AI / 1.5-Pints
A compact LLM pretrained in 9 days by using high quality data
☆320 Updated 3 months ago
arcprize / arc-agi-benchmarking
Testing baseline LLMs performance across various models
☆291 Updated last week
lamini-ai / Lamini-Memory-Tuning
Banishing LLM Hallucinations Requires Rethinking Generalization
☆276 Updated last year
huggingface / yourbench
🤗 Benchmark Large Language Models Reliably On Your Data
☆381 Updated this week
holo-q / OpenQ
The open-source implementation of Q*, achieved in context as a zero-shot reprogramming of the attention mechanism. (synthetic data)
☆1 Updated 7 months ago
togethercomputer / finetuning
Finetune Llama-3-8b on the MathInstruct dataset
☆111 Updated 9 months ago
QuixiAI / OpenChatML
☆157 Updated last year
Ziems / arbor
A framework for optimizing DSPy programs with RL
☆96 Updated this week