arcprizeorg / model_baseline

Testing baseline LLM performance across various models

☆229

Alternatives and similar repositories for model_baseline:

Users who are interested in model_baseline are comparing it to the repositories listed below.

doomslide / hyperobject
☆97 Updated 5 months ago
NousResearch / Open-Reasoning-Tasks
A comprehensive repository of reasoning tasks for LLMs (and beyond)
☆423 Updated 5 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆168 Updated 2 months ago
ekinakyurek / marc
Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"
☆300 Updated 4 months ago
SWE-Gym / SWE-Gym
Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym
☆402 Updated 2 weeks ago
xjdr-alt / entropix-local
smol models are fun too
☆89 Updated 4 months ago
aidanmclaughlin / AidanBench
Aidan Bench attempts to measure <big_model_smell> in LLMs.
☆288 Updated 2 weeks ago
jerber / lang-jepa
☆105 Updated 3 months ago
nuwandavek / karpathify
☆96 Updated 5 months ago
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024)
☆186 Updated 9 months ago
willccbb / mlx_parallm
Fast parallel LLM inference for MLX
☆174 Updated 8 months ago
SakanaAI / evo-memory
Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers.
☆299 Updated 5 months ago
neoneye / ARC-Interactive-History-Dataset
The history files when recording human interaction while solving ARC tasks
☆97 Updated last week
smolorg / smoltropix
MLX port for xjdr's entropix sampler (mimics jax implementation)
☆63 Updated 4 months ago
agora-protocol / paper-demo
☆144 Updated 2 weeks ago
SinatrasC / entropix-smollm
smolLM with Entropix sampler on pytorch
☆150 Updated 4 months ago
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119 Updated 7 months ago
rgreenblatt / arc_draw_more_samples_pub
Draw more samples
☆186 Updated 9 months ago
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆307 Updated 3 months ago