arcprizeorg / model_baseline
Testing baseline LLM performance across various models
☆229 Updated this week
Alternatives and similar repositories for model_baseline:
Users who are interested in model_baseline compare it to the libraries listed below:
- ☆97 Updated 5 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆423 Updated 5 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆168 Updated 2 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆300 Updated 4 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym ☆402 Updated 2 weeks ago
- smol models are fun too ☆89 Updated 4 months ago
- Aidan Bench attempts to measure <big_model_smell> in LLMs. ☆288 Updated 2 weeks ago
- ☆105 Updated 3 months ago
- ☆96 Updated 5 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆186 Updated 9 months ago
- Fast parallel LLM inference for MLX ☆174 Updated 8 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆299 Updated 5 months ago
- The history files when recording human interaction while solving ARC tasks ☆97 Updated last week
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆63 Updated 4 months ago
- ☆144 Updated 2 weeks ago
- smolLM with Entropix sampler on pytorch ☆150 Updated 4 months ago
- Just a bunch of benchmark logs for different LLMs ☆119 Updated 7 months ago
- Draw more samples ☆186 Updated 9 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆307 Updated 3 months ago