arcprizeorg / model_baseline
Testing baseline LLM performance across various models
☆200 Updated 3 weeks ago
Alternatives and similar repositories for model_baseline:
Users that are interested in model_baseline are comparing it to the libraries listed below
- ☆96 Updated 3 months ago
- Long context evaluation for large language models ☆195 Updated this week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆154 Updated this week
- smolLM with Entropix sampler on pytorch ☆147 Updated 2 months ago
- ☆97 Updated 3 weeks ago
- smol models are fun too ☆86 Updated 2 months ago
- Training Large Language Model to Reason in a Continuous Latent Space ☆565 Updated this week
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆298 Updated 3 months ago
- Simple Transformer in Jax ☆128 Updated 6 months ago
- Aidan Bench attempts to measure <big_model_smell> in LLMs. ☆195 Updated 3 weeks ago
- WIP - Allows you to create DSPy pipelines using ComfyUI ☆184 Updated last month
- GRadient-INformed MoE ☆261 Updated 3 months ago
- System 2 Reasoning Link Collection ☆723 Updated this week
- Fast parallel LLM inference for MLX ☆152 Updated 6 months ago
- Synthetic Data curation for post-training and structured data extraction ☆351 Updated this week
- Code and Data for Tau-Bench ☆254 Updated last week
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆60 Updated 2 months ago
- Draw more samples ☆184 Updated 6 months ago
- ☆434 Updated 3 months ago
- End-to-end Generative Optimization for AI Agents ☆446 Updated this week
- ☆94 Updated 3 months ago
- Recipes to scale inference-time compute of open models ☆945 Updated this week
- A library for making RepE control vectors ☆530 Updated last week
- ☆241 Updated last month
- Banishing LLM Hallucinations Requires Rethinking Generalization ☆269 Updated 6 months ago
- llm-consortium orchestrates multiple LLMs, iteratively refines & achieves consensus. ☆130 Updated this week
- A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time! ☆572 Updated this week
- ☆135 Updated last month
- A simple unified framework for evaluating LLMs ☆166 Updated this week
- Solving data for LLMs - Create quality synthetic datasets! ☆143 Updated 3 months ago