McGill-NLP / nano-aha-moment
Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"
☆119 · Updated this week
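Concretely, the "RL for LLMs" recipe the description refers to boils down to sampling completions from the current policy, scoring them with a verifiable reward, and taking a full-parameter policy-gradient step. The sketch below is a minimal, hypothetical REINFORCE-style illustration of that loop in plain PyTorch with Hugging Face transformers; the model name, reward function, and hyperparameters are assumptions for illustration, not nano-aha-moment's actual API or code.

```python
# Hypothetical sketch of a single-GPU, full-parameter "RL for LLMs" step.
# Not taken from nano-aha-moment; model, reward, and settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # assumption: any small causal LM works here
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-6)

def reward_fn(text: str) -> float:
    # Placeholder verifiable reward: 1.0 if the completion contains the answer.
    return 1.0 if "42" in text else 0.0

prompt = "What is 6 times 7? Answer:"
inputs = tok(prompt, return_tensors="pt").to("cuda")
prompt_len = inputs["input_ids"].shape[1]

# Sample a completion from the current policy (generate runs under no_grad).
gen = model.generate(**inputs, do_sample=True, max_new_tokens=32,
                     pad_token_id=tok.eos_token_id)
completion = tok.decode(gen[0, prompt_len:], skip_special_tokens=True)
reward = reward_fn(completion)

# REINFORCE-style update: scale the completion's log-likelihood by the reward.
logits = model(gen).logits[:, :-1]                       # predicts gen[:, 1:]
logprobs = torch.log_softmax(logits, dim=-1)
token_logprobs = logprobs.gather(-1, gen[:, 1:].unsqueeze(-1)).squeeze(-1)
completion_logprobs = token_logprobs[:, prompt_len - 1:]  # only generated tokens
loss = -(reward * completion_logprobs).mean()
loss.backward()
opt.step()
opt.zero_grad()
```

In practice a baseline (for example, averaging rewards over a group of samples per prompt, as in GRPO-style methods) replaces the raw reward to reduce variance, but the shape of the update is the same.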
Alternatives and similar repositories for nano-aha-moment:
Users interested in nano-aha-moment are comparing it to the libraries listed below.
- A MAD laboratory to improve AI architecture designs 🧪 ☆108 · Updated 3 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆103 · Updated 4 months ago
- ☆163 · Updated 3 weeks ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆187 · Updated 10 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆125 · Updated 4 months ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆124 · Updated 7 months ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆162 · Updated 3 months ago
- ☆76 · Updated 9 months ago
- ☆74 · Updated 7 months ago
- ☆111 · Updated last month
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆166 · Updated 3 weeks ago
- Normalized Transformer (nGPT) ☆164 · Updated 4 months ago
- Replicating O1 inference-time scaling laws ☆83 · Updated 4 months ago
- Understand and test language model architectures on synthetic tasks. ☆185 · Updated last month
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆169 · Updated 2 months ago
- nanoGPT-like codebase for LLM training ☆91 · Updated this week
- ☆163 · Updated last month
- Simple and efficient pytorch-native transformer training and inference (batched) ☆72 · Updated last year
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. ☆306 · Updated 2 weeks ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆148 · Updated 4 months ago
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training ☆123 · Updated 11 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"โ229Updated 2 months ago
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmindโ123Updated 7 months ago
- โ96Updated 9 months ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.โ59Updated 2 months ago
- Language models scale reliably with over-training and on downstream tasksโ96Updated last year
- โ87Updated 6 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"โ71Updated 5 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"โ224Updated last month
- EvaByte: Efficient Byte-level Language Models at Scaleโ85Updated 2 weeks ago