eric-prog / GPU-Grants
GPUGrants - a list of GPU grants that I can think of
☆61 · Updated 4 months ago
Alternatives and similar repositories for GPU-Grants
Users who are interested in GPU-Grants are comparing it to the repositories listed below.
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ☆188 · Updated 2 months ago
- An extension of the nanoGPT repository for training small MOE models. ☆226 · Updated 10 months ago
- ☆29 · Updated last year
- Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University ☆289 · Updated this week
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆198 · Updated last year
- Distributed training (multi-node) of a Transformer model ☆91 · Updated last year
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆150 · Updated 3 months ago
- A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE). ☆160 · Updated last year
- ⏰ AI conference deadline countdowns ☆317 · Updated this week
- Open source interpretability artefacts for R1. ☆167 · Updated 9 months ago
- Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments. ☆31 · Updated 8 months ago
- ☆45 · Updated 7 months ago
- Prune transformer layers ☆74 · Updated last year
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆86 · Updated 4 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆185 · Updated 6 months ago
- ☆114 · Updated 4 months ago
- nanoGPT-like codebase for LLM training ☆114 · Updated 2 months ago
- Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts. ☆140 · Updated last year
- ☆38 · Updated 11 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆175 · Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆352 · Updated 8 months ago
- A brief and partial summary of RLHF algorithms. ☆142 · Updated 10 months ago
- Normalized Transformer (nGPT) ☆196 · Updated last year
- Minimal GRPO implementation from scratch ☆102 · Updated 10 months ago
- LLM-Merging: Building LLMs Efficiently through Merging ☆208 · Updated last year
- Understand and test language model architectures on synthetic tasks. ☆249 · Updated last week
- ☆228 · Updated last year
- LoRA and DoRA from Scratch Implementations ☆215 · Updated last year
- Collection of autoregressive model implementations ☆85 · Updated last week
- An introduction to LLM Sampling ☆79 · Updated last year