eric-prog / GPU-Grants
GPUGrants - a list of GPU grants that I can think of
☆52 Updated 3 months ago
Alternatives and similar repositories for GPU-Grants
Users that are interested in GPU-Grants are comparing it to the libraries listed below
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆149 Updated 2 months ago
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ☆186 Updated last month
- An extension of the nanoGPT repository for training small MOE models. ☆219 Updated 9 months ago
- ⏰ AI conference deadline countdowns ☆294 Updated this week
- Open source interpretability artefacts for R1. ☆165 Updated 8 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆180 Updated 5 months ago
- A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE). ☆161 Updated 11 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆349 Updated 7 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆174 Updated 11 months ago
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆85 Updated 3 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆198 Updated last year
- ☆45 Updated 7 months ago
- ☆37 Updated 10 months ago
- Collection of autoregressive model implementation ☆85 Updated 8 months ago
- LoRA and DoRA from Scratch Implementations ☆214 Updated last year
- Code for studying the super weight in LLM ☆121 Updated last year
- Understand and test language model architectures on synthetic tasks. ☆246 Updated 2 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆132 Updated last year
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆234 Updated 5 months ago
- Prune transformer layers ☆74 Updated last year
- Evaluation of LLMs on latest math competitions ☆205 Updated this week
- MatFormer repo ☆66 Updated last year
- ☆113 Updated 3 months ago
- minimal GRPO implementation from scratch ☆100 Updated 9 months ago
- Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts. ☆139 Updated last year
- LLM-Merging: Building LLMs Efficiently through Merging ☆207 Updated last year
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆327 Updated last month
- code for training & evaluating Contextual Document Embedding models ☆201 Updated 7 months ago
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆63 Updated last year
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google Deepmind ☆179 Updated last year