deepreinforce-ai / CUDA-L1
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
☆186 Updated last month
Alternatives and similar repositories for CUDA-L1
Users who are interested in CUDA-L1 are comparing it to the libraries listed below
- Simple & Scalable Pretraining for Neural Architecture Research ☆291 Updated 3 weeks ago
- Work in progress. ☆72 Updated 2 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆129 Updated 9 months ago
- Load compute kernels from the Hub ☆283 Updated this week
- PyTorch implementation of models from the Zamba2 series. ☆185 Updated 7 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆93 Updated 2 months ago
- A collection of tricks and tools to speed up transformer models ☆178 Updated last week
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆274 Updated last month
- ☆242 Updated 3 months ago
- DeMo: Decoupled Momentum Optimization ☆190 Updated 9 months ago
- 👷 Build compute kernels ☆143 Updated this week
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆96 Updated last month
- Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton. ☆147 Updated this week
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆123 Updated last month
- Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training". ☆132 Updated last week
- Esoteric Language Models ☆99 Updated last month
- working implementation of deepseek MLA ☆44 Updated 8 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆185 Updated 3 months ago
- ☆196 Updated 9 months ago
- Efficient LLM Inference over Long Sequences ☆391 Updated 2 months ago
- open source alpha evolve ☆67 Updated 3 months ago
- GRadient-INformed MoE ☆264 Updated 11 months ago
- RWKV-7: Surpassing GPT ☆95 Updated 10 months ago
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆97 Updated 3 months ago
- Train, tune, and infer Bamba model ☆132 Updated 3 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆105 Updated 6 months ago
- LLM Inference on consumer devices ☆124 Updated 6 months ago
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆88 Updated 3 months ago
- Normalized Transformer (nGPT) ☆188 Updated 9 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆245 Updated 7 months ago