satabios / sconce
E2E AutoML Model Compression Package
☆46 · Updated 6 months ago
Alternatives and similar repositories for sconce
Users interested in sconce are comparing it to the libraries listed below.
- Collection of autoregressive model implementations ☆86 · Updated 4 months ago
- PTX tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7) ☆66 · Updated 5 months ago
- ☆46 · Updated last year
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆96 · Updated last month
- ☆69 · Updated last year
- Samples of good AI-generated CUDA kernels ☆90 · Updated 3 months ago
- Work in progress. ☆72 · Updated 2 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆72 · Updated 5 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆57 · Updated this week
- In this repository, I'm going to implement increasingly complex LLM inference optimizations ☆68 · Updated 4 months ago
- RWKV-7: Surpassing GPT ☆95 · Updated 10 months ago
- ☆88 · Updated last year
- ☆28 · Updated last year
- The evaluation framework for training-free sparse attention in LLMs ☆96 · Updated 3 months ago
- ☆94 · Updated 3 weeks ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆46 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆129 · Updated 9 months ago
- A really tiny autograd engine ☆94 · Updated 3 months ago
- Working implementation of DeepSeek MLA ☆44 · Updated 8 months ago
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers. ☆19 · Updated last month
- 📄 Small Batch Size Training for Language Models ☆62 · Updated 3 weeks ago
- H-Net Dynamic Hierarchical Architecture ☆79 · Updated last week
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆84 · Updated last week
- Experiment of using Tangent to autodiff triton ☆81 · Updated last year
- 👷 Build compute kernels ☆143 · Updated this week
- Tiny ASIC implementation of the matrix multiplication unit from "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" ☆162 · Updated last year
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆36 · Updated 2 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆155 · Updated 11 months ago
- QuIP quantization ☆59 · Updated last year
- ☆89 · Updated last year