gensyn-ai / noloco
Experimental repository for a research implementation of NoLoCo.
☆ 29 · Updated 6 months ago
Alternatives and similar repositories for noloco
Users interested in noloco are comparing it to the libraries listed below.
- The evaluation framework for training-free sparse attention in LLMs ☆ 108 · Updated 2 months ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆ 21 · Updated 3 weeks ago
- Work in progress. ☆ 76 · Updated last month
- ☆ 114 · Updated last week
- [WIP] Better (FP8) attention for Hopper ☆ 32 · Updated 10 months ago
- ☆ 47 · Updated 8 months ago
- ☆ 23 · Updated 8 months ago
- ☆ 66 · Updated 9 months ago
- Make triton easier ☆ 50 · Updated last year
- DPO, but faster 🚀 ☆ 46 · Updated last year
- ☆ 133 · Updated 7 months ago
- vLLM adapter for a TGIS-compatible gRPC server ☆ 47 · Updated this week
- ☆ 44 · Updated 7 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆ 41 · Updated 2 weeks ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆ 131 · Updated last year
- mHC kernels implemented in CUDA ☆ 196 · Updated last week
- Repository for sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆ 42 · Updated last year
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆ 128 · Updated 6 months ago
- ☆ 27 · Updated 9 months ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆ 51 · Updated 6 months ago
- Pipeline parallelism for the minimalist ☆ 37 · Updated 5 months ago
- Block Diffusion for Ultra-Fast Speculative Decoding ☆ 188 · Updated this week
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP ☆ 141 · Updated 4 months ago
- An extension of the GaLore paper, performing natural gradient descent in a low-rank subspace ☆ 18 · Updated last year
- Fork of the Flame repo for training some new stuff in development ☆ 19 · Updated this week
- FlexAttention w/ FlashAttention3 support ☆ 27 · Updated last year
- 🔥 LLM-powered GPU kernel synthesis: train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ☆ 110 · Updated 2 months ago
- Multi-turn RL training system with AgentTrainer for language-model game reinforcement learning ☆ 57 · Updated 3 weeks ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆ 125 · Updated 3 months ago
- Some mixture-of-experts architecture implementations ☆ 25 · Updated last year