xjdr-alt / mla_blog_translation
☆12Updated last year
Alternatives and similar repositories for mla_blog_translation
Users that are interested in mla_blog_translation are comparing it to the libraries listed below
 - train with kittens!☆63Updated last year
 - PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP☆137Updated last month
 - an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆107Updated 7 months ago
 - NSA Triton Kernels written with GPT5 and Opus 4.1☆64Updated 2 months ago
 - SIMD quantization kernels☆91Updated last month
 - Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters☆130Updated 11 months ago
 - ☆50Updated last year
 - Experimental GPU language with meta-programming☆23Updated last year
 - Collection of autoregressive model implementations☆86Updated 6 months ago
 - PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7)☆66Updated 7 months ago
 - NanoGPT-speedrunning for the poor T4 enjoyers☆72Updated 6 months ago
 - look how they massacred my boy☆63Updated last year
 - smolLM with Entropix sampler on pytorch☆150Updated last year
 - DeMo: Decoupled Momentum Optimization☆195Updated 11 months ago
 - train entropix like a champ!☆20Updated last year
 - Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr…☆64Updated last month
 - QuIP quantization☆59Updated last year
 - H-Net Dynamic Hierarchical Architecture☆80Updated last month
 - ☆28Updated last year
 - Docker image NVIDIA GH200 machines - optimized for vllm serving and hf trainer finetuning☆50Updated 8 months ago
 - Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers.☆18Updated 3 months ago
 - An introduction to LLM Sampling☆79Updated 10 months ago
 - RWKV-7: Surpassing GPT☆98Updated 11 months ago
 - inference code for mixtral-8x7b-32kseqlen☆102Updated last year
 - Simplex Random Feature attention, in PyTorch☆73Updated 2 years ago
 - Comprehensive analysis of difference in performance of QLora, Lora, and Full Finetunes.☆82Updated 2 years ago
 - A really tiny autograd engine☆96Updated 5 months ago
 - Long context evaluation for large language models☆224Updated 8 months ago
 - ☆91Updated last year
 - ☆21Updated 9 months ago