xjdr-alt / mla_blog_translation
☆14 Updated last year
Alternatives and similar repositories for mla_blog_translation
Users who are interested in mla_blog_translation are comparing it to the libraries listed below
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆105 Updated 6 months ago
- PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP ☆120 Updated last week
- NSA Triton Kernels written with GPT5 and Opus 4.1 ☆65 Updated last month
- look how they massacred my boy ☆64 Updated 11 months ago
- train entropix like a champ! ☆20 Updated 11 months ago
- DeMo: Decoupled Momentum Optimization ☆190 Updated 9 months ago
- train with kittens! ☆62 Updated 10 months ago
- Collection of autoregressive model implementations ☆86 Updated 4 months ago
- Modded vLLM to run pipeline parallelism over public networks ☆39 Updated 3 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆71 Updated 4 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7) ☆66 Updated 5 months ago
- Cerule - A Tiny Mighty Vision Model ☆68 Updated last year
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆55 Updated 7 months ago
- Entropy Based Sampling and Parallel CoT Decoding ☆17 Updated 11 months ago
- An introduction to LLM Sampling ☆79 Updated 9 months ago
- RWKV-7: Surpassing GPT ☆95 Updated 10 months ago
- Long context evaluation for large language models ☆221 Updated 6 months ago
- NanoGPT (124M) quality in 2.67B tokens ☆28 Updated last week
- Modify Entropy Based Sampling to work with Mac Silicon via MLX ☆49 Updated 10 months ago
- SIMD quantization kernels ☆87 Updated last week
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 Updated last year
- Simple Transformer in Jax ☆139 Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆129 Updated 9 months ago
- A collection of lightweight interpretability scripts to understand how LLMs think ☆40 Updated this week
- Simplex Random Feature attention, in PyTorch ☆74 Updated last year
- ☆49 Updated last year
- ☆39 Updated last year
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr… ☆65 Updated 10 months ago
- A really tiny autograd engine ☆94 Updated 3 months ago
- smolLM with Entropix sampler on pytorch ☆150 Updated 10 months ago