apple / ml-sigma-reparam
☆301 · Updated 8 months ago
Alternatives and similar repositories for ml-sigma-reparam:
Users who are interested in ml-sigma-reparam are comparing it to the repositories listed below.
- Efficient optimizers ☆181 · Updated this week
- WIP ☆93 · Updated 7 months ago
- Scalable and Performant Data Loading ☆224 · Updated this week
- For optimization algorithm research and development. ☆498 · Updated 2 weeks ago
- supporting pytorch FSDP for optimizers ☆79 · Updated 3 months ago
- Understand and test language model architectures on synthetic tasks. ☆183 · Updated last week
- JAX implementation of the Llama 2 model ☆216 · Updated last year
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆122 · Updated 10 months ago
- ☆75 · Updated 8 months ago
- Annotated version of the Mamba paper ☆474 · Updated last year
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆223 · Updated 3 weeks ago
- Named tensors with first-class dimensions for PyTorch ☆321 · Updated last year
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets. ☆156 · Updated 11 months ago
- ☆164 · Updated 3 months ago
- When it comes to optimizers, it's always better to be safe than sorry ☆213 · Updated 2 weeks ago
- A Jax-based library for designing and training transformer models from scratch. ☆282 · Updated 6 months ago
- Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch ☆226 · Updated 6 months ago
- ☆78 · Updated 10 months ago
- σ-GPT: A New Approach to Autoregressive Models ☆61 · Updated 6 months ago
- ☆53 · Updated last year
- Helpful tools and examples for working with flex-attention ☆679 · Updated this week
- 🧱 Modula software package ☆169 · Updated this week
- ☆194 · Updated this week
- Universal Tensor Operations in Einstein-Inspired Notation for Python. ☆359 · Updated last month
- Just some miscellaneous utility functions / decorators / modules related to Pytorch and Accelerate to help speed up implementation of new… ☆120 · Updated 7 months ago
- Fast bare-bones BPE for modern tokenizer training ☆149 · Updated 4 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆506 · Updated 4 months ago
- ☆212 · Updated 7 months ago
- Puzzles for exploring transformers ☆334 · Updated last year