amjadmajid / BabyTorch
BabyTorch is a minimalist deep-learning framework with an API similar to PyTorch's. The minimalist design encourages learners to explore and understand the underlying algorithms and mechanics of deep learning. It is designed so that when learners are ready to switch to PyTorch, they only need to remove the word `baby`.
☆26 · Updated 11 months ago
Alternatives and similar repositories for BabyTorch:
Users interested in BabyTorch are comparing it to the libraries listed below.
- Implementation of Diffusion Transformer (DiT) in JAX ☆272 · Updated 10 months ago
- Official JAX implementation of xLSTM, including fast and efficient training and inference code. 7B model available at https://huggingface.… ☆91 · Updated 3 months ago
- ☆216 · Updated 9 months ago
- ☆175 · Updated 4 months ago
- Custom Triton kernels for training Karpathy's nanoGPT ☆18 · Updated 6 months ago
- ☆150 · Updated 8 months ago
- Efficient optimizers ☆189 · Updated this week
- seqax = sequence modeling + JAX ☆154 · Updated 2 weeks ago
- ☆102 · Updated this week
- Accelerated minigrid environments with JAX ☆134 · Updated 8 months ago
- An implementation of the PSGD Kron second-order optimizer for PyTorch ☆89 · Updated 3 weeks ago
- Accelerated First Order Parallel Associative Scan ☆181 · Updated 8 months ago
- Just some miscellaneous utility functions / decorators / modules related to PyTorch and Accelerate to help speed up implementation of new… ☆120 · Updated 8 months ago
- A set of Python scripts that make your experience on TPU better ☆51 · Updated 9 months ago
- 🧱 Modula software package ☆188 · Updated 3 weeks ago
- Minimal but scalable implementation of large language models in JAX ☆34 · Updated 5 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆178 · Updated this week
- The simplest, fastest repository for training/finetuning medium-sized GPTs ☆105 · Updated 5 months ago
- Supporting PyTorch FSDP for optimizers ☆80 · Updated 4 months ago
- PyTorch implementation of Evolutionary Policy Optimization, from Wang et al. of the Robotics Institute at Carnegie Mellon University ☆57 · Updated this week
- Deep learning library implemented from scratch in NumPy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments. ☆51 · Updated last year
- ☆27 · Updated 9 months ago
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆82 · Updated last year
- Cost-aware hyperparameter tuning algorithm ☆150 · Updated 9 months ago
- ☆78 · Updated 9 months ago
- LoRA for arbitrary JAX models and functions ☆136 · Updated last year
- The AdEMAMix Optimizer: Better, Faster, Older ☆180 · Updated 7 months ago
- ☆94 · Updated 3 months ago
- ☆87 · Updated last year
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆81 · Updated last month