OptimalFoundation / nadir
Nadir: Cutting-edge PyTorch optimizers for simplicity & composability!
★ 14 · Updated last year
Alternatives and similar repositories for nadir
Users interested in nadir are comparing it to the libraries listed below.
- Our open-source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188) ★ 61 · Updated 2 years ago
- Amos optimizer with JEstimator lib. ★ 82 · Updated last year
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still a work in progress)* ★ 86 · Updated 2 years ago
- Minimal PyTorch implementation of BM25 (with sparse tensors) ★ 104 · Updated 3 months ago
- ★ 34 · Updated 2 years ago
- A library to create and manage configuration files, especially for machine learning projects. ★ 79 · Updated 3 years ago
- Experiments with generating open-source language model assistants ★ 97 · Updated 2 years ago
- Implementation of the specific Transformer architecture from PaLM (Scaling Language Modeling with Pathways) in JAX (Equinox framework) ★ 190 · Updated 3 years ago
- Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale, TACL (2022) ★ 134 · Updated this week
- Exploring finetuning public checkpoints on filtered 8K sequences from the Pile ★ 116 · Updated 2 years ago
- Deep learning library implemented from scratch in NumPy: Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments. ★ 53 · Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data; it should work with any Hugging Face text dataset. ★ 96 · Updated 3 years ago
- Experiments in training a new and improved T5 ★ 76 · Updated last year
- ★ 66 · Updated 3 years ago
- Like picoGPT, but for BERT. ★ 51 · Updated 2 years ago
- A case study of efficient training of large language models using commodity hardware. ★ 68 · Updated 3 years ago
- ML/DL math and method notes ★ 66 · Updated 2 years ago
- An instruction-based benchmark for text improvements. ★ 142 · Updated 3 years ago
- XtremeDistil framework for distilling/compressing massive multilingual neural network models into tiny and efficient models for AI at scale ★ 157 · Updated 2 years ago
- The code from our practical dive into using Mamba for information extraction ★ 57 · Updated 2 years ago
- A diff tool for language models ★ 44 · Updated 2 years ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day ★ 260 · Updated 2 years ago
- A library for squeakily cleaning and filtering language datasets. ★ 49 · Updated 2 years ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023 ★ 137 · Updated last year
- Code repository for the c-BTM paper ★ 108 · Updated 2 years ago
- Project 2 (Building Large Language Models) for Stanford CS324: Understanding and Developing Large Language Models (Winter 2022) ★ 105 · Updated 2 years ago
- My explorations into editing the knowledge and memories of an attention network ★ 35 · Updated 3 years ago
- ★ 167 · Updated 2 years ago
- Some common Hugging Face transformers in maximal update parametrization (µP) ★ 87 · Updated 3 years ago
- Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 q… ★ 89 · Updated last year