kotoba-tech / kotomamba
Mamba training library developed by Kotoba Technologies
☆67 · Updated last year
Alternatives and similar repositories for kotomamba:
Users interested in kotomamba are comparing it to the libraries listed below.
- Checkpointable dataset utilities for foundation model training ☆32 · Updated last year
- Example of using Epochraft to train HuggingFace transformers models with PyTorch FSDP ☆12 · Updated last year
- Unofficial Implementation of Evolutionary Model Merging ☆33 · Updated 10 months ago
- Supports continual pre-training & instruction tuning; forked from llama-recipes ☆31 · Updated last year
- Griffin MQA + Hawk Linear RNN Hybrid ☆85 · Updated 9 months ago
- Ongoing research project for continual pre-training of LLMs (dense mode) ☆37 · Updated last month
- LEIA: Facilitating Cross-Lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation ☆21 · Updated 9 months ago
- Plug-and-play PyTorch implementation of the paper "Evolutionary Optimization of Model Merging Recipes" by Sakana AI ☆28 · Updated 3 months ago
- Japanese LLaMa experiment ☆52 · Updated 2 months ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Updated last year
- Implementation of Infini-Transformer in PyTorch ☆109 · Updated last month
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆96 · Updated 4 months ago
- CycleQD is a framework for parameter space model merging. ☆31 · Updated 3 weeks ago
- A toolkit for scaling law research ⚖ ☆47 · Updated 3 weeks ago
- The robust text processing pipeline framework enabling customizable, efficient, and metric-logged text preprocessing. ☆120 · Updated 3 months ago
- A fast implementation of T5/UL2 in PyTorch using Flash Attention ☆81 · Updated 3 weeks ago
- Randomized Positional Encodings Boost Length Generalization of Transformers ☆79 · Updated 11 months ago
- Code repository for the c-BTM paper ☆105 · Updated last year
- Code for pre-training BabyLM baseline models. ☆12 · Updated last year
- Here we will test various linear attention designs. ☆58 · Updated 9 months ago
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers ☆82 · Updated 7 months ago