kotoba-tech / kotomamba
Mamba training library developed by Kotoba Technologies
☆ 68 · Updated last year

Alternatives and similar repositories for kotomamba:
Users interested in kotomamba are comparing it to the libraries listed below.
- Checkpointable dataset utilities for foundation model training ☆32 · Updated last year
- Example of using Epochraft to train HuggingFace transformers models with PyTorch FSDP ☆12 · Updated last year
- (untitled) ☆10 · Updated 9 months ago
- Ongoing research project for continual pre-training of LLMs (dense model) ☆38 · Updated last week
- Supports continual pre-training & instruction tuning; forked from llama-recipes ☆31 · Updated last year
- Randomized Positional Encodings Boost Length Generalization of Transformers ☆79 · Updated 11 months ago
- LEIA: Facilitating Cross-Lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation ☆21 · Updated 10 months ago
- Griffin MQA + Hawk Linear RNN Hybrid ☆85 · Updated 10 months ago
- (untitled) ☆94 · Updated 9 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated last year
- (untitled) ☆41 · Updated 11 months ago
- Fast, modern, memory-efficient, and low-precision PyTorch optimizers ☆86 · Updated 7 months ago
- (untitled) ☆20 · Updated last year
- Japanese LLaMa experiment ☆52 · Updated 3 months ago
- Here we will test various linear attention designs. ☆59 · Updated 10 months ago
- A fast implementation of T5/UL2 in PyTorch using Flash Attention ☆84 · Updated this week
- Token Omission Via Attention ☆124 · Updated 5 months ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Updated last year
- (untitled) ☆22 · Updated last year
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 · Updated 5 months ago
- Implementation of the Mamba SSM with hf_integration ☆56 · Updated 6 months ago
- (untitled) ☆47 · Updated last year
- Triton implementation of the HyperAttention algorithm ☆47 · Updated last year
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto ☆55 · Updated 9 months ago
- (untitled) ☆58 · Updated 9 months ago
- List of papers on self-correction of LLMs ☆71 · Updated 2 months ago
- Code for removing benchmark data from your training data, to help combat data snooping ☆25 · Updated last year
- Code repository for the c-BTM paper ☆106 · Updated last year
- Unofficial implementation of Evolutionary Model Merging ☆35 · Updated 11 months ago