kotoba-tech / kotomamba

Mamba training library developed by Kotoba Technologies

☆70

Alternatives and similar repositories for kotomamba

Users who are interested in kotomamba are comparing it to the libraries listed below.

iwiwi / epochraft
Checkpointable dataset utilities for foundation model training
☆32 Updated last year
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆89 Updated last year
iwiwi / epochraft-hf-fsdp
Example of using Epochraft to train HuggingFace transformers models with PyTorch FSDP
☆11 Updated last year
SakanaAI / TAID
Official implementation of "TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models"
☆118 Updated last month
okoge-kaz / llm-recipes
Ongoing Research Project for continual pre-training LLM (dense mode)
☆42 Updated 8 months ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆101 Updated last year
lighttransport / japanese-llama-experiment
Japanese LLaMa experiment
☆54 Updated last month
Aratako / Task-Vector-Merge-Optimzier
☆15 Updated last year
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38 Updated 5 months ago
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆127 Updated last year
google-deepmind / randomized_positional_encodings
Randomized Positional Encodings Boost Length Generalization of Transformers
☆83 Updated last year
SakanaAI / CycleQD
CycleQD is a framework for parameter space model merging.
☆44 Updated 9 months ago
warner-benjamin / optimi
Fast, Modern, and Low Precision PyTorch Optimizers
☆116 Updated 2 months ago
kotoba-tech / kotoba-recipes
Support Continual pre-training & Instruction Tuning forked from llama-recipes
☆33 Updated last year
llm-jp / llm-jp-sft
☆61 Updated last year
okoge-kaz / moe-recipes
Ongoing research on training Mixture of Experts models.
☆21 Updated last year
kjslag / spacebyte
A byte-level decoder architecture that matches the performance of tokenized Transformers.
☆66 Updated last year
leia-llm / leia
LEIA: Facilitating Cross-Lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation
☆22 Updated last year
AUGMXNT / shisa
☆41 Updated last year
insuhan / hyper-attn
☆83 Updated last year
Ino-Ichan / GIT-LLM
☆22 Updated 2 years ago
kyegomez / MambaByte
Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta
☆125 Updated 3 weeks ago
Beomi / BitNet-Transformers
0️⃣1️⃣🤗 BitNet-Transformers: Huggingface Transformers Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" i…
☆309 Updated last year
turingmotors / vlm-recipes
☆20 Updated last year
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆131 Updated 3 weeks ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆53 Updated 2 years ago
lucidrains / CoLT5-attention
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
☆229 Updated last year
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆61 Updated last year
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆90 Updated last year
booydar / LM-RMT
Recurrent Memory Transformer
☆154 Updated 2 years ago