lucidrains / MEGABYTE-pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
☆655 · Updated Dec 27, 2024
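For orientation, here is a minimal usage sketch of the package this page indexes. It assumes the MEGABYTE class and the constructor, return_loss, and generate arguments shown in the repository's README; treat the exact signature as an assumption and check the repo before relying on it.

```python
# Hedged sketch: one training step of MEGABYTE-pytorch on random token IDs, then sampling.
# The class name and keyword arguments below follow the repository's README;
# verify them against the current version of the package.
import torch
from MEGABYTE_pytorch import MEGABYTE

model = MEGABYTE(
    num_tokens = 16000,       # vocabulary size (e.g. 256 for raw bytes)
    dim = (512, 256),         # model dimension per (global, local) stage
    max_seq_len = (1024, 4),  # sequence length per stage; product = total length
    depth = (6, 2),           # transformer depth per stage
    dim_head = 64,
    heads = 8,
    flash_attn = True
)

# input shape is (batch, *max_seq_len)
x = torch.randint(0, 16000, (1, 1024, 4))

loss = model(x, return_loss = True)   # autoregressive training loss
loss.backward()

# after training, sample hierarchically (global stage first, then local)
sampled = model.generate(temperature = 0.9, filter_thres = 0.9)
```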
Alternatives and similar repositories for MEGABYTE-pytorch
Users interested in MEGABYTE-pytorch are comparing it to the libraries listed below.
- Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT ☆224 · Updated Aug 20, 2024
- RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable)… ☆14,351 · Updated this week
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,316 · Updated Mar 6, 2025
- Convolutions for Sequence Modeling ☆911 · Updated Jun 13, 2024
- [NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333 ☆1,148 · Updated Jan 11, 2024
- Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input" ☆1,066 · Updated Mar 7, 2024
- The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training” ☆981 · Updated Jan 30, 2024
- 🦁 Lion, a new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(W), in Pytorch ☆2,184 · Updated Nov 27, 2024
- Beyond Language Models: Byte Models are Digital World Simulators ☆334 · Updated Jun 6, 2024
- Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in Pytorch and Zeta ☆125 · Updated Feb 6, 2026
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆123 · Updated Oct 17, 2024
- Fast and memory-efficient exact attention ☆22,231 · Updated this week
- My attempts at applying the SoundStream design to learned tokenization of text, and then applying hierarchical attention to text generation ☆90 · Updated Oct 11, 2024
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆248 · Updated Jun 6, 2025
- QLoRA: Efficient Finetuning of Quantized LLMs ☆10,837 · Updated Jun 10, 2024
- Foundation Architecture for (M)LLMs ☆3,133 · Updated Apr 11, 2024
- Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch ☆2,619 · Updated Jan 12, 2025
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆207 · Updated Aug 26, 2023
- ImageBind: One Embedding Space to Bind Them All ☆8,971 · Updated Nov 21, 2025
- Un-*** 50 billion multimodal dataset ☆23 · Updated Sep 14, 2022
- Mamba SSM architecture ☆17,186 · Updated Jan 12, 2026
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆67 · Updated Apr 24, 2024
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ☆7,188 · Updated Jul 11, 2024
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆562 · Updated Dec 28, 2024
- Official Implementation of ACL2023: Don't Parse, Choose Spans! Continuous and Discontinuous Constituency Parsing via Autoregressive Span … ☆14 · Updated Aug 25, 2023
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models ☆341 · Updated Feb 23, 2025
- Official implementation of TransNormerLLM: A Faster and Better LLM ☆250 · Updated Jan 23, 2024
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆220 · Updated Feb 13, 2023
- The RedPajama-Data repository contains code for preparing large datasets for training large language models. ☆4,924 · Updated Dec 7, 2024
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆33 · Updated Aug 14, 2024
- An Open-source Streaming High-fidelity Neural Audio Codec ☆498 · Updated Mar 4, 2025
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads ☆2,706 · Updated Jun 25, 2024
- Vector (and Scalar) Quantization, in Pytorch ☆3,870 · Updated this week
- The repository for the code of the UltraFastBERT paper ☆519 · Updated Mar 24, 2024
- Implementation of Block Recurrent Transformer - Pytorch ☆224 · Updated Aug 20, 2024
- Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in P… ☆207 · Updated Feb 14, 2024
- Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch ☆804 · Updated Jan 30, 2026
- Muon is Scalable for LLM Training ☆1,432 · Updated Aug 3, 2025
- Minimalistic large language model 3D-parallelism training ☆2,559 · Updated this week