alexisrozhkov / dilated-self-attention
Implementation of dilated self-attention as described in "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
☆13 · Updated last year
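For orientation, here is a minimal sketch of the dilated attention pattern that LongNet describes: split the sequence into segments, keep every r-th position within each segment, attend within the sparsified segments, and scatter the outputs back. This is an illustrative approximation, not this repository's code; the function name, default segment length, and dilation rate are assumptions, and real implementations (this repo and the alternatives listed below) additionally handle multiple heads, multiple (w, r) configurations, causal masking, and efficiency.

```python
# Illustrative sketch only (not this repository's code): dilated self-attention for a
# single head and a single (segment length w, dilation rate r) configuration, following
# the LongNet idea. No causal mask, for brevity; the full scheme mixes several (w, r) pairs.
import torch


def dilated_self_attention(q, k, v, segment_len=64, dilation=2):
    """q, k, v: (batch, seq_len, dim); seq_len must be divisible by segment_len."""
    b, n, d = q.shape
    s = segment_len

    def sparsify(x):
        # Split into segments of length s, then keep every `dilation`-th position.
        # (b, n, d) -> (b, n // s, s // dilation, d)
        return x.view(b, n // s, s, d)[:, :, ::dilation, :]

    qs, ks, vs = sparsify(q), sparsify(k), sparsify(v)

    # Standard scaled dot-product attention, computed independently per sparsified segment.
    attn = torch.softmax(qs @ ks.transpose(-2, -1) / d**0.5, dim=-1)
    out_sparse = attn @ vs  # (b, n // s, s // dilation, d)

    # Scatter the sparse outputs back to their original positions;
    # positions not selected by this (w, r) pair stay zero here.
    out = torch.zeros(b, n // s, s, d, dtype=q.dtype, device=q.device)
    out[:, :, ::dilation, :] = out_sparse
    return out.reshape(b, n, d)


# Toy usage: 2 sequences of length 256, model dim 32.
x = torch.randn(2, 256, 32)
y = dilated_self_attention(x, x, x, segment_len=64, dilation=4)
print(y.shape)  # torch.Size([2, 256, 32])
```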
Related projects
Alternatives and complementary repositories for dilated-self-attention
- (Unofficial) Implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307…) ☆51 · Updated last year
- Multipack distributed sampler for fast padding-free training of LLMs ☆178 · Updated 3 months ago
- Implementation of Mamba in Rust ☆73 · Updated 8 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆59 · Updated 7 months ago
- Simple implementation of Speculative Sampling in NumPy for GPT-2. ☆89 · Updated last year
- Recurrent Memory Transformer ☆147 · Updated last year
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆194 · Updated this week
- ☆176 · Updated this week
- Code for Zero-Shot Tokenizer Transfer ☆117 · Updated last month
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆196 · Updated 7 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆113 · Updated 5 months ago
- Minimal PyTorch implementation of BM25 (with sparse tensors) ☆90 · Updated 8 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆130 · Updated 2 months ago
- Prune transformer layers ☆65 · Updated 5 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆84 · Updated this week
- A fast implementation of T5/UL2 in PyTorch using Flash Attention ☆71 · Updated last month
- A truly flash T5 implementation! ☆54 · Updated 6 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆183 · Updated last month
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆64 · Updated last month
- Some common Huggingface transformers in maximal update parametrization (µP) ☆77 · Updated 2 years ago
- X-LoRA: Mixture of LoRA Experts ☆178 · Updated 3 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆93 · Updated last month
- ☆77 · Updated 5 months ago
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs. ☆74 · Updated last month
- BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach. ☆157 · Updated this week
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆97 · Updated last year
- ☆161 · Updated last year
- Experiments with inference on llama ☆105 · Updated 5 months ago
- Understand and test language model architectures on synthetic tasks. ☆163 · Updated 6 months ago
- ☆35 · Updated 3 weeks ago