tomaarsen / attention_sinksLinks

Extend existing LLMs way beyond the original training length with constant memory usage, without retraining

☆732

Alternatives and similar repositories for attention_sinks

Users that are interested in attention_sinks are comparing it to the libraries listed below

Sorting:

arielnlee / Platypus
Code for fine-tuning Platypus fam LLMs using LoRA
☆630Updated last year
datamllab / LongLM
[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
☆662Updated last year
sabetAI / BLoRA
batched loras
☆347Updated 2 years ago
apoorvumang / prompt-lookup-decoding
☆580Updated last year
epfml / landmark-attention
Landmark Attention: Random-Access Infinite Context Length for Transformers
☆427Updated last year
abacusai / Long-Context
This repository contains code and tooling for the Abacus.AI LLM Context Expansion project. Also included are evaluation scripts and bench…
☆598Updated 2 years ago
huggingface / cosmopedia
☆556Updated last year
jondurbin / bagel
A bagel, with everything.
☆325Updated last year
jquesnelle / yarn
YaRN: Efficient Context Window Extension of Large Language Models
☆1,643Updated last year
kuleshov-group / llmtools
Finetuning Large Language Models on One Consumer GPU in 2 Bits
☆733Updated last year
nomic-ai / contrastors
Train Models Contrastively in Pytorch
☆754Updated 8 months ago
SkunkworksAI / hydra-moe
☆415Updated 2 years ago
Vahe1994 / SpQR
☆550Updated 11 months ago
DachengLi1 / LongChat
Official repository for LongChat and LongEval
☆532Updated last year
yuchenlin / LLM-Blender
[ACL2023] We introduce LLM-Blender, an innovative ensembling framework to attain consistently superior performance by leveraging the dive…
☆970Updated last year
HazyResearch / m2
Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
☆561Updated 11 months ago
persimmon-ai-labs / adept-inference
Inference code for Persimmon-8B
☆412Updated 2 years ago
xfactlab / orpo
Official repository for ORPO
☆467Updated last year
VikParuchuri / textbook_quality
Generate textbook-quality synthetic LLM pretraining data
☆507Updated 2 years ago
princeton-nlp / LLM-Shearing
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
☆632Updated last year
declare-lab / instruct-eval
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
☆552Updated last year
ContextualAI / gritlm
Generative Representational Instruction Tuning
☆679Updated 5 months ago
yuhuixu1993 / qa-lora
Official PyTorch implementation of QA-LoRA
☆145Updated last year
pbelcak / UltraFastBERT
The repository for the code of the UltraFastBERT paper
☆520Updated last year
pratyushasharma / laser
The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
☆389Updated last year
SqueezeAILab / SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
☆708Updated last year
jondurbin / airoboros
Customizable implementation of the self-instruct paper.
☆1,050Updated last year
Guitaricet / relora
Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
☆469Updated last year
rmihaylov / falcontune
Tune any FALCON in 4-bit
☆465Updated 2 years ago
jzhang38 / EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
☆750Updated last year