kaiokendev / cutoff-len-is-context-lenLinks

Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit

☆63

Alternatives and similar repositories for cutoff-len-is-context-len

Users that are interested in cutoff-len-is-context-len are comparing it to the libraries listed below

Sorting:

Zyphra / Zyda_processing
☆39Updated last year
euclaise / supertrainer2000
☆50Updated last year
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆45Updated last year
kernelmachine / cbtm
Code repository for the c-BTM paper
☆108Updated 2 years ago
kyleliang919 / Long-context-transformers
Exploring finetuning public checkpoints on filter 8K sequences on Pile
☆116Updated 2 years ago
huu4ontocord / MDEL
Multi-Domain Expert Learning
☆67Updated last year
ChrisHayduk / qlora-multi-gpu
QLoRA with Enhanced Multi GPU Support
☆37Updated 2 years ago
kyegomez / LM-Infinite
Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
☆40Updated last year
official-elinas / zeus-llm-trainer
Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models
☆70Updated 2 years ago
scottlogic-alex / prm800k-denorm
Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format
☆27Updated 2 years ago
CERC-AAI / Robin
☆63Updated last year
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78Updated last year
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38Updated 5 months ago
argilla-io / distilabel-spin-dibt
Repository containing the SPIN experiments on the DIBT 10k ranked prompts
☆24Updated last year
Birch-san / booru-embed
[WIP] Transformer to embed Danbooru labelsets
☆13Updated last year
EleutherAI / improved-t5
Experiments for efforts to train a new and improved t5
☆76Updated last year
CarperAI / squeakily
A library for squeakily cleaning and filtering language datasets.
☆49Updated 2 years ago
joey00072 / ohara
Collection of autoregressive model implementation
☆85Updated 7 months ago
ElleLeonne / Lightning-ReLoRA
A public implementation of the ReLoRA pretraining method, built on Lightning-AI's Pytorch Lightning suite.
☆35Updated last year
cloneofsimo / fim-llama-deepspeed
☆32Updated last year
cat-state / tinypar
☆20Updated 2 years ago
huggingface / peft-pytorch-conference
Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given…
☆14Updated 2 years ago
NolanoOrg / sparse_quant_llms
SparseGPT + GPTQ Compression of LLMs like LLaMa, OPT, Pythia
☆41Updated 2 years ago
r-three / RAD
Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
☆45Updated 2 months ago
LegallyCoder / mamba-hf
Implementation of the Mamba SSM with hf_integration.
☆56Updated last year
Rallio67 / language-model-agents
Experiments with generating opensource language model assistants
☆97Updated 2 years ago
eth-easl / fmengine
Utilities for Training Very Large Models
☆58Updated last year
EleutherAI / semantic-memorization
☆44Updated last year
McGill-NLP / length-generalization
Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023
☆138Updated last year
akjindal53244 / Arithmo
Small and Efficient Mathematical Reasoning LLMs
☆72Updated last year