jondurbin / bagel
A bagel, with everything.
☆318 · Updated last year
Alternatives and similar repositories for bagel:
Users interested in bagel are comparing it to the libraries listed below.
- Multipack distributed sampler for fast padding-free training of LLMs ☆186 · Updated 8 months ago
- ☆508 · Updated 4 months ago
- This is our own implementation of 'Layer Selective Rank Reduction' ☆233 · Updated 10 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆254 · Updated 9 months ago
- ☆524 · Updated 7 months ago
- batched loras ☆341 · Updated last year
- Merge Transformers language models using gradient parameters. ☆206 · Updated 8 months ago
- Pre-training code for Amber 7B LLM ☆166 · Updated 11 months ago
- Landmark Attention: Random-Access Infinite Context Length for Transformers ☆423 · Updated last year
- Official PyTorch implementation of QA-LoRA ☆131 · Updated last year
- Generate textbook-quality synthetic LLM pretraining data ☆498 · Updated last year
- Low-Rank adapter extraction for fine-tuned transformers models (see the extraction sketch after this list) ☆171 · Updated 11 months ago
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆300 · Updated last year
- Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context" ☆457 · Updated last year
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆210 · Updated 5 months ago
- Official repository for LongChat and LongEval ☆517 · Updated 10 months ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆712 · Updated 6 months ago
- Inference code for Persimmon-8B ☆415 · Updated last year
- A library for easily merging multiple LLM experts and efficiently training the merged LLM. ☆468 · Updated 7 months ago
- Inference code for Mistral and Mixtral hacked up into the original Llama implementation ☆371 · Updated last year
- ☆412 · Updated last year
- Spherical merge of PyTorch/HF-format language models with minimal feature loss (see the SLERP sketch after this list). ☆120 · Updated last year
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ☆648 · Updated 10 months ago
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆693 · Updated last year
- Fast & more realistic evaluation of chat language models. Includes leaderboard. ☆186 · Updated last year
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆217 · Updated last year
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆229 · Updated 11 months ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Long Length (ICLR 2024) ☆205 · Updated 10 months ago
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ☆231 · Updated 5 months ago
- ☆268 · Updated last year
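
Two of the techniques named in the list above lend themselves to short illustrations. First, low-rank adapter extraction: given a base checkpoint and a fully fine-tuned copy, a LoRA-style adapter can be recovered by taking a truncated SVD of each weight delta. The sketch below is a minimal illustration of that general idea, not the listed repository's actual code; the function name and the rank parameter `r` are placeholders.

```python
import torch

def extract_lora_pair(w_base: torch.Tensor, w_finetuned: torch.Tensor, r: int = 16):
    """Approximate (w_finetuned - w_base) with a rank-r product B @ A, LoRA-style.

    Illustrative sketch only: real extraction tools iterate over every target
    module in the model and save A/B into an adapter checkpoint.
    """
    delta = (w_finetuned - w_base).float()           # weight difference to compress
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    u, s, vh = u[:, :r], s[:r], vh[:r, :]            # keep the top-r singular directions
    b = u * s.sqrt()                                  # (out_features, r)
    a = s.sqrt().unsqueeze(1) * vh                    # (r, in_features)
    return a, b                                       # delta ≈ b @ a

# Example: recover a rank-8 adapter for one linear layer's weight matrix.
w0 = torch.randn(256, 512)
w1 = w0 + torch.randn(256, 8) @ torch.randn(8, 512) * 0.01
a, b = extract_lora_pair(w0, w1, r=8)
print(torch.dist(w1 - w0, b @ a))  # small reconstruction error
```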
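
Second, spherical (SLERP) merging: instead of linearly averaging two models' weights, each pair of tensors is interpolated along the arc between them, which tends to preserve feature magnitude better than a plain average. This is a hedged sketch of the general formula, assuming both checkpoints share the same architecture; it is not the listed repository's implementation, and the state-dict names are hypothetical.

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors at mixing ratio t."""
    v0f, v1f = v0.flatten().float(), v1.flatten().float()
    u0, u1 = v0f / (v0f.norm() + eps), v1f / (v1f.norm() + eps)
    dot = torch.clamp(torch.dot(u0, u1), -1.0, 1.0)
    omega = torch.arccos(dot)                # angle between the two weight directions
    if omega.abs() < 1e-4:                   # nearly parallel: fall back to linear interpolation
        return (1 - t) * v0 + t * v1
    so = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) / so) * v0f + (torch.sin(t * omega) / so) * v1f
    return out.reshape(v0.shape).to(v0.dtype)

# Merging two state dicts tensor by tensor (hypothetical checkpoints, t = 0.5).
model_a = {"layer.weight": torch.randn(4, 4)}
model_b = {"layer.weight": torch.randn(4, 4)}
merged = {name: slerp(0.5, model_a[name], model_b[name]) for name in model_a}
```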