dust-tt / llama-ssp

Experiments on speculative sampling with Llama models

☆125

Alternatives and similar repositories for llama-ssp

Users who are interested in llama-ssp are comparing it to the libraries listed below.

imoneoi / multipack
Multipack distributed sampler for fast padding-free training of LLMs
☆201 Updated last year
dwzhu-pku / PoSE
Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Length (ICLR 2024)
☆204 Updated last year
FasterDecoding / BitDelta
☆202 Updated 10 months ago
whyNLP / LCKV
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…
☆155 Updated 6 months ago
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆277 Updated last year
IBM / ModuleFormer
ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp…
☆224 Updated last month
Digitous / LLM-SLERP-Merge
Spherical Merge Pytorch/HF format Language Models with minimal feature loss.
☆138 Updated 2 years ago
jeffreysijuntan / lloco
The official repo for "LLoCo: Learning Long Contexts Offline"
☆117 Updated last year
huggingface / llm-swarm
Manage scalable open LLM inference endpoints in Slurm clusters
☆273 Updated last year
HanGuo97 / lq-lora
☆127 Updated last year
kernelmachine / cbtm
Code repository for the c-BTM paper
☆107 Updated 2 years ago
AnswerDotAI / cold-compress
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of…
☆147 Updated last year
LLM360 / amber-train
Pre-training code for Amber 7B LLM
☆168 Updated last year
Infini-AI-Lab / Sequoia
scalable and robust tree-based speculative decoding algorithm
☆359 Updated 8 months ago
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60 Updated last year
lm-sys / llm-decontaminator
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
☆311 Updated last year
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆248 Updated 8 months ago
golololologol / LLM-Distillery
A pipeline for LLM knowledge distillation
☆109 Updated 6 months ago
LLM360 / amber-data-prep
Data preparation code for Amber 7B LLM
☆92 Updated last year
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆156 Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130 Updated 10 months ago
astramind-ai / BitMat
An efficient implementation of the method proposed in "The Era of 1-bit LLMs"
☆154 Updated last year
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆202 Updated last year
sabetAI / BLoRA
batched loras
☆346 Updated 2 years ago
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆241 Updated 4 months ago
booydar / babilong
BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.
☆215 Updated last month
GAIR-NLP / Entropy-ABF
Official implementation for 'Extending LLMs’ Context Window with 100 Samples'
☆80 Updated last year
uukuguy / multi_loras
Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe…
☆158 Updated last year
sdan / selfextend
an implementation of Self-Extend, to expand the context window via grouped attention
☆118 Updated last year
magicproduct / hash-hop
Long context evaluation for large language models
☆223 Updated 7 months ago