NolanoOrg / SpectraSuiteLinks

☆51

Alternatives and similar repositories for SpectraSuite

Users that are interested in SpectraSuite are comparing it to the libraries listed below

Sorting:

IST-DASLab / QuEST
Work in progress.
☆74Updated 3 months ago
samchaineau / llm_slerp_generation
Repo hosting codes and materials related to speeding LLMs' inference using token merging.
☆36Updated 2 weeks ago
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆156Updated last year
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78Updated last year
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆98Updated 11 months ago
chu-tianxiang / QuIP-for-all
QuIP quantization
☆59Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated 10 months ago
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆92Updated 5 months ago
IST-DASLab / Quartet
☆102Updated this week
FasterDecoding / BitDelta
☆202Updated 10 months ago
Cornell-RelaxML / yaqa-quantization
☆60Updated 4 months ago
IST-DASLab / gptq-gguf-toolkit
Efficient non-uniform quantization with GPTQ for GGUF
☆52Updated last month
IST-DASLab / QIGen
Repository for CPU Kernel Generation for LLM Inference
☆26Updated 2 years ago
foundation-model-stack / bamba
Train, tune, and infer Bamba model
☆134Updated 4 months ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆99Updated last year
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆102Updated 2 weeks ago
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60Updated last year
tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆59Updated last year
whyNLP / LCKV
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…
☆155Updated 6 months ago
GreenBitAI / low_bit_llama
Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs
☆110Updated last year
kyleliang919 / Super_Muon
☆64Updated 7 months ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆185Updated 9 months ago
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆110Updated 6 months ago
LiqunMa / FBI-LLM
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
☆51Updated 2 months ago
catid / lllm
Latent Large Language Models
☆19Updated last year
HanGuo97 / lq-lora
☆127Updated last year
arcee-ai / DAM
☆55Updated 11 months ago
Anonymous1252022 / fp4-all-the-way
☆35Updated 5 months ago
VITA-Group / WeLore
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,…
☆51Updated 6 months ago
IST-DASLab / SparseFinetuning
Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry
☆42Updated last year