dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆128 · Jun 8, 2023 · Updated 2 years ago
Alternatives and similar repositories for llama-ssp
Users that are interested in llama-ssp are comparing it to the libraries listed below
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant … ☆15 · Mar 11, 2024 · Updated last year
- ☆16 · Jul 23, 2024 · Updated last year
- Fast inference from large language models via speculative decoding ☆886 · Aug 22, 2024 · Updated last year
- Neural Network Execution Service ☆11 · Oct 3, 2023 · Updated 2 years ago
- ☆595 · Aug 23, 2024 · Updated last year
- ☆553 · Feb 8, 2026 · Updated last week
- ☆23 · Mar 31, 2023 · Updated 2 years ago
- Codebase for Context-aware Meta-learned Loss Scaling (CaMeLS). https://arxiv.org/abs/2305.15076. ☆25 · Jan 23, 2024 · Updated 2 years ago
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" from Pytorch and Zeta ☆13 · Nov 11, 2024 · Updated last year
- triton ver of gqa flash attn, based on the tutorial ☆12 · Aug 4, 2024 · Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆279 · Nov 3, 2023 · Updated 2 years ago
- Collection of ChatGPT alternatives & LLM tuning methods ☆12 · Mar 31, 2023 · Updated 2 years ago
- Go language bindings for the ggwave C++ library ☆14 · Apr 9, 2025 · Updated 10 months ago
- Yet Another (LLM) Web UI, made with Gemini ☆12 · Dec 25, 2024 · Updated last year
- Multi-Figurative Language Generation (COLING 2022) ☆12 · Jan 30, 2023 · Updated 3 years ago
- Personalized Fashion Compatibility Modeling via Metapath-guided Heterogeneous Graph Learning. ☆15 · Nov 7, 2022 · Updated 3 years ago
- code for training and using chess embeddings models ☆13 · Jun 9, 2024 · Updated last year
- visualize your python data as it comes in, with minimal intrusion ☆13 · Nov 14, 2017 · Updated 8 years ago
- ☆19 · Mar 31, 2024 · Updated last year
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ☆1,586 · Jan 28, 2026 · Updated 2 weeks ago
- QLoRA: Efficient Finetuning of Quantized LLMs ☆79 · Apr 10, 2024 · Updated last year
- Evaluating LLMs with fewer examples ☆169 · Apr 12, 2024 · Updated last year
- ☆31 · Nov 18, 2025 · Updated 2 months ago
- ☆12 · Jun 27, 2024 · Updated last year
- Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25). ☆2,182 · Jan 27, 2026 · Updated 2 weeks ago
- Companion Repo for the book The Applied ML Field Manual, Prithiviraj Damodaran ☆12 · Jun 22, 2022 · Updated 3 years ago
- implementation of https://arxiv.org/pdf/2312.09299 ☆21 · Jul 3, 2024 · Updated last year
- ☆14 · Dec 26, 2022 · Updated 3 years ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ☆3,436 · Jul 17, 2025 · Updated 6 months ago
- 4 bits quantization of LLaMA using GPTQ ☆3,073 · Jul 13, 2024 · Updated last year
- experiments with inference on llama ☆103 · Jun 6, 2024 · Updated last year
- ☆17 · Feb 16, 2024 · Updated 2 years ago
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following" ☆31 · Jun 5, 2025 · Updated 8 months ago
- WebAssembly sourcemap generator and WASM binary source mapping url section patcher ☆45 · Nov 24, 2025 · Updated 2 months ago
- batched loras ☆349 · Sep 6, 2023 · Updated 2 years ago
- REST: Retrieval-Based Speculative Decoding, NAACL 2024 ☆214 · Sep 11, 2025 · Updated 5 months ago
- ☆457 · Oct 15, 2023 · Updated 2 years ago
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads ☆2,705 · Jun 25, 2024 · Updated last year
- A fusion of a linear layer and a cross entropy loss, written for pytorch in triton. ☆75 · Aug 2, 2024 · Updated last year