An implementation of LazyLLM token pruning for the Llama 2 model family.
☆13 · Jan 6, 2025 · Updated last year
Alternatives and similar repositories for Lazy-Llama
Users who are interested in Lazy-Llama compare it to the repositories listed below.
- Official PyTorch implementation of the paper "DarwinLM: Evolutionary Structured Pruning of Large Language Models" ☆20 · Feb 21, 2025 · Updated last year
- [ICML 2025] Official implementation of the paper "SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling". … ☆21 · Nov 17, 2025 · Updated 4 months ago
- Create string diagrams with LaTeX! ☆14 · Jan 3, 2025 · Updated last year
- 3D Telecommunications project utilizing Holoportation technology to provide live volumetric capture. Used in one case to increase the re… ☆20 · Feb 20, 2026 · Updated last month
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 · Jul 18, 2024 · Updated last year
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆37 · Oct 9, 2025 · Updated 5 months ago
- GoldFinch and other hybrid transformer components ☆45 · Jul 20, 2024 · Updated last year
- ☆14 · Feb 25, 2019 · Updated 7 years ago
- Enhancing Sentence Embedding with Generalized Pooling ☆11 · Jul 26, 2018 · Updated 7 years ago
- ☆12 · Apr 17, 2025 · Updated 11 months ago
- Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?" ☆19 · Jun 11, 2025 · Updated 9 months ago
- A type programming language which compiles to and interops with type-level TypeScript ☆22 · Sep 9, 2022 · Updated 3 years ago
- [ICML 24 NGSM workshop] Associative Recurrent Memory Transformer implementation and scripts for training and evaluation ☆62 · Mar 12, 2026 · Updated 2 weeks ago
- SparseGPT + GPTQ Compression of LLMs like LLaMa, OPT, Pythia ☆42 · Mar 13, 2023 · Updated 3 years ago
- Apache Airflow on Oracle Cloud Infrastructure ☆16 · Jan 23, 2024 · Updated 2 years ago
- Resources for our ACL 2023 paper: Distilling Script Knowledge from Large Language Models for Constrained Language Planning ☆36 · Aug 19, 2023 · Updated 2 years ago
- A prompt set of ChatGLM-6B ☆15 · Jul 21, 2023 · Updated 2 years ago
- Implementations of some LLM KV cache sparsity methods ☆41 · Jun 6, 2024 · Updated last year
- A multi-label classification plugin for AllenNLP. ☆11 · Jan 13, 2023 · Updated 3 years ago
- The official implementation of dLLM-Factory ☆26 · Jul 12, 2025 · Updated 8 months ago
- Official implementation of "How Important is Importance Sampling for Deep Budgeted Training?" ☆11 · Oct 18, 2022 · Updated 3 years ago
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). ☆678 · Feb 24, 2026 · Updated last month
- Unofficial Implementation of Selective Attention Transformer ☆21 · Oct 31, 2024 · Updated last year
- ☆32 · Jun 5, 2025 · Updated 9 months ago
- ☆17 · Jul 9, 2025 · Updated 8 months ago
- ☆12 · Nov 11, 2019 · Updated 6 years ago
- [KDD'22] Learned Token Pruning for Transformers ☆98 · Feb 27, 2023 · Updated 3 years ago
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation ☆51 · Aug 24, 2025 · Updated 7 months ago
- React crossword component ☆26 · May 24, 2025 · Updated 10 months ago
- ☆18 · Jun 3, 2024 · Updated last year
- Code for the paper "The emergence of clusters in self-attention dynamics". ☆17 · Dec 18, 2023 · Updated 2 years ago
- Code for "Dissecting Generation Modes for Abstractive Summarization Models via Ablation and Attribution" (ACL 2021) ☆13 · Jun 2, 2021 · Updated 4 years ago
- UnitEval is a benchmarking and evaluation tool for AutoDev Coder. ☆13 · Jan 2, 2024 · Updated 2 years ago
- Official codebase for “In-Context Learning with Many Demonstration Examples” ☆16 · Feb 13, 2023 · Updated 3 years ago
- [EMNLP 2025] TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆207 · Nov 30, 2025 · Updated 4 months ago
- ☆28 · Nov 28, 2024 · Updated last year
- Load markdown through remark with image resolving and some react-specific features. ☆18 · Feb 6, 2026 · Updated last month
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025) ☆12 · Apr 18, 2025 · Updated 11 months ago
- ☆13 · Nov 12, 2021 · Updated 4 years ago