An implementation of LazyLLM token pruning for the Llama 2 model family.
☆13Jan 6, 2025Updated last year
Alternatives and similar repositories for Lazy-Llama
Users that are interested in Lazy-Llama are comparing it to the libraries listed below.
- ☆12Oct 17, 2023Updated 2 years ago
- ☆12Aug 22, 2023Updated 2 years ago
- Official Pytorch Implementation of Paper "DarwinLM: Evolutionary Structured Pruning of Large Language Models"☆20Feb 21, 2025Updated last year
- [ICML 2025] Official implementation of the paper "SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling". …☆22Nov 17, 2025Updated 5 months ago
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024☆23Jun 26, 2024Updated last year
- PCA-SVD-Autoencoder-Fourier-Wavelet-Transformation-for-denoising☆22Feb 16, 2022Updated 4 years ago
- Create string diagrams with LaTeX!☆14Jan 3, 2025Updated last year
- 3D Telecommunications project utilizing Holoportation technology to provide live volumetric capture. Used in one case to increase the re…☆21Updated this week
- MULTITQ is a large-scale dataset featuring ample relevant facts and multiple temporal granularities.☆25Mar 11, 2024Updated 2 years ago
- Repo hosting codes and materials related to speeding LLMs' inference using token merging.☆37Oct 9, 2025Updated 6 months ago
- The Compositionality article class.☆13Mar 16, 2026Updated last month
- GoldFinch and other hybrid transformer components☆46Jul 20, 2024Updated last year
- Repository for paper Decrypting Cryptic Crosswords☆10Jan 15, 2022Updated 4 years ago
- Enhancing Sentence Embedding with Generalized Pooling☆11Jul 26, 2018Updated 7 years ago
- ☆12Apr 17, 2025Updated last year
- Code for experiments on transformers using Markovian data.☆22Nov 22, 2024Updated last year
- Experiments learning the even-parity dataset with MPS (tensor trains)☆24Nov 1, 2023Updated 2 years ago
- A type programming language which compiles to and interops with type-level TypeScript☆22Sep 9, 2022Updated 3 years ago
- milon27.com portfolio application with NEXT JS and contentful☆25Feb 25, 2022Updated 4 years ago
- Apache Airflow on Oracle Cloud Infrastructure☆16Jan 23, 2024Updated 2 years ago
- Implementations of some methods for LLM KV cache sparsity☆40Jun 6, 2024Updated last year
- A multi-label classification plugin for AllenNLP.☆11Jan 13, 2023Updated 3 years ago
- The official implementation of dLLM-Factory☆25Jul 12, 2025Updated 9 months ago
- Official implementation of "How Important is Importance Sampling for Deep Budgeted Training?"☆11Oct 18, 2022Updated 3 years ago
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗).☆689Updated this week
- ☆35Jun 5, 2025Updated 10 months ago
- ☆17Jul 9, 2025Updated 9 months ago
- ☆12Nov 11, 2019Updated 6 years ago
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation☆51Aug 24, 2025Updated 7 months ago
- ☆18Jun 3, 2024Updated last year
- Codes for the paper The emergence of clusters in self-attention dynamics.☆17Dec 18, 2023Updated 2 years ago
- Code base for ICLR 2025 "Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection"☆58Sep 5, 2025Updated 7 months ago
- The code for Joint Neural Architecture Search and Quantization☆14Apr 10, 2019Updated 7 years ago
- UnitEval is a benchmarking and evaluation tool for AutoDev Coder.☆14Jan 2, 2024Updated 2 years ago
- ☆10Jun 19, 2024Updated last year
- Official Code For Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM☆14Dec 27, 2023Updated 2 years ago
- C++ framework for deep learning☆13Dec 1, 2022Updated 3 years ago
- Conjure generator for TypeScript clients☆21Mar 31, 2026Updated 2 weeks ago
- [EMNLP 2025] TokenSkip: Controllable Chain-of-Thought Compression in LLMs☆214Nov 30, 2025Updated 4 months ago