melisa-writer / short-transformers
Prune transformer layers
☆74 · May 30, 2024 · Updated last year
Alternatives and similar repositories for short-transformers
Users who are interested in short-transformers are comparing it to the libraries listed below.
- [ICML 2024] Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks ☆39 · Feb 4, 2025 · Updated last year
- [EMNLP 2024] Quantize LLM to extremely low-bit, and finetune the quantized LLMs ☆15 · Jul 18, 2024 · Updated last year
- Official implementation for LaCo (EMNLP 2024 Findings) ☆21 · Oct 3, 2024 · Updated last year
- A hackable library for running and fine-tuning modern transformer models on commodity and alternative GPUs, powered by tinygrad. ☆27 · Nov 27, 2025 · Updated 2 months ago
- ☆30 · Jul 22, 2024 · Updated last year
- Collection of autoregressive model implementation ☆85 · Updated this week
- Implementation code for ACL 2024: Advancing Parameter Efficiency in Fine-tuning via Representation Editing ☆15 · Apr 20, 2024 · Updated last year
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆261 · Apr 23, 2024 · Updated last year
- [NeurIPS '25] Multi-Token Prediction Needs Registers ☆26 · Dec 14, 2025 · Updated 2 months ago
- Unofficial implementations of block/layer-wise pruning methods for LLMs. ☆77 · Apr 29, 2024 · Updated last year
- ☆20 · Oct 13, 2024 · Updated last year
- Web application for visualizing robotics datasets in LeRobot format ☆45 · Updated this week
- [ICLR 2025] Official implementation of paper "Dynamic Low-Rank Sparse Adaptation for Large Language Models". ☆23 · Mar 16, 2025 · Updated 10 months ago
- FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration ☆20 · Jun 27, 2025 · Updated 7 months ago
- ☆25 · Oct 31, 2024 · Updated last year
- ☆10 · Oct 17, 2023 · Updated 2 years ago
- [CVPR '24] Official implementation of the paper "Multiflow: Shifting Towards Task-Agnostic Vision-Language Pruning". ☆23 · Mar 7, 2025 · Updated 11 months ago
- D^2-MoE: Delta Decompression for MoE-based LLMs Compression ☆72 · Mar 25, 2025 · Updated 10 months ago
- This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024. ☆106 · Jul 1, 2024 · Updated last year
- [NAACL 2025 Main Selected Oral] Repository for the paper: Prompt Compression for Large Language Models: A Survey ☆36 · May 18, 2025 · Updated 8 months ago
- ☆31 · Nov 11, 2024 · Updated last year
- Official implementation of the transformer (TF) architecture suggested in a paper entitled "Looped Transformers as Programmable Computers… ☆30 · Apr 8, 2023 · Updated 2 years ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · May 22, 2024 · Updated last year
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ☆67 · Mar 27, 2025 · Updated 10 months ago
- InSales e-commerce platform API bindings ☆14 · Jul 13, 2024 · Updated last year
- Convert Confluence MIME exports (.doc) to clean Markdown ☆30 · Jan 13, 2026 · Updated last month
- ☆82 · Nov 11, 2024 · Updated last year
- ☆40 · Nov 22, 2025 · Updated 2 months ago
- ☆40 · Mar 28, 2024 · Updated last year
- [NeurIPS 2022 Spotlight] This is the official PyTorch implementation of "EcoFormer: Energy-Saving Attention with Linear Complexity" ☆73 · Nov 15, 2022 · Updated 3 years ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆35 · Mar 7, 2025 · Updated 11 months ago
- RL with Experience Replay ☆55 · Jul 27, 2025 · Updated 6 months ago
- A blueprint for next-gen AI. Project Infinity uses a token-efficient, Codified Agent Protocol to create specialized, secure, and imaginat… ☆25 · Oct 2, 2025 · Updated 4 months ago
- ☆17 · Feb 6, 2025 · Updated last year
- SimADFuzz: Simulation-Feedback Fuzz Testing for Autonomous Driving Systems ☆10 · Apr 11, 2025 · Updated 10 months ago
- ☆37 · Dec 19, 2024 · Updated last year
- Spherical Merge Pytorch/HF format Language Models with minimal feature loss. ☆144 · Sep 10, 2023 · Updated 2 years ago
- For releasing code related to compression methods for transformers, accompanying our publications ☆455 · Jan 16, 2025 · Updated last year
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆39 · Mar 11, 2024 · Updated last year