A family of efficient edge language models in sizes from 100M to 1B.
☆19Feb 14, 2025Updated last year
Alternatives and similar repositories for EfficientLLM
Users that are interested in EfficientLLM are comparing it to the libraries listed below.
- The official repo for "CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models"☆32Mar 26, 2026Updated last month
- [NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer☆30Dec 6, 2023Updated 2 years ago
- [NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models☆187Jan 1, 2025Updated last year
- [ICLR 2026] Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding☆32Jan 27, 2026Updated 3 months ago
- The official implementations of Noise-Informed Diffusion-Generated Image Detection With Anomaly Attention (TIFS 2025)☆17Jun 23, 2025Updated 10 months ago
- ☆13Sep 25, 2023Updated 2 years ago
- ☆13Oct 13, 2025Updated 6 months ago
- Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts".☆19Oct 30, 2024Updated last year
- ☆18Mar 23, 2022Updated 4 years ago
- Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores (EuroSys'25)☆15Jul 17, 2025Updated 9 months ago
- Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries☆41Nov 19, 2025Updated 5 months ago
- Official PyTorch implementation of our paper "Dispersing Prompt Expansion for Class-Agnostic Object Detection" (NeurIPS 2024)☆14Jan 19, 2025Updated last year
- Official PyTorch implementation of "Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming" (ICML'23)☆13Apr 13, 2026Updated 3 weeks ago
- Official PyTorch implementation of QwT—“Quantization without Tears” (CVPR 2025): fast, accurate, and hassle-free post-training network qu…☆33Sep 30, 2025Updated 7 months ago
- [ICLR 2025] Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron☆30Apr 30, 2025Updated last year
- [NeurIPS 2025 Spotlight] A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone.☆47Oct 29, 2025Updated 6 months ago
- Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space'☆18Jul 21, 2024Updated last year
- Efficient 2:4 sparse training algorithms and implementations☆61Dec 8, 2024Updated last year
- [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models☆75Jan 6, 2024Updated 2 years ago
- [ICLR 2026] ParallelBench: Understanding the Tradeoffs of Parallel Decoding in Diffusion LLMs☆45Mar 27, 2026Updated last month
- This is the official implementation of "Curriculum Fine-tuning of Vision Foundation Model for Medical Image Classification Under Label Noise"…☆21Mar 18, 2025Updated last year
- Code repo for FaStfact: Faster, Stronger Long-Form Factuality Evaluations in LLMs.☆33Nov 5, 2025Updated 6 months ago
- Python implementation of the Huffman Code compression algorithm.☆14Apr 18, 2013Updated 13 years ago
- ☆18Jun 6, 2019Updated 6 years ago
- [ICML 2025🔥] ParallelComp: Parallel Long-Context Compressor for Length Extrapolation☆30Jun 16, 2025Updated 10 months ago
- (AAAI 2023) Better Generalized Few-Shot Learning Even Without Base Data☆13Nov 29, 2022Updated 3 years ago
- [NeurIPS 2024] VeLoRA : Memory Efficient Training using Rank-1 Sub-Token Projections☆21Oct 15, 2024Updated last year
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs☆23Nov 11, 2025Updated 5 months ago
- Caffe implementation of Optimal-Ternary-Weights-Approximation in "Two-Step Quantization for Low-bit Neural Networks" (CVPR2018).☆15Sep 21, 2018Updated 7 years ago
- This is the official repo for Contrastive Vision-Language Alignment Makes Efficient Instruction Learner.☆20Dec 1, 2023Updated 2 years ago
- Code to implement the experiments in "Post-training Quantization for Neural Networks with Provable Guarantees" by Jinjie Zhang, Yixuan Zh…☆11Jun 2, 2023Updated 2 years ago
- Deeplearning4j Android Example repository☆10Feb 8, 2016Updated 10 years ago
- ☆20Oct 6, 2023Updated 2 years ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models☆35Jun 12, 2024Updated last year
- The PyTorch implementation for our EMNLP 2021 paper "Learning Neural Templates for Recommender Dialogue System"☆30Apr 11, 2022Updated 4 years ago
- ☆13Apr 10, 2017Updated 9 years ago
- An 8-/16-/32-/64-bit floating point number family☆16Feb 4, 2022Updated 4 years ago
- ☆26Mar 21, 2024Updated 2 years ago
- Code for the paper "AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin".☆36Jul 10, 2025Updated 9 months ago