A family of efficient edge language models ranging from 100M to 1B parameters.
☆18 · Feb 14, 2025 · Updated last year
Alternatives and similar repositories for EfficientLLM
Users that are interested in EfficientLLM are comparing it to the libraries listed below.
- The official repo for "CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models" ☆31 · Mar 26, 2026 · Updated 3 weeks ago
- Official PyTorch code for UAI 2023 paper "Concurrent Misclassification and Out-of-Distribution Detection for Semantic Segmentation via En… ☆12 · Nov 10, 2023 · Updated 2 years ago
- Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries ☆37 · Nov 19, 2025 · Updated 4 months ago
- Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment ☆16 · Aug 6, 2024 · Updated last year
- [ICLR 2026] Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding ☆31 · Jan 27, 2026 · Updated 2 months ago
- my poc ☆16 · Oct 28, 2020 · Updated 5 years ago
- ☆13 · Sep 25, 2023 · Updated 2 years ago
- ☆13 · Oct 13, 2025 · Updated 6 months ago
- Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts". ☆19 · Oct 30, 2024 · Updated last year
- ☆13 · Apr 27, 2024 · Updated last year
- Python implementation of the Huffman Code compression algorithm. ☆14 · Apr 18, 2013 · Updated 12 years ago
- Official PyTorch implementation of "Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming" (ICML'23) ☆13 · Jul 11, 2024 · Updated last year
- NITEC: Versatile Hand-Annotated Eye Contact Dataset for Ego-Vision Interaction (WACV24) ☆19 · Jul 17, 2024 · Updated last year
- [ICLR 2025] Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron ☆30 · Apr 30, 2025 · Updated 11 months ago
- [NeurIPS 2025 Spotlight] A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone. ☆45 · Oct 29, 2025 · Updated 5 months ago
- [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models ☆74 · Jan 6, 2024 · Updated 2 years ago
- Efficient 2:4 sparse training algorithms and implementations ☆59 · Dec 8, 2024 · Updated last year
- ☆16 · Feb 2, 2025 · Updated last year
- Official PyTorch implementation of The Linear Attention Resurrection in Vision Transformer ☆16 · Sep 7, 2024 · Updated last year
- [ICML 2025🔥] ParallelComp: Parallel Long-Context Compressor for Length Extrapolation ☆30 · Jun 16, 2025 · Updated 10 months ago
- [NeurIPS 2024] VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections ☆21 · Oct 15, 2024 · Updated last year
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs ☆23 · Nov 11, 2025 · Updated 5 months ago
- Caffe implementation of Optimal-Ternary-Weights-Approximation in "Two-Step Quantization for Low-bit Neural Networks" (CVPR2018). ☆15 · Sep 21, 2018 · Updated 7 years ago
- [DAC 2024] EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive La… ☆85 · Jun 30, 2024 · Updated last year
- This is the official repo for Contrastive Vision-Language Alignment Makes Efficient Instruction Learner. ☆20 · Dec 1, 2023 · Updated 2 years ago
- Code to implement the experiments in "Post-training Quantization for Neural Networks with Provable Guarantees" by Jinjie Zhang, Yixuan Zh… ☆11 · Jun 2, 2023 · Updated 2 years ago
- Deeplearning4j Android Example repository ☆10 · Feb 8, 2016 · Updated 10 years ago
- Quantization of LLMs and benchmarking. ☆10 · Apr 3, 2024 · Updated 2 years ago
- ☆22 · Jun 10, 2025 · Updated 10 months ago
- Multiple Generalized Additive Models implemented in Python (EBM, XGB, Spline, FLAM). Code for our KDD 2021 paper "How Interpretable and T… ☆13 · Aug 15, 2021 · Updated 4 years ago
- Fast and memory-efficient exact attention ☆20 · Updated this week
- [NeurIPS 2024] Code and Data Repo for Paper "Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning" ☆28 · May 28, 2024 · Updated last year
- Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML 2024) ☆31 · Aug 15, 2024 · Updated last year
- 2SSP: A Two-Stage Framework for Structured Pruning of LLMs ☆20 · Aug 18, 2025 · Updated 7 months ago
- Neural network implementation on STM32 ☆21 · Nov 30, 2021 · Updated 4 years ago
- The loss landscape of Large Language Models resembles a basin! ☆37 · Jul 8, 2025 · Updated 9 months ago
- ☆12 · Nov 22, 2022 · Updated 3 years ago
- Distributed DRL by Ray and TensorFlow Tutorial. ☆10 · Dec 26, 2019 · Updated 6 years ago
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" ☆60 · Jan 5, 2026 · Updated 3 months ago