MILVLG / mlc-imp

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

☆10

Alternatives and similar repositories for mlc-imp

Users who are interested in mlc-imp are comparing it to the libraries listed below.

megvii-research / IntLLaMA
IntLLaMA: A fast and light quantization solution for LLaMA
☆18 Updated 2 years ago
LiqunMa / FBI-LLM
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
☆51 Updated last year
facebookresearch / adaptive_scheduling
Experimental scripts for researching data adaptive learning rate scheduling.
☆23 Updated last year
lucasjinreal / wnnx_models
Various test models in WNNX format. They can be viewed with `pip install wnetron && wnetron`
☆12 Updated 3 years ago
yuzhenmao / IceFormer
Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024).
☆25 Updated 3 weeks ago
snu-mllab / LayerMerge
Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML 2024)
☆30 Updated 11 months ago
NVlabs / STL
Official PyTorch implementation of Self-emerging Token Labeling
☆35 Updated last year
zaydzuhri / pythia-mlkv
Multi-Layer Key-Value sharing experiments on Pythia models
☆33 Updated last year
DCDmllm / HyperLLaVA
PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
☆28 Updated last year
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆44 Updated 8 months ago
NVlabs / EfficientDL
☆33 Updated last month
cmd2001 / KVTuner
KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
☆17 Updated 2 months ago
lucasjinreal / ImageTokenizer
imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; it supports both image and video…
☆35 Updated last year
facebookresearch / Ternary_Binary_Transformer
ACL 2023
☆39 Updated 2 years ago
kiddyboots216 / lottery-ticket-adaptation
Lottery Ticket Adaptation
☆39 Updated 8 months ago
The-Inscrutable-X / TACQ
Official Repository for Task-Circuit Quantization
☆21 Updated 2 months ago
simonsanvil / DALL-E-Explained
Description and applications of OpenAI's paper about DALL-E (2021) and implementation of other (CLIP-guided) zero-shot text-to-image gene…
☆33 Updated 2 years ago
JarvisPei / CMoE
Implementation for the paper: CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
☆22 Updated 5 months ago
IST-DASLab / QIGen
Repository for CPU Kernel Generation for LLM Inference
☆26 Updated 2 years ago
facebookresearch / NasRec
NASRec: Weight Sharing Neural Architecture Search for Recommender Systems
☆30 Updated last year
tianyi-lab / R2-T2
[ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"
☆15 Updated 4 months ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆98 Updated 10 months ago
huggingface / pixparse
Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data
☆21 Updated last year
autodistill / autodistill-efficient-yolo-world
EfficientSAM + YOLO World base model for use with Autodistill.
☆10 Updated last year
RWKV / RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best…
☆51 Updated 4 months ago
yyyyychen / LowMemoryBP
The official implementation of the paper "Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation"
☆20 Updated 7 months ago
autodistill / autodistill-grounded-edgesam
EdgeSAM model for use with Autodistill.
☆27 Updated last year
rayleizhu / vllm-ra
[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
☆40 Updated last year
OpenDFM / MobA
🎮 Manipulates mobile phones just like how you would. Official code for "MobA: A Two-Level Agent System for Efficient Mobile Task Automati…
☆25 Updated 3 months ago
Qichuzyy / POA
Official implementation of ECCV24 paper: POA
☆24 Updated last year