PantheonInfer / Pantheon
Source code for the paper: "Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs"
☆15Updated last year
Alternatives and similar repositories for Pantheon
Users who are interested in Pantheon are comparing it to the libraries listed below
- Source code for Jellyfish, a soft real-time inference serving system☆15Updated 3 years ago
- ☆212Updated 2 years ago
- 🔥 A curated roadmap to the Efficient VLA landscape. We’re keeping this list live—contribute your latest work!☆74Updated last week
- A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention☆278Updated 2 months ago
- [MobiCom 24] Efficient and Adaptive DNN inference under changeable memory budgets☆58Updated last year
- ☆10Updated last year
- 🔥 This is a curated list of "A survey on Efficient Vision-Language Action Models" research. We will continue to maintain and update the r…☆126Updated last month
- This is a list of awesome edgeAI inference related papers.☆99Updated 2 years ago
- Official Code for LightVLA (ICRA 2026)☆77Updated last week
- [ECCV 2024] AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer☆41Updated last year
- [NeurIPS'24] Efficient and accurate memory-saving method towards W4A4 large multi-modal models.☆95Updated last year
- Survey Paper List - Efficient LLM and Foundation Models☆260Updated last year
- Official code for the ECCV 2024 paper "Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities"☆25Updated last year
- One-size-fits-all model for mobile AI, a novel paradigm for mobile AI in which the OS and hardware co-manage a foundation model that is c…☆29Updated last year
- ☆120Updated last week
- PyTorch implementation of our paper MaxQ: Multi-Axis Query for N:M Sparsity Network, accepted by CVPR 2024.☆37Updated last year
- Official Implementation of DART (DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference).☆40Updated this week
- ☆35Updated last year
- [EuroSys'24] Minuet: Accelerating 3D Sparse Convolutions on GPUs☆79Updated last year
- Code for (MobiCom24) Delta: A Cloud-assisted Data Enrichment Framework for On-Device Continual Learning☆13Updated last year
- Repo for USENIX security 2024 paper "On Data Fabrication in Collaborative Vehicular Perception: Attacks and Countermeasures" https://arxi…☆21Updated 3 months ago
- Official code implementation for the ICLR 2025 paper "Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives"☆50Updated 3 months ago
- [ICLR'25] Official Implementation of STAMP: Scalable Task And Model-agnostic Collaborative Perception☆55Updated last year
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding☆87Updated 2 months ago
- ☆16Updated 2 years ago
- PyTorch implementation of PTQ4DiT https://arxiv.org/abs/2405.16005☆45Updated last year
- ☆18Updated last year
- Accommodating Large Language Model Training over Heterogeneous Environment.☆25Updated 10 months ago
- [ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" and "Sp…☆237Updated last month
- ☆102Updated 2 years ago