☆29 · Feb 3, 2026 · Updated 2 months ago
Alternatives and similar repositories for SpecOffload-public
Users that are interested in SpecOffload-public are comparing it to the libraries listed below.
- The code based on vLLM for the paper “Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention”. ☆11 · Sep 19, 2024 · Updated last year
- ☆20 · Jun 9, 2025 · Updated 10 months ago
- ☆39 · Nov 28, 2024 · Updated last year
- Experimental repository for GSoC 2024. ☆15 · Aug 29, 2024 · Updated last year
- Ever wondered how popular your GitHub repo is compared to others? ☆17 · Feb 14, 2026 · Updated 2 months ago
- ☆18 · Apr 11, 2025 · Updated last year
- UniVid: The Open-Source Unified Video Model ☆31 · Oct 13, 2025 · Updated 6 months ago
- ☆21 · Oct 2, 2024 · Updated last year
- [USENIX Security '25] My ZIP isn’t your ZIP: Identifying and Exploiting Semantic Gaps Between ZIP Parsers ☆38 · Mar 20, 2026 · Updated 3 weeks ago
- Official implementation for ICLR 2023 paper Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation ☆16 · Jan 23, 2024 · Updated 2 years ago
- ☆11 · Mar 9, 2026 · Updated last month
- Implemented a script that automatically adjusts Qwen3's inference and non-inference capabilities, based on an OpenAI-like API. The infere… ☆22 · May 9, 2025 · Updated 11 months ago
- ☆15 · Jun 26, 2024 · Updated last year
- ☆18 · Oct 29, 2025 · Updated 5 months ago
- ☆13 · Mar 6, 2023 · Updated 3 years ago
- ☆19 · Feb 18, 2025 · Updated last year
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆263 · Nov 18, 2024 · Updated last year
- The official implementation for the intra-stage fusion technique introduced in https://arxiv.org/abs/2409.13221 ☆31 · Apr 22, 2025 · Updated 11 months ago
- [NeurIPS 2022] ASPiRe: Adaptive Skill Priors for Reinforcement Learning ☆13 · Oct 19, 2022 · Updated 3 years ago
- This repository is the accompanying code for the paper CFVFP. This paper presents a new algorithm for solving incomplete information game… ☆14 · Feb 23, 2025 · Updated last year
- This resource shares the author's notes on AI-security papers, including PPT and PDF versions and the original papers. Hope it helps. Keep it up! ☆32 · Jan 9, 2025 · Updated last year
- ☆15 · Nov 9, 2024 · Updated last year
- Mamba-Spike (CGI2024) ☆14 · Dec 3, 2025 · Updated 4 months ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ☆23 · Mar 15, 2024 · Updated 2 years ago
- The source code of [Sec'25] Make Agent Defeat Agent: Automatic Detection of Taint-Style Vulnerabilities in LLM-based Agents ☆68 · Sep 9, 2025 · Updated 7 months ago
- Code for Federated Neuromorphic Learning of Spiking Neural Networks for Low-Power Edge Intelligence ☆17 · Dec 9, 2020 · Updated 5 years ago
- Library to interface Compilers and ML models for ML-Enabled Compiler Optimizations ☆20 · Oct 19, 2025 · Updated 6 months ago
- AccelOpt: Self-improving Agents for AI Accelerator Kernel Optimization ☆33 · Updated this week
- Whisper inference with TensorRT-LLM ☆25 · Sep 22, 2023 · Updated 2 years ago
- ☆30 · Jul 22, 2024 · Updated last year
- ☆22 · Oct 7, 2025 · Updated 6 months ago
- LLM Inference with Microscaling Format ☆34 · Nov 12, 2024 · Updated last year
- Deep Neural Network Compression based on Student-Teacher Network ☆14 · Jul 6, 2023 · Updated 2 years ago
- PyTorch code for full quantization of DNN using BCGD ☆14 · Jul 24, 2019 · Updated 6 years ago
- Topic models for microblogging content ☆10 · Sep 23, 2015 · Updated 10 years ago
- iGniter, an interference-aware GPU resource provisioning framework for achieving predictable performance of DNN inference in the cloud. ☆39 · Jun 11, 2024 · Updated last year
- Fine-tune Chinese BERT with sentence-transformers ☆11 · May 8, 2021 · Updated 4 years ago
- Large-scale exact string matching tool ☆17 · Mar 7, 2025 · Updated last year
- Classical Chinese poetry generation based on pytorch_rnn ☆11 · Oct 24, 2021 · Updated 4 years ago