☆32Jan 16, 2025Updated last year
Alternatives and similar repositories for QLM
Users that are interested in QLM are comparing it to the libraries listed below.
- Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling☆13Mar 7, 2024Updated 2 years ago
- Official code for the paper "HEXA-MoE: Efficient and Heterogeneous-Aware MoE Acceleration with Zero Computation Redundancy"☆15Mar 6, 2025Updated last year
- "Learning Stable Classifiers by Transferring Unstable Features" ICML 2022☆14Jul 24, 2022Updated 3 years ago
- ☆23Oct 10, 2025Updated 7 months ago
- A reading group for system verification papers☆10Sep 28, 2023Updated 2 years ago
- A language for video analytics☆12Jan 26, 2023Updated 3 years ago
- Very fast C++ importer from csv files to sqlite3 databases☆15Mar 29, 2016Updated 10 years ago
- [ICML'25] Official code for paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an…☆13Apr 17, 2025Updated last year
- Sardeenz is a proof-of-concept application that allows you to load more than one model on a given GPU. It allows you to add more and more…☆58Updated this week
- High-performance GEMM implementation optimized for NVIDIA H100 GPUs, leveraging Hopper architecture's TMA, WGMMA, and Thread Block Cluste…☆10Dec 4, 2024Updated last year
- Tutorial Exercises and Code for GPU Communications Tutorial at HOT Interconnects 2025☆32Oct 22, 2025Updated 7 months ago
- ☆14Mar 15, 2026Updated 2 months ago
- ☆91Oct 17, 2025Updated 7 months ago
- ☆29Jan 17, 2025Updated last year
- ☆20Updated this week
- Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts".☆19Oct 30, 2024Updated last year
- [NeurIPS 2025] Official Implementation of ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding.☆56Jan 28, 2026Updated 3 months ago
- Differentiable non-uniform interpolation: https://arxiv.org/abs/2012.13257☆11Oct 3, 2021Updated 4 years ago
- SKYFALL: dynamically identifies and exploits bottleneck links with a geo-distributed botnet to flood them.☆12Oct 23, 2024Updated last year
- ☆61May 4, 2024Updated 2 years ago
- RECORD: A RECeption-Only Region Determination Attack on LEO Satellite Users - Simulation Code☆12Mar 20, 2024Updated 2 years ago
- ☆17Feb 12, 2025Updated last year
- Simulation tool for CDN replication in large low-earth orbit satellite access networks.☆13May 17, 2021Updated 5 years ago
- Visualize expert firing frequencies across sentences in the Mixtral MoE model☆18Dec 22, 2023Updated 2 years ago
- ☆19Jan 27, 2025Updated last year
- ☆37Jul 21, 2025Updated 10 months ago
- A Python program that simulates a satellite network using pygame, allowing users to create, configure, and visualize the network state ov…☆11Apr 25, 2023Updated 3 years ago
- LEO Satellite vs. Cellular Networks: Exploring the Potential for Synergistic Integration (CoNEXT '23)☆11Oct 26, 2023Updated 2 years ago
- Kubernetes operator for local LLM inference with llama.cpp, vLLM, and TGI - multi-GPU, autoscaling, air-gapped, production-ready☆88May 17, 2026Updated last week
- UI for extracting data from pdf files using watsonx prompts☆12Sep 18, 2025Updated 8 months ago
- ☆13Feb 16, 2023Updated 3 years ago
- Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training☆24Mar 1, 2024Updated 2 years ago
- Artifacts Release: A Case for Stateless Mobile Core Network Functions in Space☆16Aug 16, 2022Updated 3 years ago
- Explore Inter-layer Expert Affinity in MoE Model Inference☆16May 6, 2024Updated 2 years ago
- Some tools for DN42 MeowNetwork☆11Apr 3, 2025Updated last year
- ☆81Sep 15, 2025Updated 8 months ago
- ☆15Apr 23, 2026Updated last month
- ☆12Jan 26, 2019Updated 7 years ago
- 2023/12/22 电三 420 weekly meeting tech talk: slides and attachments for "Containers"☆10Dec 22, 2023Updated 2 years ago