vllm-project / vllm-nccl
Manages vllm-nccl dependency
☆17 · Jun 3, 2024 · Updated last year
Alternatives and similar repositories for vllm-nccl
Users interested in vllm-nccl are comparing it to the libraries listed below.
- coded with and corrected by Google Anti-Gravity · ☆13 · Nov 23, 2025 · Updated 2 months ago
- ☆11 · Apr 3, 2023 · Updated 2 years ago
- FPGA-based HyperLogLog Accelerator · ☆12 · Jul 13, 2020 · Updated 5 years ago
- PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class. · ☆10 · Feb 10, 2022 · Updated 4 years ago
- An auxiliary project analysis of the characteristics of KV in DiT Attention. · ☆32 · Nov 29, 2024 · Updated last year
- A Triton-only attention backend for vLLM · ☆23 · Updated this week
- creditmodel, model, scorecard, vintage, automatic modeling · ☆11 · Aug 10, 2024 · Updated last year
- The Bytepiece Tokenizer Implemented in Rust. · ☆14 · Nov 28, 2023 · Updated 2 years ago
- ☆20 · May 14, 2025 · Updated 9 months ago
- An external memory allocator example for PyTorch. · ☆16 · Aug 10, 2025 · Updated 6 months ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. · ☆14 · Nov 23, 2024 · Updated last year
- This is an official GitHub repository for the paper "Towards timeout-less transport in commodity datacenter networks." · ☆16 · Oct 12, 2021 · Updated 4 years ago
- SmartNIC · ☆14 · Dec 13, 2018 · Updated 7 years ago
- Johnson-Lindenstrauss transform (JLT), random projections (RP), fast Johnson-Lindenstrauss transform (FJLT), and randomized Hadamard tran… · ☆22 · Jul 11, 2023 · Updated 2 years ago
- Elixir: Train a Large Language Model on a Small GPU Cluster · ☆15 · Jun 8, 2023 · Updated 2 years ago
- ☆47 · Dec 13, 2024 · Updated last year
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo · ☆17 · Mar 13, 2023 · Updated 2 years ago
- ☆20 · Sep 28, 2024 · Updated last year
- Beyond KV Caching: Shared Attention for Efficient LLMs · ☆20 · Jul 19, 2024 · Updated last year
- Network Traffic Transformer to learn network dynamics from packet traces. Learn fundamental dynamics with pre-training and fine-tune to m… · ☆23 · Jan 17, 2024 · Updated 2 years ago
- ☆21 · Mar 22, 2021 · Updated 4 years ago
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters · ☆56 · Jul 23, 2024 · Updated last year
- An FPGA integration and acceleration of the popular FAISS framework for approximate similarity search · ☆25 · Jul 20, 2019 · Updated 6 years ago
- Manually implemented quantization-aware training · ☆23 · Oct 12, 2022 · Updated 3 years ago
- ☆24 · May 6, 2022 · Updated 3 years ago
- ☆27 · Mar 2, 2023 · Updated 2 years ago
- Artifact evaluation repo for EuroSys'24. · ☆29 · Nov 7, 2023 · Updated 2 years ago
- Nex Venus Communication Library · ☆72 · Nov 17, 2025 · Updated 2 months ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. · ☆44 · Feb 27, 2025 · Updated 11 months ago
- The prototype for NSDI paper "NetHint: White-Box Networking for Multi-Tenant Data Centers" · ☆26 · Feb 2, 2024 · Updated 2 years ago
- A Learnable LSH Framework for Efficient NN Training · ☆34 · Jul 22, 2021 · Updated 4 years ago
- QJL: 1-Bit Quantized JL transform for KV Cache Quantization with Zero Overhead · ☆31 · Jan 27, 2025 · Updated last year
- [Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co… · ☆13 · Jan 16, 2026 · Updated last month
- [SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference · ☆82 · Dec 7, 2025 · Updated 2 months ago
- ☆36 · Jan 21, 2021 · Updated 5 years ago
- Chinese domestic accelerator cards: hands-on with Hygon DCU (large-model training, fine-tuning, inference, etc.) · ☆69 · Aug 10, 2025 · Updated 6 months ago
- Kinematic and dynamic models of continuum and articulated soft robots. · ☆15 · Nov 22, 2025 · Updated 2 months ago
- Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion · ☆32 · May 15, 2024 · Updated last year
- ☆44 · Jul 8, 2024 · Updated last year