☆88Jan 22, 2026Updated last month
Alternatives and similar repositories for qwen-bailian-usagetraces-anon
Users who are interested in qwen-bailian-usagetraces-anon are comparing it to the libraries listed below
- Prefix-Aware Attention for LLM Decoding☆29Jan 23, 2026Updated last month
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]☆42May 13, 2025Updated 9 months ago
- High-performance distributed data shuffling (all-to-all) library for MoE training and inference☆112Updated this week
- Efficient GPU communication over multiple NICs.☆26Nov 20, 2025Updated 3 months ago
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit☆92Jan 26, 2026Updated last month
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning☆10Apr 28, 2023Updated 2 years ago
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)☆94Jul 14, 2023Updated 2 years ago
- a simple API to use CUPTI☆11Aug 19, 2025Updated 6 months ago
- RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025]☆14Dec 9, 2024Updated last year
- The (open-source part of) code to reproduce "BPPSA: Scaling Back-propagation by Parallel Scan Algorithm".☆13Jun 7, 2021Updated 4 years ago
- A distributed in-memory store for temporal knowledge graphs☆10Mar 20, 2024Updated last year
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable☆210Sep 21, 2024Updated last year
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding☆93Dec 2, 2025Updated 3 months ago
- Distributed MoE in a Single Kernel [NeurIPS '25]☆195Updated this week
- A reading list on some popular MLSys topics☆22Mar 20, 2025Updated 11 months ago
- ☆28Jun 22, 2025Updated 8 months ago
- Implementation from scratch in C of the Multi-head latent attention used in the Deepseek-v3 technical paper.☆18Jan 15, 2025Updated last year
- A mini Python tool for NAT traversal and reverse/forward proxying, implemented in under 100 lines of code☆12May 27, 2023Updated 2 years ago
- ☆20Jun 1, 2025Updated 9 months ago
- ☆13Mar 26, 2024Updated last year
- Spring 2022 Course Website for Operating System Course at Peking University☆11Oct 14, 2022Updated 3 years ago
- [AFK] Hardware router in Chisel (THU Network Joint Lab 2020)☆14Oct 8, 2020Updated 5 years ago
- ☆12May 13, 2025Updated 9 months ago
- LoRAFusion: Efficient LoRA Fine-Tuning for LLMs☆25Sep 23, 2025Updated 5 months ago
- PA + Labs for Operating Systems 2019 course in NJU taught by JYY.☆12Aug 6, 2019Updated 6 years ago
- An Attention Superoptimizer☆22Jan 20, 2025Updated last year
- Training camp project for the training track☆26Jan 28, 2026Updated last month
- [ICDCS 2023] DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining☆12Dec 4, 2023Updated 2 years ago
- Stateful LLM Serving☆97Mar 11, 2025Updated 11 months ago
- A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs☆159Jan 13, 2026Updated last month
- FAST implemented on Xilinx Zynq7000 SoC board☆15Jul 14, 2020Updated 5 years ago
- ☆17May 10, 2024Updated last year
- DRAM/SSD hybrid caching system☆15Mar 13, 2025Updated 11 months ago
- PilotFish harvests the free GPU cycles of cloud gaming with deep learning training☆14Jul 2, 2022Updated 3 years ago
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup☆36Jan 9, 2023Updated 3 years ago
- NEO is an LLM inference engine built to ease the GPU memory crisis by CPU offloading☆85Jun 16, 2025Updated 8 months ago
- ☆17Jan 12, 2024Updated 2 years ago
- ☆18Dec 11, 2023Updated 2 years ago
- NUMA-Aware Reader-Writer Locks☆19Jun 12, 2014Updated 11 years ago