alibaba-edu / qwen-bailian-usagetraces-anon
☆81Jan 22, 2026Updated 3 weeks ago
Alternatives and similar repositories for qwen-bailian-usagetraces-anon
Users who are interested in qwen-bailian-usagetraces-anon are comparing it to the libraries listed below
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]☆41May 13, 2025Updated 9 months ago
- High-performance distributed data shuffling (all-to-all) library for MoE training and inference☆112Dec 31, 2025Updated last month
- Efficient GPU communication over multiple NICs.☆22Nov 20, 2025Updated 2 months ago
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit☆92Jan 26, 2026Updated 3 weeks ago
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning☆10Apr 28, 2023Updated 2 years ago
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)☆93Jul 14, 2023Updated 2 years ago
- a simple API to use CUPTI☆11Aug 19, 2025Updated 5 months ago
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding☆88Dec 2, 2025Updated 2 months ago
- RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025]☆13Dec 9, 2024Updated last year
- The (open-source part of) code to reproduce "BPPSA: Scaling Back-propagation by Parallel Scan Algorithm".☆13Jun 7, 2021Updated 4 years ago
- A distributed in-memory store for temporal knowledge graphs☆10Mar 20, 2024Updated last year
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable☆209Sep 21, 2024Updated last year
- Distributed MoE in a Single Kernel [NeurIPS '25]☆193Feb 7, 2026Updated last week
- Implementation from scratch in C of the Multi-head latent attention used in the Deepseek-v3 technical paper.☆19Jan 15, 2025Updated last year
- ☆27Jun 22, 2025Updated 7 months ago
- A mini Python tool for intranet penetration (NAT traversal) and reverse/forward proxying, implemented in under 100 lines of code☆12May 27, 2023Updated 2 years ago
- A reading list on popular MLSys topics☆21Mar 20, 2025Updated 10 months ago
- LoRAFusion: Efficient LoRA Fine-Tuning for LLMs☆23Sep 23, 2025Updated 4 months ago
- ☆19Jun 1, 2025Updated 8 months ago
- Spring 2022 Course Website for Operating System Course at Peking University☆11Oct 14, 2022Updated 3 years ago
- PA + Labs for Operating Systems 2019 course in NJU taught by JYY.☆12Aug 6, 2019Updated 6 years ago
- An Attention Superoptimizer☆22Jan 20, 2025Updated last year
- Training camp project for the training track☆26Jan 28, 2026Updated 3 weeks ago
- [AFK] Hardware router in Chisel (THU Network Joint Lab 2020)☆14Oct 8, 2020Updated 5 years ago
- ☆12May 13, 2025Updated 9 months ago
- ☆13Mar 26, 2024Updated last year
- [ICDCS 2023] DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining☆12Dec 4, 2023Updated 2 years ago
- Stateful LLM Serving☆95Mar 11, 2025Updated 11 months ago
- A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs☆158Jan 13, 2026Updated last month
- NEO is an LLM inference engine built to ease the GPU memory crisis through CPU offloading☆84Jun 16, 2025Updated 8 months ago
- PilotFish harvests the free GPU cycles of cloud gaming with deep learning training☆14Jul 2, 2022Updated 3 years ago
- DRAM/SSD hybrid caching system☆14Mar 13, 2025Updated 11 months ago
- ☆17May 10, 2024Updated last year
- FAST implemented on Xilinx Zynq7000 SoC board☆15Jul 14, 2020Updated 5 years ago
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup☆35Jan 9, 2023Updated 3 years ago
- ☆17Jan 12, 2024Updated 2 years ago
- AI model training on heterogeneous, geo-distributed resources☆35Nov 24, 2025Updated 2 months ago
- ☆37Oct 11, 2025Updated 4 months ago
- ☆18Dec 11, 2023Updated 2 years ago