☆80 Apr 29, 2026 Updated 3 weeks ago
Alternatives and similar repositories for SGLang-FluentLLM
Users that are interested in SGLang-FluentLLM are comparing it to the libraries listed below.
- ☆34 Dec 10, 2025 Updated 5 months ago
- 🎓 Automatically update circult-eda-mlsys-tinyml papers daily using GitHub Actions (updates every 8 hours) ☆10 Updated this week
- High-performance sparse-mask attention CUDA operator for short sequences, optimized for sequence lengths under 1K at 75% sparsity ☆76 Mar 18, 2026 Updated 2 months ago
- 🚀 First survey on Attention Sink in Transformers: 180+ papers on utilization, interpretation, and mitigation. ☆76 Apr 16, 2026 Updated last month
- ☆10 Mar 2, 2024 Updated 2 years ago
- Pure Java Llama2 inference with optional multi-GPU CUDA implementation ☆13 Sep 2, 2023 Updated 2 years ago
- The repository maintains the source code for the article titled "Optimizing Attention by Exploiting Data Reuse on ARM Multi-core CPUs." ☆17 Dec 1, 2024 Updated last year
- ☆18 Apr 8, 2022 Updated 4 years ago
- TernGEMM: General Matrix Multiply Library with Ternary Weights for Fast DNN Inference ☆14 Feb 22, 2022 Updated 4 years ago
- JAX Scalify: end-to-end scaled arithmetics ☆18 Oct 30, 2024 Updated last year
- ☆21 Apr 13, 2022 Updated 4 years ago
- DCPO: Dynamic Adaptive Clipping for RL