DeepLink-org / DeepLinkExt
☆13 Updated 6 months ago
Alternatives and similar repositories for DeepLinkExt
Users who are interested in DeepLinkExt are comparing it to the libraries listed below.
- ☆72 Updated last year
- ☆75 Updated last year
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆523 Updated this week
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆934 Updated this week
- FlagGems is an operator library for large language models implemented in the Triton Language. ☆783 Updated this week
- FlagScale is a large model toolkit based on open-sourced projects. ☆416 Updated last week
- Disaggregated serving system for Large Language Models (LLMs). ☆737 Updated 8 months ago
- ☆514 Updated 2 weeks ago
- Ascend TileLang adapter ☆153 Updated this week
- [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLM, VLM, and video generation models. ☆632 Updated 2 weeks ago
- Materials for learning SGLang ☆667 Updated this week
- [DAC2024] A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning ☆15 Updated last year
- SGLang kernel library for NPU ☆76 Updated this week
- InternEvo is an open-sourced lightweight training framework that aims to support model pre-training without the need for extensive dependencie… ☆414 Updated 3 months ago
- Learning how CUDA works ☆347 Updated 9 months ago
- An annotated nano_vllm repository, with MiniCPM4 adaptation and support for registering new models ☆108 Updated 3 months ago
- Examples of CUDA implementations by Cutlass CuTe ☆254 Updated 5 months ago
- LLM training technologies developed by kwai ☆66 Updated last week
- Puzzles for learning Triton; play with minimal environment configuration! ☆569 Updated this week
- A benchmark suited especially for deep learning operators ☆42 Updated 2 years ago
- Optimize softmax in Triton in many cases ☆21 Updated last year
- Development repository for the Triton-Linalg conversion ☆206 Updated 9 months ago
- Ongoing research training transformer models at scale ☆19 Updated last week
- A highly optimized LLM inference acceleration engine for Llama and its variants. ☆905 Updated 4 months ago
- Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS ☆454 Updated 6 months ago
- This repository organizes materials, recordings, and schedules related to AI-infra learning meetings. ☆255 Updated this week