DeepLink-org / DeepLinkExt
☆13 · Updated 6 months ago
Alternatives and similar repositories for DeepLinkExt
Users interested in DeepLinkExt are comparing it to the libraries listed below.
- ☆76 · Updated last year
- ☆73 · Updated last year
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆945 · Updated this week
- A benchmark suite especially for deep learning operators ☆42 · Updated 2 years ago
- FlagScale is a large model toolkit based on open-sourced projects. ☆425 · Updated last week
- SGLang kernel library for NPU ☆84 · Updated last week
- ☆517 · Updated last month
- GLake: optimizing GPU memory management and IO transmission. ☆491 · Updated 8 months ago
- LLM training technologies developed by Kwai ☆66 · Updated 3 weeks ago
- A highly optimized LLM inference acceleration engine for Llama and its variants. ☆904 · Updated 5 months ago
- [DAC2024] A Holistic Functionalization Approach to Optimizing Imperative Tensor Programs in Deep Learning ☆15 · Updated last year
- Summary of the specs of commonly used GPUs for training and inference of LLMs ☆68 · Updated 4 months ago
- Disaggregated serving system for Large Language Models (LLMs). ☆749 · Updated 8 months ago
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆557 · Updated this week
- This repository organizes materials, recordings, and schedules related to AI-infra learning meetings. ☆269 · Updated 2 weeks ago
- InternEvo is an open-sourced lightweight training framework that aims to support model pre-training without the need for extensive dependencie… ☆416 · Updated 4 months ago
- Ascend TileLang adapter ☆165 · Updated this week
- GEMM by WMMA (tensor core) ☆14 · Updated 3 years ago
- FlagGems is an operator library for large language models implemented in the Triton Language. ☆803 · Updated this week
- Learning how CUDA works ☆351 · Updated 9 months ago
- Examples of CUDA implementations by Cutlass CuTe ☆260 · Updated 5 months ago
- Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and hardware roofline mod… ☆599 · Updated last year
- Summary of some awesome work for optimizing LLM inference ☆150 · Updated 3 weeks ago
- ☆152 · Updated 11 months ago
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆119 · Updated 7 months ago
- An annotated nano_vllm repository, with completed MiniCPM4 adaptation and support for registering new models ☆118 · Updated 4 months ago