DeepLink-org / DeepLinkExt

☆13

Alternatives and similar repositories for DeepLinkExt

Users who are interested in DeepLinkExt are comparing it to the libraries listed below.

DeepLink-org / DIOPI
☆72 Updated 8 months ago
DeepLink-org / deeplink.framework
☆68 Updated 8 months ago
alibaba / rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
☆815 Updated last month
FlagOpen / FlagGems
FlagGems is an operator library for large language models implemented in the Triton Language.
☆633 Updated this week
FlagOpen / FlagScale
FlagScale is a large model toolkit based on open-sourced projects.
☆327 Updated this week
ifromeast / cuda_learning
learning how CUDA works
☆288 Updated 4 months ago
LLMServe / DistServe
Disaggregated serving system for Large Language Models (LLMs).
☆645 Updated 3 months ago
Tencent / KsanaLLM
☆466 Updated last week
DeepLink-org / DLOP-Bench
A benchmark suited especially for deep learning operators
☆42 Updated 2 years ago
antgroup / glake
GLake: optimizing GPU memory management and IO transmission.
☆470 Updated 4 months ago
SiriusNEO / Triton-Puzzles-Lite
Puzzles for learning Triton, play it with minimal environment configuration!
☆435 Updated 7 months ago
sgl-project / sgl-learning-materials
Materials for learning SGLang
☆494 Updated this week
InternLM / InternEvo
InternEvo is an open-sourced lightweight training framework that aims to support model pre-training without the need for extensive dependencie…
☆397 Updated this week
harleyszhang / llm_counts
LLM theoretical performance analysis tools that support params, FLOPs, memory, and latency analysis.
☆99 Updated 2 weeks ago
PaddleJitLab / CUDATutorial
A self-learning tutorial for CUDA High Performance Programming.
☆680 Updated 3 weeks ago
hahnyuan / LLM-Viewer
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod…
☆515 Updated 10 months ago
kwai / Megatron-Kwai
[USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral…
☆59 Updated 11 months ago
ModelTC / LightCompress
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V…
☆516 Updated this week
sgl-project / SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆183 Updated this week
Cambricon / triton-linalg
Development repository for the Triton-Linalg conversion
☆190 Updated 5 months ago
Cambricon / catch
☆32 Updated 2 years ago
OpenPPL / ppl.llm.kernel.cuda
☆149 Updated 6 months ago
BBuf / how-to-learn-deep-learning-framework
how to learn PyTorch and OneFlow
☆440 Updated last year
zhihu / ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
☆900 Updated 2 weeks ago
gty111 / GEMM_WMMA
GEMM by WMMA (tensor core)
☆13 Updated 2 years ago
alibaba / Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
☆1,245 Updated 2 weeks ago
DD-DuDa / Cute-Learning
Examples of CUDA implementations by Cutlass CuTe
☆209 Updated 3 weeks ago
harleyszhang / lite_llama
A light llama-like llm inference framework based on the triton kernel.
☆138 Updated this week
FlagOpen / FlagPerf
FlagPerf is an open-source software platform for benchmarking AI chips.
☆343 Updated last month
volcengine / veScale
A PyTorch Native LLM Training Framework
☆837 Updated 2 weeks ago