DeepLink-org / DeepLinkExt
☆13 Updated 3 months ago
Alternatives and similar repositories for DeepLinkExt
Users who are interested in DeepLinkExt are comparing it to the libraries listed below.
- ☆69 Updated 5 months ago
- ☆67 Updated 6 months ago
- Development repository for the Triton-Linalg conversion ☆186 Updated 3 months ago
- A benchmark suited especially for deep learning operators ☆42 Updated 2 years ago
- Puzzles for learning Triton, play it with minimal environment configuration! ☆312 Updated 5 months ago
- FlagGems is an operator library for large language models implemented in the Triton Language. ☆537 Updated this week
- Disaggregated serving system for Large Language Models (LLMs). ☆584 Updated last month
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆749 Updated this week
- FlagScale is a large model toolkit based on open-sourced projects. ☆276 Updated this week
- ☆48 Updated last week
- A highly optimized LLM inference acceleration engine for Llama and its variants. ☆886 Updated this week
- Examples of CUDA implementations by Cutlass CuTe ☆177 Updated 3 months ago
- [USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral… ☆54 Updated 9 months ago
- Materials for learning SGLang ☆414 Updated this week
- ☆148 Updated 4 months ago
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V… ☆473 Updated last week
- ☆16 Updated 4 months ago
- A light llama-like llm inference framework based on the triton kernel. ☆118 Updated this week
- flash attention tutorial written in python, triton, cuda, cutlass ☆349 Updated this week
- ☆30 Updated 2 years ago
- learning how CUDA works ☆261 Updated 2 months ago
- ☆119 Updated 5 months ago
- llm theoretical performance analysis tools and support params, flops, memory and latency analysis. ☆88 Updated 4 months ago
- A collection of memory efficient attention operators implemented in the Triton language. ☆267 Updated 11 months ago
- Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap… ☆250 Updated 2 months ago
- ☆140 Updated 4 months ago
- Distributed Triton for Parallel Systems ☆720 Updated this week
- ☆139 Updated last year
- Yinghan's Code Sample ☆327 Updated 2 years ago
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod… ☆459 Updated 8 months ago