Eddie-Wang1120 / Eddie-Wang-Hackathon2023
Whisper inference with TensorRT-LLM
☆22 · Updated last year
Alternatives and similar repositories for Eddie-Wang-Hackathon2023
Users interested in Eddie-Wang-Hackathon2023 are comparing it to the libraries listed below.
- ☆71 · Updated 2 years ago
- export llama to onnx ☆127 · Updated 6 months ago
- ☆139 · Updated last year
- Transformer related optimization, including BERT, GPT ☆59 · Updated last year
- ☆75 · Updated 3 years ago
- A quantization algorithm for LLM ☆141 · Updated last year
- ☆128 · Updated 6 months ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios ☆38 · Updated 4 months ago
- LLaMa/RWKV onnx models, quantization and testcase ☆363 · Updated last year
- Serving Inside Pytorch ☆160 · Updated 2 weeks ago
- symmetric int8 gemm ☆66 · Updated 5 years ago
- ASR client for Triton ASR Service ☆29 · Updated 6 months ago
- simplify >2GB large onnx model ☆59 · Updated 6 months ago
- Transformer related optimization, including BERT, GPT ☆17 · Updated last year
- ☆58 · Updated 7 months ago
- ☆148 · Updated 5 months ago
- ☆195 · Updated last month
- A simplified flash-attention implemented with CUTLASS, for educational purposes ☆42 · Updated 10 months ago
- A Toolkit to Help Optimize Large Onnx Model ☆157 · Updated last year
- llm-export can export llm model to onnx ☆297 · Updated 5 months ago
- ☆26 · Updated last year
- Simple Dynamic Batching Inference ☆145 · Updated 3 years ago
- Transformer related optimization, including BERT, GPT ☆39 · Updated 2 years ago
- A collection of memory efficient attention operators implemented in the Triton language ☆272 · Updated last year
- Inference of quantization aware trained networks using TensorRT ☆82 · Updated 2 years ago
- An easy-to-use package for implementing SmoothQuant for LLMs ☆102 · Updated 2 months ago
- ☆124 · Updated last year
- A llama model inference framework implemented in CUDA C++ ☆57 · Updated 7 months ago
- qwen2 and llama3 cpp implementation ☆44 · Updated last year
- ☆37 · Updated 8 months ago
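Several entries above revolve around int8 quantization for LLM inference (symmetric int8 GEMM, SmoothQuant, quantization-aware training with TensorRT). As a point of reference, here is a minimal NumPy sketch of the common pattern behind these projects: symmetric per-tensor int8 quantization with int32-accumulated matrix multiply. This illustrates the general technique only, not any specific repository's implementation.

```python
import numpy as np

def quantize_symmetric_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: map max |x| to 127, keep zero at zero."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Quantize both operands of a GEMM.
a = np.random.randn(4, 8).astype(np.float32)
b = np.random.randn(8, 4).astype(np.float32)
qa, sa = quantize_symmetric_int8(a)
qb, sb = quantize_symmetric_int8(b)

# Integer GEMM: accumulate in int32 to avoid overflow,
# then rescale the result by the product of the two scales.
acc = qa.astype(np.int32) @ qb.astype(np.int32)
approx = acc.astype(np.float32) * (sa * sb)

ref = a @ b
print("max abs error:", np.max(np.abs(approx - ref)))
```

Real kernels (such as the symmetric int8 GEMM repository listed above) run the int32 accumulation on tensor cores via CUTLASS or cuBLASLt; the quantize/rescale bookkeeping is the same.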