Eddie-Wang1120 / Eddie-Wang-Hackathon2023
Whisper inference with TensorRT-LLM
☆24Updated 2 years ago
Alternatives and similar repositories for Eddie-Wang-Hackathon2023
Users who are interested in Eddie-Wang-Hackathon2023 are comparing it to the libraries listed below
- ☆71Updated 3 years ago
- export llama to onnx☆137Updated 11 months ago
- Universal cross-platform tokenizers binding to HF and sentencepiece☆435Updated 4 months ago
- A quantization algorithm for LLM☆147Updated last year
- ASR client for Triton ASR Service☆34Updated last week
- Transformer related optimization, including BERT, GPT☆59Updated 2 years ago
- llm-export can export llm model to onnx.☆337Updated 2 months ago
- ☆75Updated 3 years ago
- LLaMa/RWKV onnx models, quantization and testcase☆367Updated 2 years ago
- ☆141Updated last year
- Simple Dynamic Batching Inference☆145Updated 3 years ago
- A simplified flash-attention implementation using cutlass, intended for teaching☆52Updated last year
- ONNX Python Examples☆16Updated 3 years ago
- ☆206Updated 7 months ago
- Common source, scripts and utilities for creating Triton backends.☆363Updated this week
- Kaldi-compatible online fbank extractor without external dependencies☆135Updated 2 months ago
- Serving Inside Pytorch☆167Updated 2 weeks ago
- symmetric int8 gemm☆67Updated 5 years ago
- Transformer related optimization, including BERT, GPT☆17Updated 2 years ago
- ☆321Updated last week
- ☆60Updated last year
- optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052☆476Updated last year
- Run Chinese MobileBert model on SNPE.☆15Updated 2 years ago
- Common source, scripts and utilities shared across all Triton repositories.☆79Updated 2 weeks ago
- ☆130Updated last year
- simplify >2GB large onnx model☆70Updated last year
- A parser, editor and profiler tool for ONNX models.☆468Updated last month
- A Toolkit to Help Optimize Large Onnx Model☆162Updated 2 months ago
- Use PyTorch model in C++ project☆139Updated 4 years ago
- ☆127Updated last week