Eddie-Wang1120 / Eddie-Wang-Hackathon2023
Whisper inference with TensorRT-LLM
☆21 · Updated last year
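For context, the workload this repository accelerates is ordinary Whisper transcription. A minimal plain-PyTorch baseline using the openai-whisper package is sketched below; this is not the repository's TensorRT-LLM pipeline, and the model size and audio path are placeholder choices.

```python
# Baseline Whisper transcription with the openai-whisper package.
# This is the reference workload only, not the TensorRT-LLM engine
# built by Eddie-Wang-Hackathon2023; "audio.wav" is a placeholder path.
import whisper

model = whisper.load_model("base")      # download/load the base checkpoint
result = model.transcribe("audio.wav")  # encoder + autoregressive decoder
print(result["text"])
```

Broadly, a TensorRT-LLM port replaces the PyTorch encoder/decoder above with pre-built engines, which is where the inference speedup comes from.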
Alternatives and similar repositories for Eddie-Wang-Hackathon2023:
Users interested in Eddie-Wang-Hackathon2023 are comparing it to the libraries listed below.
- ☆71 · Updated 2 years ago
- ☆74 · Updated 2 years ago
- Export LLaMA to ONNX ☆115 · Updated 2 months ago
- Simplify large (>2GB) ONNX models ☆54 · Updated 3 months ago
- ☆139 · Updated 10 months ago
- Serving inside PyTorch ☆156 · Updated this week
- Symmetric INT8 GEMM ☆66 · Updated 4 years ago
- A toolkit to help optimize large ONNX models ☆153 · Updated 10 months ago
- Transformer-related optimizations, including BERT and GPT ☆17 · Updated last year
- LLM deployment project based on ONNX ☆31 · Updated 5 months ago
- Transformer-related optimizations, including BERT and GPT ☆59 · Updated last year
- ☆24 · Updated last year
- A toolkit to help optimize ONNX models ☆124 · Updated this week
- Performance of the C++ interface of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios ☆35 · Updated 3 weeks ago
- A simplified flash-attention implementation using CUTLASS, intended as a teaching example ☆38 · Updated 7 months ago
- ☆33 · Updated last year
- llm-export can export LLM models to ONNX (a minimal export sketch follows this list) ☆271 · Updated 2 months ago
- ☆127 · Updated 2 months ago
- ASR client for the Triton ASR service ☆27 · Updated 3 months ago
- ☆58 · Updated 4 months ago
- Compares multiple optimization methods on Triton to improve model-serving performance ☆50 · Updated last year
- ☆124 · Updated last year
- Qwen2 and Llama 3 C++ implementation ☆43 · Updated 9 months ago
- A quantization algorithm for LLMs ☆136 · Updated 9 months ago
- ☢️ TensorRT Hackathon 2023 second round: Llama model inference acceleration based on TensorRT-LLM ☆46 · Updated last year
- ☆35 · Updated 5 months ago
- ☆98 · Updated 3 years ago
- Use PyTorch models in C++ projects ☆137 · Updated 3 years ago
- LLaMA/RWKV ONNX models, quantization, and test cases ☆359 · Updated last year
- ☆26 · Updated last year
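Several entries above (the LLaMA-to-ONNX exporter, llm-export, and the large-ONNX-model toolkits) revolve around exporting transformer LMs to ONNX. As a rough illustration of the core step they automate, here is a hedged sketch using torch.onnx.export on a tiny Hugging Face causal LM; the model id, output file name, and opset are placeholder choices, and real LLM exporters additionally handle KV-cache inputs, >2GB external weight files, and operator workarounds.

```python
# Hypothetical minimal sketch: export a tiny causal LM to ONNX.
# Real exporters (e.g. llm-export) also wire up past_key_values,
# external data files for >2GB weights, and per-layer splitting.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sshleifer/tiny-gpt2"  # placeholder tiny model, not LLaMA
tokenizer = AutoTokenizer.from_pretrained(model_id)
lm = AutoModelForCausalLM.from_pretrained(model_id).eval()


class LogitsOnly(torch.nn.Module):
    """Wrap the HF model so the exported graph has a single tensor output."""

    def __init__(self, lm):
        super().__init__()
        self.lm = lm

    def forward(self, input_ids):
        return self.lm(input_ids=input_ids, use_cache=False).logits


example = tokenizer("hello world", return_tensors="pt")["input_ids"]

torch.onnx.export(
    LogitsOnly(lm),
    (example,),                         # positional inputs to forward()
    "tiny_llm.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "logits": {0: "batch", 1: "seq"}},
    opset_version=17,
)
```

The wrapper that returns only `logits` avoids exporting the dict-style Hugging Face output, which keeps the ONNX graph simple; dynamic axes let the exported model accept varying batch and sequence lengths.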