daquexian / faster-rwkv
☆123 · Updated 11 months ago
Related projects
Alternatives and complementary repositories for faster-rwkv
- Inference of RWKV5 or RWKV6 with the Qualcomm AI Engine Direct SDK ☆38 · Updated this week
- Stable Diffusion using MNN ☆64 · Updated last year
- Simplify large (>2 GB) ONNX models ☆44 · Updated 8 months ago
- ☆28 · Updated 4 months ago
- Standalone Flash Attention v2 kernel without the libtorch dependency ☆98 · Updated 2 months ago
- A toolkit to help optimize large ONNX models ☆149 · Updated 6 months ago
- ☆82 · Updated last year
- LLM deployment project based on ONNX ☆26 · Updated last month
- A converter for llama2.c legacy models to ncnn models ☆82 · Updated 11 months ago
- Inference of RWKV with multiple supported backends ☆26 · Updated 3 months ago
- Performance of the C++ interface of Flash Attention and Flash Attention v2 in large language model (LLM) inference scenarios ☆29 · Updated 2 months ago
- NVIDIA TensorRT Hackathon 2023 second-round topic: building and optimizing a Qwen-7B (Tongyi Qianwen) model with TensorRT-LLM ☆40 · Updated last year
- libvits-ncnn is an ncnn implementation of the VITS library that enables cross-platform GPU-accelerated speech synthesis 🎙️💻 ☆56 · Updated last year
- A toolkit to help optimize ONNX models ☆81 · Updated this week
- ☆124 · Updated 2 weeks ago
- FP8 Flash Attention implemented on the Ada architecture using the CUTLASS library ☆52 · Updated 3 months ago
- ☆140 · Updated 7 months ago
- ☆32 · Updated last month
- ☆36 · Updated 2 weeks ago
- ☆57 · Updated this week
- Inference of TinyLlama models on ncnn ☆25 · Updated last year
- A pared-down Flash Attention implementation using CUTLASS, intended for teaching ☆32 · Updated 3 months ago
- ☆79 · Updated 2 months ago
- Export LLaMA to ONNX ☆98 · Updated 5 months ago
- Large language model ONNX inference framework ☆26 · Updated last month
- High-speed GEMV kernels, with up to a 2.7x speedup over the PyTorch baseline ☆90 · Updated 4 months ago
- Qwen2 and Llama 3 C++ implementation ☆34 · Updated 5 months ago
- A converter from MegEngine to other frameworks ☆67 · Updated last year
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ☆20 · Updated 8 months ago