Run generative AI models on Sophgo BM1684X/BM1688
☆270 · Mar 4, 2026 · Updated this week
Alternatives and similar repositories for LLM-TPU
Users interested in LLM-TPU are comparing it to the libraries listed below.
- ☆474 · Updated this week
- Machine learning compiler based on MLIR for Sophgo TPU. ☆873 · Feb 12, 2026 · Updated 3 weeks ago
- Local knowledge-base Q&A for Sophon BM1684X, based on Langchain and language models such as ChatGLM ☆14 · Jun 5, 2024 · Updated last year
- Run ChatGLM2-6B on BM1684X ☆49 · Mar 1, 2024 · Updated 2 years ago
- ChatTTS is a generative speech model for daily dialogue. ☆14 · Oct 21, 2024 · Updated last year
- Text-to-speech and tone color conversion demo running on SG2300X; TTS plus instant voice cloning combining OpenVoice and EmotiVoice ☆22 · Oct 30, 2024 · Updated last year
- A Whisper repo for TPU ☆11 · Jun 4, 2024 · Updated last year
- Sophgo AI chip driver and runtime library. ☆24 · Feb 5, 2026 · Updated last month
- ☆134 · Dec 10, 2025 · Updated 2 months ago
- Run ChatGLM3-6B on BM1684X ☆39 · Mar 1, 2024 · Updated 2 years ago
- Langchain-Chatchat for Sophon BM1684X: local knowledge-base Q&A based on Langchain and language models such as ChatGLM ☆18 · May 23, 2024 · Updated last year
- LLM deployment project based on ONNX. ☆50 · Oct 9, 2024 · Updated last year
- Stable Diffusion + LCM on SG2300X: silky-smooth one-second image generation ☆17 · Nov 29, 2024 · Updated last year
- llm-export can export LLM models to ONNX. ☆344 · Oct 24, 2025 · Updated 4 months ago
- Explore LLM model deployment based on AXera's AI chips ☆141 · Feb 25, 2026 · Updated last week
- Flawless face swapping with SG2300X ☆33 · Sep 2, 2024 · Updated last year
- ☆20 · Dec 3, 2025 · Updated 3 months ago
- ☆1,253 · Nov 24, 2025 · Updated 3 months ago
- RISC-V vector and tensor compute extensions for Vortex GPGPU acceleration for ML workloads. Optimized for transformer models, CNNs, and g… ☆21 · Apr 25, 2025 · Updated 10 months ago
- DeepSeek-R1 reproduction explainers and resource roundup ☆22 · Mar 5, 2025 · Updated last year
- Open source RTL implementation of Tensor Core, Sparse Tensor Core, BitWave and SparSynergy in the article: "SparSynergy: Unlocking Flexib… ☆22 · Mar 29, 2025 · Updated 11 months ago
- Development repository for the Triton-Linalg conversion ☆215 · Feb 7, 2025 · Updated last year
- JAX bindings for the flash-attention3 kernels ☆21 · Jan 2, 2026 · Updated 2 months ago
- Model Quantization Benchmark ☆18 · Sep 30, 2025 · Updated 5 months ago
- sophon-tools (xun.li, mingxuan.che) ☆21 · Updated this week
- A clone of POCL that includes RISC-V newlib device support and Vortex ☆49 · Jan 14, 2026 · Updated last month
- ☆39 · Feb 12, 2026 · Updated 3 weeks ago
- EdgeInfer enables efficient edge intelligence by running small AI models, including embeddings and OnnxModels, on resource-constrained de… ☆50 · Apr 17, 2024 · Updated last year
- Qwen2 and Llama 3 C++ implementation ☆49 · Jun 7, 2024 · Updated last year
- Deploying a large language model on Android phones with MNN-llm: Qwen1.5-0.5B-Chat ☆90 · Apr 8, 2024 · Updated last year
- Whisper in TensorRT-LLM ☆17 · Sep 21, 2023 · Updated 2 years ago
- Artifact evaluation of the PLDI'24 paper "Allo: A Programming Model for Composable Accelerator Design" ☆33 · Apr 11, 2024 · Updated last year
- ☆30 · Jun 2, 2022 · Updated 3 years ago
- [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models. ☆680 · Nov 19, 2025 · Updated 3 months ago
- ☆2,679 · Jul 29, 2025 · Updated 7 months ago
- SAM and LaMa inpainting with a Qt GUI: interactively draw points and boxes for SAM with real-time result display, then run inpainting; see the video in the README for details. ☆52 · Jan 30, 2024 · Updated 2 years ago
- Manages the vllm-nccl dependency ☆17 · Jun 3, 2024 · Updated last year
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆817 · Mar 6, 2025 · Updated last year
- LLMA = LLM + arithmetic coder, which uses an LLM for brute-force text compression, achieving extremely high compression ratios. ☆22 · Nov 24, 2024 · Updated last year