LLM inference service performance testing
☆44Dec 17, 2023Updated 2 years ago
Alternatives and similar repositories for llm-inference-benchmark
Users who are interested in llm-inference-benchmark are comparing it to the libraries listed below.
- Survey of small language models☆18Jul 23, 2024Updated last year
- ☆14Sep 2, 2024Updated last year
- LLM Inference benchmark☆436Jul 23, 2024Updated last year
- This is the official implementation for our paper, "LAR: Look Around and Refer".☆30Dec 1, 2022Updated 3 years ago
- gradio bbox labeling tools☆11May 12, 2023Updated 3 years ago
- An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.☆13Jun 7, 2023Updated 2 years ago
- CUDA keyring packaging for Debian☆14Apr 14, 2023Updated 3 years ago
- Repository for the COLM 2025 paper SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths☆19Jul 10, 2025Updated 10 months ago
- Build gstreamer on Raspberry Pi 3☆14Nov 2, 2018Updated 7 years ago
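- Scenic area integrated management platform: a combination of ECharts and large-screen dashboards, with adaptive dashboard width (percentage) and height (rem)☆11Apr 27, 2018Updated 8 years ago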
- Text security audit: security review with semantic-model filtering; a sensitive-content detection system☆39Feb 14, 2025Updated last year
- Sample that converts the MobileSAM encoder/decoder to ONNX and runs inference☆12Apr 11, 2024Updated 2 years ago
- Asterinas Confidential Computing is a collection of open-source projects featuring full-stack capabilities in confidential computing.☆16Oct 15, 2024Updated last year
- High-Performance Linpack Benchmark, adapted for a GPU backend☆12Sep 12, 2022Updated 3 years ago
- A Framework for Machine Learning on Encrypted Data☆12Feb 10, 2022Updated 4 years ago
- Notes on putting micropython on STM32F407VG bare board☆11Oct 7, 2019Updated 6 years ago
- N-body simulation based on CUDA.☆14Jun 20, 2019Updated 6 years ago
- Generate text images for training deep learning ocr model☆10Oct 22, 2018Updated 7 years ago
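- Large language model evaluation platform supporting multiple evaluation benchmarks, custom datasets, and performance testing. Supports RAG evaluation based on custom datasets.☆89Aug 20, 2025Updated 9 months ago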
- ☆17Nov 27, 2023Updated 2 years ago
- Implementation of various algorithms in the Nested Sequential Monte Carlo family of methods.☆14Sep 9, 2015Updated 10 years ago
- ☆10Jul 18, 2024Updated last year
- ☆15Apr 13, 2024Updated 2 years ago
- ☆23Dec 20, 2019Updated 6 years ago
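- Batch-process data with large models; currently supports OCR with various large models, including Tongyi Qianwen (Qwen), Moonshot AI, Baidu PaddlePaddle OCR, OpenAI, and LLAVA. Use LLM to generate or clean data for academic use. Support OCR with qwen, m…☆17Sep 15, 2024Updated last year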
- Various models and algorithms I implemented while learning TensorFlow 2.0, shared for reference☆20Jul 30, 2020Updated 5 years ago
- a game framework. warning: wip, dev, unstable, radiation hazard, defcon 3☆24May 10, 2015Updated 11 years ago
- ☆16Nov 19, 2025Updated 6 months ago
- inference on tvm runtime using c++ with gpu enabled☆10Apr 25, 2018Updated 8 years ago
- ☆12Jan 25, 2023Updated 3 years ago
- ☆16Jan 23, 2025Updated last year
- System Planning and Management Professional study notes☆12Apr 7, 2021Updated 5 years ago
- A portable simplest oblivious transfer library.☆15Mar 30, 2025Updated last year
- a simple pingpong buffer test☆12Feb 11, 2015Updated 11 years ago
- A model evaluation system based on OpenCompass that provides a front-end web UI so users can run evaluations on a self-service basis.☆27Aug 25, 2025Updated 9 months ago
- "Little airplane" (Shadowsocks) tutorial for bypassing the firewall☆24Nov 14, 2019Updated 6 years ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts.☆61Oct 1, 2024Updated last year
- FinRAG: Financial Retrieval Augmented Generation☆43Aug 28, 2024Updated last year
- depth-anything-v2 inference with onnxruntime in C++☆15Sep 2, 2024Updated last year