LLM inference service performance testing
☆44Dec 17, 2023Updated 2 years ago
Alternatives and similar repositories for llm-inference-benchmark
Users that are interested in llm-inference-benchmark are comparing it to the libraries listed below.
- Survey of small language models☆18Jul 23, 2024Updated last year
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.☆11May 6, 2023Updated 2 years ago
- LLM Inference benchmark☆433Jul 23, 2024Updated last year
- This is the official implementation for our paper "LAR: Look Around and Refer".☆30Dec 1, 2022Updated 3 years ago
- [EMNLP2023]: MIRACLE: Towards Personalized Dialogue Generation with Latent-Space Multiple Personal Attribute Control☆12Nov 11, 2023Updated 2 years ago
- Implementation of AdaCQR(COLING 2025)☆15Dec 30, 2024Updated last year
- An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.☆13Jun 7, 2023Updated 2 years ago
- Repository for the COLM 2025 paper SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths☆18Jul 10, 2025Updated 9 months ago
- Python library written in Rust for creating/transporting/parsing AI characters between different frontends (TavernAI, SillyTavern, TextGe…☆21Nov 14, 2025Updated 5 months ago
- CVPR25☆28Jul 2, 2025Updated 9 months ago
- Text security audit: semantic-model filtering and sensitive-content detection system☆38Feb 14, 2025Updated last year
- Sample that converts the MobileSAM encoder/decoder to ONNX and runs inference☆12Apr 11, 2024Updated 2 years ago
- Measuring and Controlling Persona Drift in Language Model Dialogs☆23Feb 26, 2024Updated 2 years ago
- A Framework for Machine Learning on Encrypted Data☆12Feb 10, 2022Updated 4 years ago
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N…☆12Jun 24, 2024Updated last year
- Notes on putting micropython on STM32F407VG bare board☆11Oct 7, 2019Updated 6 years ago
- Visual self-questioning for large vision-language assistant.☆45Jul 23, 2025Updated 8 months ago
- ☆15Apr 13, 2024Updated 2 years ago
- socat - Multipurpose relay (cloned from git://repo.or.cz/socat.git) http://www.dest-unreach.org/socat/☆20Jan 24, 2016Updated 10 years ago
- In this programming assignment you will implement a streaming video server and client that communicate control commands via the Real-Time…☆11Dec 29, 2012Updated 13 years ago
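- Batch data processing with large models; currently supports OCR with various large models, including Tongyi Qianwen (Qwen), Moonshot AI, Baidu PaddlePaddle OCR, OpenAI, and LLAVA. Use LLM to generate or clean data for academic use. Support OCR with qwen, m…☆16Sep 15, 2024Updated last year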
- inference on tvm runtime using c++ with gpu enabled☆10Apr 25, 2018Updated 7 years ago
- ☆12Jan 25, 2023Updated 3 years ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"☆47Dec 1, 2024Updated last year
- ☆16Jan 23, 2025Updated last year
- Privacy-preserving k-means clustering on data owned by multiple parties☆14May 10, 2016Updated 9 years ago
- ☆21Feb 15, 2024Updated 2 years ago
- A portable simplest oblivious transfer library.☆15Mar 30, 2025Updated last year
- a simple pingpong buffer test☆12Feb 11, 2015Updated 11 years ago
- Tutorial on bypassing the firewall with the "little airplane" client☆24Nov 14, 2019Updated 6 years ago
- This is a depth-anything-v2 onnxruntime inference by cpp☆15Sep 2, 2024Updated last year
- Simple test of ARM NEON code. Performs a blit to the framebuffer.☆15Jul 23, 2013Updated 12 years ago
- ☆11Nov 21, 2022Updated 3 years ago
- FedBERT : A federated approach that enables clients with limited computing resource to participate without violating data privacy.☆14Jul 3, 2023Updated 2 years ago
- Intelligent question-answering system built on langchain and chatglm6b, with support for custom corpora☆10Jun 25, 2023Updated 2 years ago
- Can 3D Vision-Language Models Truly Understand Natural Language?☆20Mar 28, 2024Updated 2 years ago
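- The 「城语」 app draws on base data about A-level scenic spots, historical sites, and protected cultural heritage units, and uses advanced large-model capabilities for intelligent Citywalk route planning, with three core functions: designing a route, generating a route guide, and generating recommendation reasons for attractions. Using large models reduces the manual editing and recommendation workload and allows personalization based on visitors' needs, improving visitors'…☆19Feb 20, 2024Updated 2 years ago
- Simple RTSP server with support for 264 and AAC☆12Dec 17, 2021Updated 4 years ago
- Deploy DeepSeek models with Docker and the official vLLM image, providing an OpenAI-style API service in production environments.☆14Jul 17, 2025Updated 8 months ago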