LastBotInc / llama2j
Pure Java Llama2 inference with optional multi-GPU CUDA implementation
☆13 · Updated 2 years ago
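For readers unfamiliar with llama2.c-style runtimes, the sketch below shows the general shape of a greedy decoding loop in plain Java. The `Model` and `Tokenizer` interfaces, the class name, and the EOS handling are illustrative assumptions for this sketch, not llama2j's actual API.

```java
// Hypothetical sketch of a greedy-decoding loop for a llama2.c-style model in plain Java.
// The Model and Tokenizer interfaces are illustrative placeholders, not llama2j's API.
import java.util.ArrayList;
import java.util.List;

public class GreedyDecodeSketch {

    /** Minimal model contract: one forward pass returns logits over the vocabulary. */
    interface Model {
        float[] forward(int token, int position); // logits of length vocabSize()
        int vocabSize();
    }

    /** Minimal tokenizer contract. */
    interface Tokenizer {
        List<Integer> encode(String text);
        String decode(int token);
    }

    /** Index of the largest logit, i.e. greedy (temperature 0) sampling. */
    static int argmax(float[] logits) {
        int best = 0;
        for (int i = 1; i < logits.length; i++) {
            if (logits[i] > logits[best]) best = i;
        }
        return best;
    }

    /** Feed the prompt token by token, then generate until maxTokens or end-of-sequence. */
    static String generate(Model model, Tokenizer tokenizer, String prompt, int maxTokens) {
        List<Integer> promptTokens = new ArrayList<>(tokenizer.encode(prompt));
        StringBuilder out = new StringBuilder(prompt);
        int token = promptTokens.get(0);
        for (int pos = 0; pos < maxTokens; pos++) {
            float[] logits = model.forward(token, pos);
            boolean stillInPrompt = pos + 1 < promptTokens.size();
            int next = stillInPrompt
                    ? promptTokens.get(pos + 1)   // still consuming the prompt
                    : argmax(logits);             // generating new tokens greedily
            if (next == 2) break;                 // assumed Llama 2 EOS token id
            if (!stillInPrompt) out.append(tokenizer.decode(next));
            token = next;
        }
        return out.toString();
    }
}
```

A CUDA-backed variant would keep the same loop and swap the `Model.forward` implementation for one that dispatches the transformer layers to one or more GPUs.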
Alternatives and similar repositories for llama2j
Users interested in llama2j are comparing it to the libraries listed below:
- Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T… ☆365 · Updated this week
- The driver for LMCache core to run in vLLM ☆60 · Updated last year
- Fast and memory-efficient exact attention ☆111 · Updated last week
- ☆47 · Updated last year
- KV cache store for distributed LLM inference ☆390 · Updated 2 months ago
- LLM Serving Performance Evaluation Harness ☆83 · Updated 11 months ago
- A NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library. ☆90 · Updated last month
- A high-performance RL training-inference weight synchronization framework, designed to enable second-level parameter updates from trainin… ☆131 · Updated last month
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆220 · Updated this week
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆225 · Updated 3 weeks ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆120 · Updated last year
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆123 · Updated last month
- ☆48 · Updated last year
- Toolchain built around the Megatron-LM for Distributed Training ☆84 · Updated 2 months ago
- PyTorch distributed training acceleration framework ☆55 · Updated 5 months ago
- High-performance safetensors model loader ☆94 · Updated 3 weeks ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆131 · Updated 4 months ago
- Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes ☆146 · Updated 10 months ago
- ☆96 · Updated 10 months ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆92 · Updated 3 weeks ago
- A NVMf library for Java ☆30 · Updated 6 years ago
- Stateful LLM Serving ☆95 · Updated 10 months ago
- Accelerating MoE with IO and Tile-aware Optimizations ☆569 · Updated 2 weeks ago
- A high-performance and light-weight router for vLLM large scale deployment ☆101 · Updated this week
- Cute layout visualization ☆29 · Updated 2 weeks ago
- ☆342 · Updated last week
- vLLM Router ☆54 · Updated last year
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆263 · Updated this week
- ☆27 · Updated last year
- FlagCX is a scalable and adaptive cross-chip communication library. ☆170 · Updated last week