bertmaher / llama2.so

Run Llama 2 inference with a model compiled to native code by TorchInductor

☆14

Alternatives and similar repositories for llama2.so

Users who are interested in llama2.so are comparing it to the libraries listed below

lianakoleva / no-libtorch-compile
☆21 Updated 11 months ago
cchan / tccl
extensible collectives library in triton
☆95 Updated 10 months ago
deepspeedai / DeepSpeed-Kernels
☆71 Updated 10 months ago
apple / ml-recurrent-drafter
☆219 Updated last year
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆219 Updated last week
IST-DASLab / Quartet
☆118 Updated last month
tridao / flash-attention-wheels
☆61 Updated 2 years ago
UmerHA / triton_util
Make triton easier
☆50 Updated last year
meta-pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆164 Updated 3 weeks ago
meta-pytorch / tritonparse
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
☆194 Updated this week
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆427 Updated last week
ezyang / torchdbg
PyTorch centric eager mode debugger
☆48 Updated last year
vllm-project / dashboard
vLLM performance dashboard
☆41 Updated last year
gpu-mode / ring-attention
ring-attention experiments
☆165 Updated last year
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆227 Updated last year
gau-nernst / quantized-training
Explore training for quantized models
☆26 Updated 6 months ago
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆315 Updated 5 months ago
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆267 Updated 2 months ago
facebookresearch / fastgen
Simple high-throughput inference library
☆155 Updated 8 months ago
gpu-mode / kernelbot
Write a fast kernel and run it on Discord. See how you compare against the best!
☆71 Updated this week
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆324 Updated this week
huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆134 Updated 2 weeks ago
meta-pytorch / kraken
Triton-based Symmetric Memory operators and examples
☆81 Updated 3 weeks ago
open-lm-engine / accelerated-model-architectures
A bunch of kernels that might make stuff slower 😉
☆75 Updated this week
AI-Hypercomputer / jetstream-pytorch
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
☆79 Updated last month
ROCm / aotriton
Ahead of Time (AOT) Triton Math Library
☆88 Updated last week
vllm-project / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆238 Updated this week
mlc-ai / llm-perf-bench
☆120 Updated last year
NVlabs / vibetensor
Our first fully AI generated deep learning system
☆481 Updated last week
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆93 Updated last year