Example of applying CUDA graphs to LLaMA-v2
☆11 · Aug 25, 2023 · Updated 2 years ago
Alternatives and similar repositories for llama-cuda-graph-example
Users that are interested in llama-cuda-graph-example are comparing it to the libraries listed below.
- Distributed SDDMM Kernel ☆12 · Jul 8, 2022 · Updated 3 years ago
- GPU operators for sparse tensor operations ☆36 · Mar 11, 2024 · Updated 2 years ago
- Experiment of using Tangent to autodiff triton ☆82 · Jan 22, 2024 · Updated 2 years ago
- torch.compile artifacts for common deep learning models, can be used as a learning resource for torch.compile ☆19 · Dec 22, 2023 · Updated 2 years ago
- Pure Java Llama2 inference with optional multi-GPU CUDA implementation ☆13 · Sep 2, 2023 · Updated 2 years ago
- JAX Scalify: end-to-end scaled arithmetics ☆18 · Oct 30, 2024 · Updated last year
- ☆14 · Jul 13, 2025 · Updated 9 months ago
- Frontend for v2.opyn.co ☆10 · May 28, 2023 · Updated 2 years ago
- a fast implementation of BM25 ☆10 · Sep 15, 2022 · Updated 3 years ago
- A flexible Handlebars view engine for Express ☆12 · Jul 6, 2016 · Updated 9 years ago
- Storytelling With Matplotlib (SWMat) ☆13 · Jul 25, 2019 · Updated 6 years ago
- Factories over fixtures. Chai Assertion Library. ☆23 · Nov 1, 2016 · Updated 9 years ago
- ☆25 · Sep 9, 2024 · Updated last year
- train with kittens! ☆64 · Oct 25, 2024 · Updated last year
- Mathematical expression evaluator with just in time code generation. ☆12 · Apr 7, 2013 · Updated 13 years ago
- This repository contains the results and code for the MLPerf™ Inference v4.0 benchmark. ☆11 · Jul 24, 2025 · Updated 9 months ago
- A fork of the PEFT library, supporting Robust Adaptation (RoSA) ☆15 · Aug 16, 2024 · Updated last year
- Service for estimating gas on a series of dependent transactions ☆19 · Jun 16, 2023 · Updated 2 years ago
- A pytorch implementation of focal loss ☆10 · Jan 9, 2020 · Updated 6 years ago
- ☆12 · Mar 31, 2021 · Updated 5 years ago
- ☆49 · Apr 15, 2024 · Updated 2 years ago
- IPFS-related scripts and utilities ☆15 · Sep 23, 2021 · Updated 4 years ago
- [IEEE CAL 2025] Accelerating Page Migrations in Operating Systems with Intel DSA ☆16 · Nov 20, 2024 · Updated last year
- ☆12 · Jun 3, 2019 · Updated 6 years ago
- Sardeenz is a proof-of-concept application that allows you to load more than one model on a given GPU. It allows you to add more and more… ☆51 · Mar 27, 2026 · Updated last month
- Generative Agents: Interactive Simulacra of Human Behavior - with Local LLMs ☆22 · Aug 15, 2023 · Updated 2 years ago
- Accelerating GPU Data Processing using FastLanes Compression ☆19 · May 9, 2024 · Updated last year
- Inference Llama/Llama2/Llama3 Modes in NumPy ☆21 · Nov 22, 2023 · Updated 2 years ago
- ☆13 · Jan 7, 2025 · Updated last year
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min…