Example of applying CUDA graphs to LLaMA-v2
☆11 · Aug 25, 2023 · Updated 2 years ago
Alternatives and similar repositories for llama-cuda-graph-example
Users that are interested in llama-cuda-graph-example are comparing it to the libraries listed below.
- Distributed SDDMM Kernel ☆12 · Jul 8, 2022 · Updated 3 years ago
- GPU operators for sparse tensor operations ☆36 · Mar 11, 2024 · Updated 2 years ago
- Experiment of using Tangent to autodiff triton ☆82 · Jan 22, 2024 · Updated 2 years ago
- torch.compile artifacts for common deep learning models, can be used as a learning resource for torch.compile ☆19 · Dec 22, 2023 · Updated 2 years ago
- Pure Java Llama2 inference with optional multi-GPU CUDA implementation ☆13 · Sep 2, 2023 · Updated 2 years ago
- JAX Scalify: end-to-end scaled arithmetics ☆18 · Oct 30, 2024 · Updated last year
- ☆14 · Jul 13, 2025 · Updated 9 months ago
- Frontend for v2.opyn.co ☆11 · May 28, 2023 · Updated 2 years ago
- a fast implementation of BM25 ☆10 · Sep 15, 2022 · Updated 3 years ago
- A flexible Handlebars view engine for Express ☆12 · Jul 6, 2016 · Updated 9 years ago
- Storytelling With Matplotlib (SWMat) ☆13 · Jul 25, 2019 · Updated 6 years ago
- Factories over fixtures. Chai Assertion Library. ☆23 · Nov 1, 2016 · Updated 9 years ago
- ☆25 · Sep 9, 2024 · Updated last year
- train with kittens! ☆64 · Oct 25, 2024 · Updated last year
- Mathematical expression evaluator with just in time code generation. ☆12 · Apr 7, 2013 · Updated 13 years ago
- This repository contains the results and code for the MLPerf™ Inference v4.0 benchmark. ☆11 · Jul 24, 2025 · Updated 8 months ago
- A fork of the PEFT library, supporting Robust Adaptation (RoSA) ☆15 · Aug 16, 2024 · Updated last year
- Service for estimating gas on a series of dependent transactions ☆19 · Jun 16, 2023 · Updated 2 years ago
- A pytorch implementation of focal loss ☆10 · Jan 9, 2020 · Updated 6 years ago
- ☆12 · Mar 31, 2021 · Updated 5 years ago
- ☆49 · Apr 15, 2024 · Updated 2 years ago
- IPFS-related scripts and utilities ☆15 · Sep 23, 2021 · Updated 4 years ago
- [IEEE CAL 2025] Accelerating Page Migrations in Operating Systems with Intel DSA ☆16 · Nov 20, 2024 · Updated last year
- ☆13 · Jun 3, 2019 · Updated 6 years ago
- Sardeenz is a proof-of-concept application that allows you to load more than one model on a given GPU. It allows you to add more and more… ☆50 · Mar 27, 2026 · Updated 2 weeks ago
- Generative Agents: Interactive Simulacra of Human Behavior - with Local LLMs ☆21 · Aug 15, 2023 · Updated 2 years ago
- Accelerating GPU Data Processing using FastLanes Compression ☆17 · May 9, 2024 · Updated last year
- ☆13 · Jan 7, 2025 · Updated last year
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Nov 11, 2024 · Updated last year
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆96 · Feb 20, 2026 · Updated last month
- Inference Llama/Llama2/Llama3 Modes in NumPy ☆21 · Nov 22, 2023 · Updated 2 years ago
- UVA command line client to upload solutions and search for statistics ☆10 · Dec 23, 2016 · Updated 9 years ago
- Keyformer proposes KV Cache reduction through key tokens identification and without the need for fine-tuning ☆57 · Mar 26, 2024 · Updated 2 years ago
- ☆13 · May 25, 2023 · Updated 2 years ago
- CUDA C simple application for Nvidia's GPU ☆11 · Jun 7, 2022 · Updated 3 years ago
- Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit ☆63 · Jun 21, 2023 · Updated 2 years ago
- A simple tool for parsing the profile.json file of mxnet ☆14 · Aug 1, 2018 · Updated 7 years ago
- rdiv!(::AbstractMatrix, ::UpperTriangular) and ldiv!(::LowerTriangular, ::AbstractMatrix) ☆12 · Nov 18, 2024 · Updated last year
- An ultra-fast, distributed Safetensors loader ☆34 · Apr 8, 2026 · Updated last week