Example of applying CUDA graphs to LLaMA-v2
☆11 · Aug 25, 2023 · Updated 2 years ago
Alternatives and similar repositories for llama-cuda-graph-example
Users that are interested in llama-cuda-graph-example are comparing it to the libraries listed below.
- Distributed SDDMM Kernel ☆12 · Jul 8, 2022 · Updated 3 years ago
- PostgreSQL BM25S extension ☆135 · May 14, 2026 · Updated last week
- GPU operators for sparse tensor operations ☆37 · Mar 11, 2024 · Updated 2 years ago
- Experiment of using Tangent to autodiff triton ☆82 · Jan 22, 2024 · Updated 2 years ago
- torch.compile artifacts for common deep learning models, can be used as a learning resource for torch.compile ☆19 · Dec 22, 2023 · Updated 2 years ago
- Pure Java Llama2 inference with optional multi-GPU CUDA implementation ☆13 · Sep 2, 2023 · Updated 2 years ago
- JAX Scalify: end-to-end scaled arithmetics ☆18 · Oct 30, 2024 · Updated last year
- ☆14 · Jul 13, 2025 · Updated 10 months ago
- Frontend for v2.opyn.co ☆11 · May 28, 2023 · Updated 2 years ago
- a fast implementation of BM25 ☆10 · Sep 15, 2022 · Updated 3 years ago
- A flexible Handlebars view engine for Express ☆12 · Jul 6, 2016 · Updated 9 years ago
- Storytelling With Matplotlib (SWMat) ☆13 · Jul 25, 2019 · Updated 6 years ago
- Factories over fixtures. Chai Assertion Library. ☆23 · Nov 1, 2016 · Updated 9 years ago
- ☆25 · Sep 9, 2024 · Updated last year
- Python Bash emulation for agents, a port of vercel-labs/just-bash ☆50 · Feb 19, 2026 · Updated 3 months ago
- train with kittens! ☆66 · Oct 25, 2024 · Updated last year
- Mathematical expression evaluator with just in time code generation. ☆12 · Apr 7, 2013 · Updated 13 years ago
- The code of Advancing Expert Specialization for Better MoE (NeurIPS2025 oral) ☆31 · Jan 22, 2026 · Updated 4 months ago
- This repository contains the results and code for the MLPerf™ Inference v4.0 benchmark. ☆11 · Jul 24, 2025 · Updated 10 months ago
- A fork of the PEFT library, supporting Robust Adaptation (RoSA) ☆15 · Aug 16, 2024 · Updated last year
- Service for estimating gas on a series of dependent transactions ☆19 · Jun 16, 2023 · Updated 2 years ago
- A pytorch implementation of focal loss ☆10 · Jan 9, 2020 · Updated 6 years ago
- ☆12 · Mar 31, 2021 · Updated 5 years ago
- IPFS-related scripts and utilities ☆15 · Sep 23, 2021 · Updated 4 years ago
- ☆49 · Apr 15, 2024 · Updated 2 years ago
- [IEEE CAL 2025] Accelerating Page Migrations in Operating Systems with Intel DSA ☆16 · Nov 20, 2024 · Updated last year
- ☆12 · Jun 3, 2019 · Updated 6 years ago
- Generative Agents: Interactive Simulacra of Human Behavior - with Local LLMs ☆26 · Aug 15, 2023 · Updated 2 years ago
- Accelerating GPU Data Processing using FastLanes Compression ☆19 · May 9, 2024 · Updated 2 years ago
- Inference Llama/Llama2/Llama3 Models in NumPy ☆20 · Nov 22, 2023 · Updated 2 years ago
- ☆13 · Jan 7, 2025 · Updated last year
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Nov 11, 2024 · Updated last year
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆96 · Feb 20, 2026 · Updated 3 months ago
- UVA command line client to upload solutions and search for statistics ☆10 · Dec 23, 2016 · Updated 9 years ago
- Keyformer proposes KV Cache reduction through key tokens identification and without the need for fine-tuning ☆57 · Mar 26, 2024 · Updated 2 years ago
- ☆14 · May 25, 2023 · Updated 3 years ago
- CUDA C simple application for Nvidia's GPU ☆11 · Jun 7, 2022 · Updated 3 years ago
- Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit ☆63 · Jun 21, 2023 · Updated 2 years ago
- A simple tool for parsing the profile.json file of mxnet ☆14 · Aug 1, 2018 · Updated 7 years ago