High-performance CUDA kernels for real-time, low-latency financial inference, optimized for both consumer and datacenter GPUs.
☆19Jul 25, 2025Updated 10 months ago
Alternatives and similar repositories for cuda_latency_benchmark
Users that are interested in cuda_latency_benchmark are comparing it to the libraries listed below.
- Triton Compiler related materials.☆44Mar 16, 2026Updated 3 months ago
- Keyboard-first dotfiles for terminal-centric development with tmux, Neovim, and coding agents.☆28Updated this week
- Backpack Attachments is a FiveM resource for attaching weapons and items to players' backs. It supports customizable attachment points, h…☆10Nov 14, 2024Updated last year
- Demo repository for article "Express server, Handlebars & Critical Path Performance Optimization"☆13Jan 12, 2017Updated 9 years ago
- A particle system implemented on Android, handling collisions, optimized for performance☆10Dec 18, 2023Updated 2 years ago
- A lightweight React hook that automatically manages fade overlays for scrollable containers. Provides smooth gradient transitions at the …☆12Aug 11, 2025Updated 10 months ago
- Discover Netflix's Open Connect Appliance (OCA) assigned to your connection. This tool fetches and displays detailed connectivity and hos…☆19Jul 22, 2025Updated 10 months ago
- Portfolio and blog for my online brand. Performance Optimized Single Page React App.☆11Nov 7, 2016Updated 9 years ago
- Principles and Methodologies for Serial Performance Optimization (OSDI' 25)☆29Jun 5, 2025Updated last year
- An implementation of the Pregel graph processing system on the Spark cluster computing framework. Merged into Spark; please see:☆11Apr 9, 2011Updated 15 years ago
- Small, performance optimized module for parallax image backgrounds 🖼☆10Feb 19, 2017Updated 9 years ago
- Tomasulo Simulator written in React as the project for Computer Architecture course, Spring 2019, Tsinghua University☆12Jun 9, 2019Updated 7 years ago
- r3conwhale aims to develop a multifunctional recon chain for web applications, intelligently interpreting collected data, and optimizing …☆14Jul 3, 2024Updated last year
- Optimize the performance of important tasks by delaying background-tasks☆22Mar 13, 2026Updated 3 months ago
- The High Performance Collision Cross Section (HPCCS) is a new software for fast and accurate calculation of CCS for molecular ions. Based…☆17May 11, 2020Updated 6 years ago
- PowerShell script to optimize Windows performance and reduce latency (24H2 compatible) for a better Data Science experience.☆21Jul 30, 2025Updated 10 months ago
- ☆10Nov 22, 2022Updated 3 years ago
- machine learning model performance metrics & charts with confidence intervals, optimized with numba to be fast☆16Dec 15, 2021Updated 4 years ago
- This project is a real-time Wav2Lip implementation that I am actively optimizing to enhance the precision and performance of audio-to-lip…☆11Dec 6, 2023Updated 2 years ago
- ☆11Apr 2, 2021Updated 5 years ago
- This workshop teaches systematic approaches to evaluating Generative AI workloads for production use. You'll learn to build evaluation fr…☆48Jun 2, 2026Updated last week
- MySQL Memory Calculator estimates maximum MySQL memory usage based on key configuration settings. Featuring real-time calculations and vi…☆18Dec 17, 2024Updated last year
- Implementation of the SHA-3 family using AVX/AVX2 instructions.☆14Oct 5, 2018Updated 7 years ago
- The optimization field suffers from the metaphor-based “pseudo-novel” or “fancy” optimizers. Most of these cliché methods mimic animals' …☆12Oct 13, 2024Updated last year
- Efficient 3bit/4bit quantization of LLaMA models☆18May 18, 2023Updated 3 years ago
- Fork of kingoflolz/mesh-transformer-jax with memory usage optimizations and support for GPT-Neo, GPT-NeoX, BLOOM, OPT and fairseq dense L…☆22Nov 14, 2022Updated 3 years ago
- Demo theme with various front-end performance optimization tricks applied☆15Sep 18, 2017Updated 8 years ago
- Repo for Li, Kafka, Gao et al 2019 "Clustering discretization methods for generation of material performance databases in machine learni…☆14May 27, 2019Updated 7 years ago
- ☆17Mar 11, 2021Updated 5 years ago
- JetBrains WebStorm vmoptions optimized for performance☆12Mar 4, 2015Updated 11 years ago
- 📊 Research-focused SDXL training framework exploring novel optimization approaches. Goals include enhanced image quality, training stabi…☆21Jun 7, 2025Updated last year
- A magisk module that optimizes your device's memory performance through persistent zRAM + Swapfile optimization with VM tweaks.☆17Jun 1, 2025Updated last year
- Membrane-based dehumidification is currently being considered as a promising solution for the building application due to its low cost an…☆10Oct 28, 2020Updated 5 years ago
- A high-performance image processing library designed to optimize and extend the Albumentations library with specialized functions for adv…☆119May 30, 2026Updated 2 weeks ago
- Tlama (124M) is a language model based on LlaMa3 (127M) optimized by EigenCore. It is designed for computational efficiency and scalabili…☆20May 16, 2026Updated last month
- Incremental optimizations to the N-Body problem in order to evaluate and compare the performance of Python translators in the HPC environ…☆13Apr 2, 2023Updated 3 years ago
- How would you predict the compressive strength of concrete as a function of its constituent materials and curing time? In this portfolio …☆13Nov 27, 2020Updated 5 years ago
- Unit benchmarks of CUDA event APIs.☆17Apr 23, 2024Updated 2 years ago
- Machine learning can enhance a BMS by improving SOC and SOH estimation, detecting faults, and optimizing control policies. By leveraging …☆13Feb 16, 2024Updated 2 years ago