High-performance CUDA kernels for low-latency, real-time financial inference, optimized for both consumer and datacenter GPUs.
☆20 · Jul 25, 2025 · Updated 7 months ago
Alternatives and similar repositories for cuda_latency_benchmark
Users that are interested in cuda_latency_benchmark are comparing it to the libraries listed below
- A powerful Laravel storage driver that enables seamless synchronization of files across multiple disks, with an integrated cache disk for… ☆15 · Nov 11, 2025 · Updated 3 months ago
- 🐆 A lightweight, high-performance string manipulation library optimized for speed-sensitive applications. ☆14 · Jan 6, 2026 · Updated 2 months ago
- The SEAL-CPU backend is a Reference backend engine for HEBench which is a shared library that implements the required functions specified… ☆11 · Mar 3, 2023 · Updated 3 years ago
- A batched implementation for efficient Qwen2.5-VL inference. ☆22 · Jul 16, 2025 · Updated 7 months ago
- A SystemVerilog-based simulation and design of a Last Level Cache (LLC) implementing the MESI protocol, featuring Pseudo-LRU replacement,… ☆15 · Nov 24, 2025 · Updated 3 months ago
- Reference implementation of Thin and Deep Gaussian Processes (NeurIPS 2023) ☆14 · Nov 25, 2024 · Updated last year
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch ☆10 · Aug 7, 2024 · Updated last year
- An experimental python library to compile and analyze the cost of any desired composite simulation in real or imaginary time, and with or… ☆10 · Feb 9, 2024 · Updated 2 years ago
- Sparse Matrix Factorization (SMF) is a key component in many machine learning problems and there exist a variety of applications in real-w… ☆11 · Jan 25, 2016 · Updated 10 years ago
- Code for experiments on self-prediction as a way to measure introspection in LLMs ☆16 · Dec 10, 2024 · Updated last year
- my profile readme ☆14 · Updated this week
- Incremental optimizations to the N-Body problem in order to evaluate and compare the performance of Python translators in the HPC environ… ☆13 · Apr 2, 2023 · Updated 2 years ago
- Minimal Transformer base in JAX. A single backbone for language modelling, diffusion, classification, etc... ☆14 · May 28, 2025 · Updated 9 months ago
- Unofficial implementation of "Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle" ☆13 · Jul 3, 2024 · Updated last year
- Learning to Skip the Middle Layers of Transformers ☆17 · Aug 7, 2025 · Updated 6 months ago
- 💾 Optimize Laravel caching with Cachetastic! Cache method results, force refresh, handle errors, and boost app performance effortlessly. ☆13 · Jan 26, 2026 · Updated last month
- A computationally efficient and robust LiDAR-inertial odometry (LIO) package ☆13 · Aug 4, 2025 · Updated 7 months ago
- A Java-based framework for combinatorial test input generation, fault characterization and automated test execution. ☆11 · Jan 22, 2024 · Updated 2 years ago
- Backpack Attachments is a FiveM resource for attaching weapons and items to players' backs. It supports customizable attachment points, h… ☆10 · Nov 14, 2024 · Updated last year
- Optimizing loading training data from cloud bucket storage for cloud-based distributed deep learning. Official repository for Quantifying… ☆11 · Jan 1, 2022 · Updated 4 years ago
- A magisk module that optimizes your device's memory performance through persistent zRAM + Swapfile optimization with VM tweaks. ☆11 · Jun 1, 2025 · Updated 9 months ago
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024) ☆12 · Oct 31, 2024 · Updated last year
- Discover Netflix's Open Connect Appliance (OCA) assigned to your connection. This tool fetches and displays detailed connectivity and hos… ☆18 · Jul 22, 2025 · Updated 7 months ago
- A simple JavaScript framework that supports DOM manipulation and event handling. Easy to use, optimized for performance… ☆11 · Feb 15, 2017 · Updated 9 years ago
- The course work repo for UoSurrey EEEM071 (2023 Spring) ☆11 · May 9, 2023 · Updated 2 years ago
- This project involves using Simulink to create a comprehensive battery model that incorporates both charging and discharging resistances.… ☆11 · Apr 30, 2023 · Updated 2 years ago
- ☆15 · Mar 13, 2025 · Updated 11 months ago
- ☆14 · May 21, 2024 · Updated last year
- The High Performance Collision Cross Section (HPCCS) is a new software for fast and accurate calculation of CCS for molecular ions. Based… ☆17 · May 11, 2020 · Updated 5 years ago
- Tlama (124M) is a language model based on LlaMa3 (127M) optimized by EigenCore. It is designed for computational efficiency and scalabili… ☆12 · Mar 27, 2025 · Updated 11 months ago
- ☆10 · Nov 22, 2022 · Updated 3 years ago
- ☆15 · Aug 19, 2025 · Updated 6 months ago
- 1st Place Team Crane: @aswinkumar1999 @rathull @kyolebu ☆29 · Sep 8, 2025 · Updated 5 months ago
- Optimizes TCP settings and application download speeds on Windows systems for improved network performance ☆11 · Apr 29, 2025 · Updated 10 months ago
- Adds a Doctrine Id generator which uses an ordered UUID in MySQL for extra performance. Uses methods described in Karthik Appigatla's arti… ☆10 · Jun 8, 2015 · Updated 10 years ago
- End to End Machine Learning Pipeline with scikit learn ☆12 · Mar 10, 2021 · Updated 4 years ago
- An attempt to implement ‘Automatic Panoramic Image Stitching using Invariant Features’ ☆12 · Jun 8, 2019 · Updated 6 years ago
- High-performance technical indicators library for financial analysis, optimized with Numba ☆13 · Oct 13, 2025 · Updated 4 months ago
- AI Agents using Crew AI ☆12 · Jun 16, 2024 · Updated last year