High-performance CUDA kernels for real-time, low-latency financial inference, optimized for both consumer and datacenter GPUs.
☆20Jul 25, 2025Updated 8 months ago
Alternatives and similar repositories for cuda_latency_benchmark
Users that are interested in cuda_latency_benchmark are comparing it to the libraries listed below.
- Sparse Matrix Factorization (SMF) is a key component in many machine learning problems and there exist a variety of applications in real-w…☆12Jan 25, 2016Updated 10 years ago
- Optimizing loading training data from cloud bucket storage for cloud-based distributed deep learning. Official repository for Quantifying…☆11Jan 1, 2022Updated 4 years ago
- An implementation of the Pregel graph processing system on the Spark cluster computing framework. Merged into Spark; please see:☆11Apr 9, 2011Updated 15 years ago
- A batched implementation for efficient Qwen2.5-VL inference.☆24Jul 16, 2025Updated 9 months ago
- Optimize the performance of important tasks by delaying background-tasks☆22Mar 13, 2026Updated last month
- Similar to fail2ban but optimized for performance and simplicity - programmed in Rust.☆18Aug 31, 2022Updated 3 years ago
- Implementation of the SHA-3 family using AVX/AVX2 instructions.☆14Oct 5, 2018Updated 7 years ago
- Membrane-based dehumidification is currently being considered as a promising solution for the building application due to its low cost an…☆10Oct 28, 2020Updated 5 years ago
- Drop-in-place Memory and Performance optimizations for LINQ☆24Jan 1, 2021Updated 5 years ago
- A computationally efficient and robust LiDAR-inertial odometry (LIO) package☆13Aug 4, 2025Updated 8 months ago
- Unit benchmarks of CUDA event APIs.☆17Apr 23, 2024Updated last year
- Official repository for the ICCV 2021 (Oral) paper "(Just) A Spoonful of Refinements Helps the Registration Error Go Down"☆11Dec 21, 2021Updated 4 years ago
- ☆13Oct 5, 2025Updated 6 months ago
- The Unity Mali Compiler Integration Tool is a Unity editor extension for direct use of ARM Mali Offline Compiler in shader performance an…☆32Sep 12, 2025Updated 7 months ago
- bwa-aln-xeon-phi optimizes bwa aln performance on both Xeon and Xeon Phi platforms, and supports a symmetric running model on Xeon and Xeon P…☆17Sep 4, 2014Updated 11 years ago
- Minimal Transformer base in JAX. A single backbone for language modelling, diffusion, classification, etc...☆15May 28, 2025Updated 10 months ago
- Different implementation of sparse matrix multiplication. All matrices are in CSR format. The code contains different CUDA kernels for mu…☆17Nov 15, 2010Updated 15 years ago
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch☆10Aug 7, 2024Updated last year
- Build an AI bot in Discord to serve user's personalized reports on what's up in tech☆28Sep 14, 2025Updated 7 months ago
- Python package for compressing floating-point PyTorch tensors☆13Jul 22, 2024Updated last year
- ☆33Nov 10, 2025Updated 5 months ago
- Multi-modal Bayesian embedding model☆18Jun 30, 2016Updated 9 years ago
- Sparse kernels for GNNs based on TVM☆17Nov 18, 2020Updated 5 years ago
- ☆17Jan 19, 2025Updated last year
- ☆22Dec 15, 2023Updated 2 years ago
- Latent Large Language Models☆19Aug 24, 2024Updated last year
- Nachos OS, KV store, distributed KV Store☆20May 13, 2013Updated 12 years ago
- Code for experiments on self-prediction as a way to measure introspection in LLMs☆16Dec 10, 2024Updated last year
- PolyU-BPCoMa: A Dataset and Benchmark Towards Mobile Colorized Mapping Using a Backpack Multisensorial System☆16May 3, 2022Updated 3 years ago
- Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders☆18May 23, 2025Updated 10 months ago
- Learning to Skip the Middle Layers of Transformers☆17Aug 7, 2025Updated 8 months ago
- The solution to Nachos which is a course project for Operating System at SJTU.☆18May 26, 2011Updated 14 years ago
- GeoT: Tensor Centric Library for Graph Neural Network via Efficient Segment Reduction on GPU☆24Mar 27, 2025Updated last year
- ☆16Nov 23, 2023Updated 2 years ago
- A tutorial on 3D point set registration using SVD based approaches. We also investigate further objectives such as min. no. of point corr…☆13Aug 22, 2024Updated last year
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024)☆12Oct 31, 2024Updated last year
- Predicting the Stock Market - Can we do it?☆10Jul 24, 2021Updated 4 years ago
- Implementation of Spectral State Space Models☆16Feb 23, 2024Updated 2 years ago
- DigiForests Dataset Development Kit☆26Mar 16, 2026Updated last month