A Learnable LSH Framework for Efficient NN Training
☆34 · Jul 22, 2021 · Updated 4 years ago
Alternatives and similar repositories for mongoose
Users who are interested in mongoose compare it to the libraries listed below.
- Locality sensitive hash functions for Tensorflow 2.0. ☆12 · Feb 18, 2022 · Updated 4 years ago
- ☆15 · Jan 7, 2022 · Updated 4 years ago
- ☆11 · Apr 3, 2023 · Updated 2 years ago
- FPGA-based HyperLogLog Accelerator ☆12 · Jul 13, 2020 · Updated 5 years ago
- ☆11 · Jun 29, 2021 · Updated 4 years ago
- A compressed adaptive optimizer for training large-scale deep learning models using PyTorch ☆25 · Nov 26, 2019 · Updated 6 years ago
- A2C training of Relational Deep Reinforcement Learning Architecture ☆13 · Jun 22, 2022 · Updated 3 years ago
- Measuring the Signal to Noise Ratio in Language Model Evaluation ☆28 · Aug 19, 2025 · Updated 6 months ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆15 · Jun 21, 2019 · Updated 6 years ago
- Code repository for "Spatiotemporal Traffic Matrix Synthesis", Paul Tune and Matthew Roughan, ACM SIGCOMM 2015, London, UK, August 2015. ☆15 · Jan 13, 2016 · Updated 10 years ago
- This is an official GitHub repository for the paper, "Towards timeout-less transport in commodity datacenter networks.". ☆16 · Oct 12, 2021 · Updated 4 years ago
- SmartNIC ☆14 · Dec 13, 2018 · Updated 7 years ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. ☆14 · Nov 23, 2024 · Updated last year
- ☆19 · Sep 10, 2019 · Updated 6 years ago
- ☆21 · Mar 7, 2024 · Updated last year
- Johnson-Lindenstrauss transform (JLT), random projections (RP), fast Johnson-Lindenstrauss transform (FJLT), and randomized Hadamard tran… ☆23 · Jul 11, 2023 · Updated 2 years ago
- GPTQ inference TVM kernel ☆40 · Apr 25, 2024 · Updated last year
- Manages vllm-nccl dependency ☆17 · Jun 3, 2024 · Updated last year
- SLIDE (Sub-LInear Deep learning Engine) written in Go ☆45 · Apr 19, 2020 · Updated 5 years ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆17 · Mar 13, 2023 · Updated 2 years ago
- Applying "Load What You Need: Smaller Versions of Multilingual BERT" to LaBSE ☆19 · Sep 22, 2021 · Updated 4 years ago
- ☆45 · Apr 30, 2018 · Updated 7 years ago
- A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling. ☆43 · May 29, 2022 · Updated 3 years ago
- Compression schema for gradients of activations in backward pass ☆45 · Jul 26, 2023 · Updated 2 years ago
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆20 · Jul 19, 2024 · Updated last year
- [ICML 2024 Oral] LSH-Based Efficient Point Transformer (HEPT) ☆24 · Jan 24, 2025 · Updated last year
- Network Traffic Transformer to learn network dynamics from packet traces. Learn fundamental dynamics with pre-training and fine-tune to m… ☆23 · Jan 17, 2024 · Updated 2 years ago
- The NYU Systems Seminar ☆23 · Feb 26, 2024 · Updated 2 years ago
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters ☆57 · Jul 23, 2024 · Updated last year
- An FPGA integration and acceleration of the popular FAISS framework for approximate similarity search ☆25 · Jul 20, 2019 · Updated 6 years ago
- ☆24 · May 6, 2022 · Updated 3 years ago
- ☆27 · Mar 2, 2023 · Updated 3 years ago
- benchmarking some transformer deployments ☆26 · Dec 15, 2025 · Updated 2 months ago
- Code for our ACL '20 paper "Representation Engineering with Natural Language Explanations" ☆29 · Jun 15, 2020 · Updated 5 years ago
- Fast matrix multiplication for few-bit integer matrices on CPUs. ☆28 · Mar 19, 2019 · Updated 6 years ago
- Artifact evaluation repo for EuroSys'24. ☆29 · Nov 7, 2023 · Updated 2 years ago
- Codebase for "SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems" ☆1,104 · Apr 13, 2021 · Updated 4 years ago
- An auxiliary project analysis of the characteristics of KV in DiT Attention. ☆33 · Nov 29, 2024 · Updated last year
- The prototype for NSDI paper "NetHint: White-Box Networking for Multi-Tenant Data Centers" ☆26 · Feb 2, 2024 · Updated 2 years ago