☆49Mar 14, 2025Updated last year
Alternatives and similar repositories for cse234-w25-PA
Users that are interested in cse234-w25-PA are comparing it to the libraries listed below
- Website for CSE 234, Winter 2025☆13Mar 24, 2025Updated 11 months ago
- High-performance RMSNorm implementation using SM core storage (registers and shared memory)☆29Jan 22, 2026Updated last month
- Code for "What really matters in matrix-whitening optimizers?"☆23Oct 31, 2025Updated 4 months ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel☆131Jun 24, 2025Updated 8 months ago
- Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Sp…☆21Mar 7, 2024Updated 2 years ago
- Expert Specialization MoE Solution based on CUTLASS☆27Jan 19, 2026Updated 2 months ago
- ECE408 (Applied Parallel Programming) Fall 2022 MP☆20Mar 24, 2023Updated 2 years ago
- ☆13Jan 7, 2025Updated last year
- The official implementation for the intra-stage fusion technique introduced in https://arxiv.org/abs/2409.13221☆31Apr 22, 2025Updated 10 months ago
- Contains all relevant scripts & tools developed to assist in conducting the first iteration of the Indian Inter-college Competitive Progr…☆10Sep 14, 2024Updated last year
- A repository where I store all my competitive programming related code☆14Feb 26, 2026Updated 3 weeks ago
- C++ library for finding Strongly Connected Components in parallel, based on paper: https://dl.acm.org/citation.cfm?id=2851161☆12May 22, 2018Updated 7 years ago
- ☆19Dec 24, 2024Updated last year
- Sample Codes using NVSHMEM on Multi-GPU☆30Jan 22, 2023Updated 3 years ago
- Asynchronous pipeline parallel optimization☆19Feb 2, 2026Updated last month
- Awesome code, projects, books, etc. related to CUDA☆30Feb 3, 2026Updated last month
- ☆107Feb 25, 2025Updated last year
- This is the official implementation for paper "On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond".☆20Nov 17, 2025Updated 4 months ago
- [NeurIPS 23] Characterizing OOD Error via Optimal Transport☆13Nov 19, 2023Updated 2 years ago
- DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing (WACV 2025)☆12Feb 7, 2026Updated last month
- TopoTrans: Optimal Transport meets Topological Data Analysis☆14Apr 20, 2023Updated 2 years ago
- Code and results accompanying our paper titled Leveraging Unlabeled Data to Predict Out-of-Distribution Performance at ICLR 2022☆10Dec 8, 2022Updated 3 years ago
- Code for Tangent Model Composition for Ensembling and Continual Fine-tuning (ICCV 2023) and Tangent Transformers for Composition, Privacy…☆13May 14, 2024Updated last year
- Stick-breaking attention☆62Jul 1, 2025Updated 8 months ago
- ☆17Jun 4, 2025Updated 9 months ago
- Utility to use ElevenLabs' streaming from the command line☆11Aug 8, 2023Updated 2 years ago
- 🫧 Code for Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data (Maekawa*, Iso* et al.…☆12Feb 25, 2025Updated last year
- Tigon: A Distributed Database for a CXL Pod [OSDI '25]☆45Nov 25, 2025Updated 3 months ago
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning☆141Feb 25, 2026Updated 3 weeks ago
- ☆78Nov 26, 2024Updated last year
- implementation of https://arxiv.org/pdf/2312.09299☆21Jul 3, 2024Updated last year
- Game engine for the website version of the Avalon card/board game☆12Aug 2, 2025Updated 7 months ago
- Exploitability calculation for imperfect-information game benchmarks☆33Apr 5, 2025Updated 11 months ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling☆21Updated this week
- Calculate PI using MPI with 3 different methods☆19Feb 9, 2015Updated 11 years ago
- ☆15Mar 2, 2025Updated last year
- ☆10Nov 18, 2024Updated last year
- qwen-nsa☆87Oct 14, 2025Updated 5 months ago
- Implementation of "Matryoshka-Adaptor: Unsupervised and Supervised Tuning for Smaller Embedding Dimensions"☆24Aug 27, 2024Updated last year