A bunch of kernels that might make stuff slower
☆75 · Feb 18, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for accelerated-model-architectures
Users who are interested in accelerated-model-architectures are comparing it to the libraries listed below.
- Transformers components but in Triton ☆34 · May 9, 2025 · Updated 9 months ago
- Pytorch routines for (Ker)nel (Mac)hines ☆10 · Oct 10, 2025 · Updated 4 months ago
- ☆12 · Jan 29, 2021 · Updated 5 years ago
- Awesome Triton Resources ☆39 · Apr 27, 2025 · Updated 10 months ago
- ☆18 · Nov 11, 2025 · Updated 3 months ago
- ☆53 · Feb 24, 2026 · Updated last week
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆106 · Jun 28, 2025 · Updated 8 months ago
- FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs. ☆54 · Feb 6, 2026 · Updated 3 weeks ago
- A Triton-only attention backend for vLLM ☆24 · Feb 11, 2026 · Updated 3 weeks ago
- ☆16 · May 14, 2024 · Updated last year
- Variable-order CRFs with structure learning ☆17 · Aug 1, 2024 · Updated last year
- ☆262 · Jul 11, 2024 · Updated last year
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without… ☆21 · Mar 15, 2025 · Updated 11 months ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆97 · Sep 19, 2025 · Updated 5 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆327 · Updated this week
- A Quirky Assortment of CuTe Kernels ☆838 · Updated this week
- High-speed GEMV kernels, at most 2.7x speedup compared to pytorch baseline. ☆128 · Jul 13, 2024 · Updated last year
- Official Project Page for HLA: Higher-order Linear Attention (https://arxiv.org/abs/2510.27258) ☆45 · Jan 6, 2026 · Updated last month
- ☆87 · Updated this week
- Official Repository for Efficient Linear-Time Attention Transformers. ☆18 · Jun 2, 2024 · Updated last year
- Code for the paper: https://arxiv.org/pdf/2309.06979.pdf ☆21 · Jul 29, 2024 · Updated last year
- Automatic differentiation for Triton Kernels ☆29 · Aug 12, 2025 · Updated 6 months ago
- Statistical discontinuous constituent parsing ☆11 · Feb 15, 2018 · Updated 8 years ago
- ☆20 · May 24, 2025 · Updated 9 months ago
- ☆14 · May 14, 2019 · Updated 6 years ago
- ☆105 · Nov 7, 2024 · Updated last year
- RADLADS training code ☆37 · May 7, 2025 · Updated 9 months ago
- DeeperGEMM: crazy optimized version ☆74 · May 5, 2025 · Updated 9 months ago
- This repository is the official implementation of our EMNLP 2022 paper ELMER: A Non-Autoregressive Pre-trained Language Model for Efficie… ☆26 · Oct 27, 2022 · Updated 3 years ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based … ☆11 · Mar 18, 2023 · Updated 2 years ago
- source code for NAACL2022 main conference "Dynamic Programming in Rank Space: Scaling Structured Inference with Low-Rank HMMs and PCFGs" ☆10 · Sep 26, 2022 · Updated 3 years ago
- A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA. ☆29 · Feb 22, 2026 · Updated last week
- Official Implementation of ACL2023: Don't Parse, Choose Spans! Continuous and Discontinuous Constituency Parsing via Autoregressive Span … ☆14 · Aug 25, 2023 · Updated 2 years ago
- ☆11 · Oct 13, 2019 · Updated 6 years ago
- PyTorch implementation for PaLM: A Hybrid Parser and Language Model. ☆10 · Jan 7, 2020 · Updated 6 years ago
- Advanced Formal Language Theory (263-5352-00L; Spring 2023) ☆10 · Feb 21, 2023 · Updated 3 years ago
- Manipulate tensors with PackedSequence and CattedSequence ☆12 · Jan 4, 2026 · Updated 2 months ago
- Repository for SPECTRA: Sparse Structured Text Rationalization, accepted at EMNLP 2021 main conference. ☆10 · Feb 14, 2024 · Updated 2 years ago
- CPU and GPU tutorial examples ☆13 · Apr 4, 2025 · Updated 11 months ago