gdevos010/Scalable-Softmax

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/gdevos010/Scalable-Softmax)

gdevos010 / Scalable-Softmax

Unofficial Scalable-Softmax Is Superior for Attention

☆21

Alternatives and similar repositories for Scalable-Softmax

Users that are interested in Scalable-Softmax are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

benbergner / cropr
View on GitHub
A token pruning method that accelerates ViTs for various tasks while maintaining high performance.
☆29Jul 21, 2025Updated 11 months ago
BryceZhuo / HybridNorm
View on GitHub
The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
☆19Mar 7, 2025Updated last year
joshyZhou / HINT
View on GitHub
Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration
☆30Aug 13, 2025Updated 11 months ago
apple / ml-reversal-blessing
View on GitHub
☆17Jul 31, 2025Updated 11 months ago
xiezheng-cs / DTQ
View on GitHub
PyTorch implementation of "Deep Transferring Quantization" (ECCV2020)
☆18Jun 22, 2022Updated 4 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
fangyuan-ksgk / selective-attention-transformer
View on GitHub
Unofficial Implementation of Selective Attention Transformer
☆20Oct 31, 2024Updated last year
haolibai / APS-channel-search
View on GitHub
Revisiting Parameter Sharing for Automatic Neural Channel Number Search, NeurIPS 2020
☆21Nov 15, 2020Updated 5 years ago
mayank31398 / ladder-residual-inference
View on GitHub
☆14Jul 13, 2025Updated last year
thomasahle / cce
View on GitHub
Clustered Compositional Embeddings
☆13Oct 25, 2023Updated 2 years ago
inria-thoth / csa
View on GitHub
Official Pytorch implementation of Chromatic Graph Transformers
☆10Jun 14, 2023Updated 3 years ago
brando90 / ultimate-anatome
View on GitHub
Ἀνατομή is a PyTorch library to analyze representation of neural networks
☆13Jan 31, 2024Updated 2 years ago
GuoTianYu2000 / Active-Dormant-Attention
View on GitHub
codes and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"
☆11Dec 30, 2024Updated last year
datenlord / roce-sim
View on GitHub
☆12Dec 27, 2022Updated 3 years ago
ledmaster / unified-embeddings
View on GitHub
Implementation of Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems
☆15Nov 11, 2023Updated 2 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
Singularity0104 / NExT-Vid
View on GitHub
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
☆22Dec 24, 2025Updated 6 months ago
csyhhu / MetaQuant
View on GitHub
Codes for Accepted Paper : "MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization" in NeurIPS 2019
☆54May 8, 2020Updated 6 years ago
Twilight92z / Quantize-Watermark
View on GitHub
☆19Nov 6, 2023Updated 2 years ago
Mluckydwyer / hw-ci
View on GitHub
Hardware CD/CI and Development Containers 🚢
☆11Jul 20, 2022Updated 4 years ago
xunull / read-RT-DETR
View on GitHub
☆14May 19, 2024Updated 2 years ago
huangzizheng01 / ShuffleMamba
View on GitHub
Code of paper 'Stochastic Layer-Wise Shuffle for Improving Vision Mamba Training'
☆21Jun 10, 2025Updated last year
Qualcomm-AI-research / BayesianBits
View on GitHub
☆22Feb 11, 2022Updated 4 years ago
Bond1995 / Markov
View on GitHub
Code for experiments on transformers using Markovian data.
☆22Nov 22, 2024Updated last year
ThisisBillhe / torch_quantizer
View on GitHub
torch_quantizer is a out-of-box quantization tool for PyTorch models on CUDA backend, specially optimized for Diffusion Models.
☆25Mar 29, 2024Updated 2 years ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
wesg52 / llm-context-neurons
View on GitHub
Find context neurons in Pythia models.
☆13Jun 13, 2023Updated 3 years ago
LingyiHongfd / CompressTracker
View on GitHub
☆22Nov 16, 2025Updated 8 months ago
aredden / torch-bnb-fp4
View on GitHub
Faster Pytorch bitsandbytes 4bit fp4 nn.Linear ops
☆30Mar 16, 2024Updated 2 years ago
james-oldfield / MxD
View on GitHub
[NeurIPS'25] Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders
☆16May 28, 2025Updated last year
canyilu / Least-Squares-Regression-for-subspace-clustering
View on GitHub
Least Squares Regression for subspace clustering
☆11May 27, 2018Updated 8 years ago
borjanG / 2023-transformers
View on GitHub
Codes for the paper The emergence of clusters in self-attention dynamics.
☆17Dec 18, 2023Updated 2 years ago
a-rahimi / hessian
View on GitHub
The Hessian of tall-skinny networks is easy to invert
☆17Updated this week
luinaudt / deparser
View on GitHub
☆16Dec 16, 2021Updated 4 years ago
alexzhang13 / Triton-Puzzles-Solutions
View on GitHub
Personal solutions to the Triton Puzzles
☆22Jul 18, 2024Updated 2 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
MingSun-Tse / Why-the-State-of-Pruning-so-Confusing
View on GitHub
[Preprint] Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Prunin…
☆41Sep 9, 2025Updated 10 months ago
facebookresearch / Ternary_Binary_Transformer
View on GitHub
ACL 2023
☆39Jun 6, 2023Updated 3 years ago
GXNU-ZhongLab / MambaLCT
View on GitHub
☆25Dec 20, 2024Updated last year
johnmarktaylor91 / pytorch_feature_analysis
View on GitHub
☆12Mar 19, 2021Updated 5 years ago
HKUST-SING / srnic-simulation-public
View on GitHub
☆22Apr 2, 2023Updated 3 years ago
shikaiqiu / compute-better-spent
View on GitHub
☆63Oct 3, 2024Updated last year
mithro / docker-xilinx
View on GitHub
Package Xilinx FPGA tools into docker containers, useful for CI situations.
☆17Oct 20, 2014Updated 11 years ago