Unofficial Scalable-Softmax Is Superior for Attention
☆20 · May 30, 2025 · Updated 10 months ago
Alternatives and similar repositories for Scalable-Softmax
Users that are interested in Scalable-Softmax are comparing it to the libraries listed below.
- ☆15 · Oct 3, 2024 · Updated last year
- Implementations of the XNOR networks ☆12 · Aug 9, 2017 · Updated 8 years ago
- ☆44 · Feb 11, 2026 · Updated last month
- ☆16 · Dec 18, 2025 · Updated 3 months ago
- PyTorch implementation of "Deep Transferring Quantization" (ECCV2020) ☆18 · Jun 22, 2022 · Updated 3 years ago
- Source code of the paper: Overlapped Trajectory-Enhanced Visual Tracking ☆11 · Sep 3, 2024 · Updated last year
- The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization ☆18 · Mar 7, 2025 · Updated last year
- Revisiting Parameter Sharing for Automatic Neural Channel Number Search, NeurIPS 2020 ☆21 · Nov 15, 2020 · Updated 5 years ago
- Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF" ☆28 · Aug 21, 2024 · Updated last year
- ☆14 · Jul 13, 2025 · Updated 8 months ago
- Official Pytorch implementation of Chromatic Graph Transformers ☆10 · Jun 14, 2023 · Updated 2 years ago
- Clustered Compositional Embeddings ☆12 · Oct 25, 2023 · Updated 2 years ago
- Robust Tracking via Mamba-based Context-aware Token Learning (AAAI 2025) ☆16 · Nov 6, 2025 · Updated 5 months ago
- Ἀνατομή is a PyTorch library to analyze representation of neural networks ☆13 · Jan 31, 2024 · Updated 2 years ago
- codes and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs" ☆10 · Dec 30, 2024 · Updated last year
- Don't just regulate gradients like in Muon, regulate the weights too ☆32 · Jul 30, 2025 · Updated 8 months ago
- Implementation of Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems ☆14 · Nov 11, 2023 · Updated 2 years ago
- A Zen approach to configuring your Python project ☆17 · Feb 27, 2026 · Updated last month
- ☆12 · Sep 16, 2024 · Updated last year
- Codes for Accepted Paper : "MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization" in NeurIPS 2019 ☆54 · May 8, 2020 · Updated 5 years ago
- ✨🌲 Hierarchical extreme multiclass and multi-label classification. ☆18 · Jan 5, 2023 · Updated 3 years ago
- ☆19 · Nov 6, 2023 · Updated 2 years ago
- ☆14 · May 19, 2024 · Updated last year
- A token pruning method that accelerates ViTs for various tasks while maintaining high performance. ☆27 · Jul 21, 2025 · Updated 8 months ago
- Code of paper 'Stochastic Layer-Wise Shuffle for Improving Vision Mamba Training' ☆21 · Jun 10, 2025 · Updated 10 months ago
- [TCSVT2025] MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking ☆22 · Apr 6, 2025 · Updated last year
- 4th place solution to datafactory challenge by Intermarché. ☆12 · Jun 28, 2021 · Updated 4 years ago
- ☆12 · Jan 17, 2024 · Updated 2 years ago
- Find context neurons in Pythia models. ☆13 · Jun 13, 2023 · Updated 2 years ago
- Code for experiments on transformers using Markovian data. ☆22 · Nov 22, 2024 · Updated last year
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆31 · Mar 12, 2024 · Updated 2 years ago
- Personal solutions to the Triton Puzzles ☆20 · Jul 18, 2024 · Updated last year
- Least Squares Regression for subspace clustering ☆10 · May 27, 2018 · Updated 7 years ago
- ☆12 · Mar 19, 2021 · Updated 5 years ago
- [CVPR 2026] Accelerating Streaming Video Large Language Models via Hierarchical Token Compression ☆54 · Feb 25, 2026 · Updated last month
- Faster Pytorch bitsandbytes 4bit fp4 nn.Linear ops ☆30 · Mar 16, 2024 · Updated 2 years ago
- torch_quantizer is a out-of-box quantization tool for PyTorch models on CUDA backend, specially optimized for Diffusion Models. ☆25 · Mar 29, 2024 · Updated 2 years ago
- Pointax: PointMaze Environment for JAX ☆27 · Oct 22, 2025 · Updated 5 months ago
- Collection of curriculum and useful examples for robotics and autonomous systems education using MATLAB® and Simulink® for different stag… ☆42 · Sep 23, 2025 · Updated 6 months ago