The open-source materials for the paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
☆30Nov 12, 2024Updated last year
Alternatives and similar repositories for SparsingLaw
Users that are interested in SparsingLaw are comparing it to the libraries listed below.
- ☆19Jun 4, 2025Updated 9 months ago
- ☆17Jun 11, 2025Updated 9 months ago
- Official Implementation of wd1☆24Sep 25, 2025Updated 6 months ago
- Fork of Flame repo for training of some new stuff in development☆19Mar 17, 2026Updated last week
- ☆11Sep 8, 2023Updated 2 years ago
- ☆12Jun 11, 2025Updated 9 months ago
- TMMA: A Tiled Matrix Multiplication Accelerator for Self-Attention Projections in Transformer Models, optimized for edge deployment on Xi…☆27Mar 24, 2025Updated last year
- [ACL 2023 Findings] Emergent Modularity in Pre-trained Transformers☆26Jun 7, 2023Updated 2 years ago
- [NeurIPS 2025] Reasoning Models Better Express Their Confidence☆22Nov 19, 2025Updated 4 months ago
- ☆30Oct 22, 2025Updated 5 months ago
- ☆18Apr 16, 2025Updated 11 months ago
- To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models☆33May 21, 2025Updated 10 months ago
- ☆21Oct 22, 2025Updated 5 months ago
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models"☆10Jul 19, 2024Updated last year
- ☆11Sep 7, 2024Updated last year
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …☆16Jun 11, 2025Updated 9 months ago
- ☆41Oct 12, 2025Updated 5 months ago
- The official implementation of the paper "Data Contamination Calibration for Black-box LLMs" (ACL 2024)☆16May 21, 2024Updated last year
- Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders☆18May 23, 2025Updated 10 months ago
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models".☆18Apr 25, 2025Updated 11 months ago
- The implementation of the paper "ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability"☆64Jun 3, 2025Updated 9 months ago
- [ACL 2025] The official code for "AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection".☆37Aug 4, 2025Updated 7 months ago
- ☆15Mar 20, 2025Updated last year
- Code for the paper "Firewalls to Secure Dynamic LLM Agentic Networks"☆29Jun 6, 2025Updated 9 months ago
- [ACL 2025 Findings] Implicit Reasoning in Transformers is Reasoning through Shortcuts☆17Mar 11, 2025Updated last year
- PyTorch implementation of StableMask (ICML'24)☆15Jun 27, 2024Updated last year
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…☆29Jul 24, 2025Updated 8 months ago
- Code for the paper "Cottention: Linear Transformers With Cosine Attention"☆20Nov 15, 2025Updated 4 months ago
- Sparse Backpropagation for Mixture-of-Expert Training☆29Jul 2, 2024Updated last year
- Unofficial implementation of the Ask-LLM paper 'How to Train Data-Efficient LLMs', arXiv:2402.09668.☆12Jun 19, 2024Updated last year
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models☆18Nov 4, 2025Updated 4 months ago
- ☆33Nov 11, 2024Updated last year
- [NeurIPS 2024] VeLoRA : Memory Efficient Training using Rank-1 Sub-Token Projections☆21Oct 15, 2024Updated last year
- ☆21Jul 21, 2025Updated 8 months ago
- ☆16Jul 23, 2024Updated last year
- ☆55May 22, 2025Updated 10 months ago
- ☆29Jun 5, 2025Updated 9 months ago
- ☆125Feb 4, 2026Updated last month
- Official Implementation of APB (ACL 2025 main Oral) and Spava.☆35Jan 30, 2026Updated last month