[ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
☆36 · Nov 4, 2025 · Updated 6 months ago
Alternatives and similar repositories for Outlier-Safe-Pre-Training
Users that are interested in Outlier-Safe-Pre-Training are comparing it to the libraries listed below.
- [NAACL 2025] ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage ☆16 · Sep 2, 2025 · Updated 8 months ago
- ☆36 · Mar 12, 2025 · Updated last year
- A libp2p-backed daemon wrapping the functionalities of go-libp2p for use in other languages ☆11 · Feb 9, 2025 · Updated last year
- ☆15 · Mar 2, 2025 · Updated last year
- A fusion of a linear layer and a cross entropy loss, written for PyTorch in Triton. ☆75 · Aug 2, 2024 · Updated last year
- Learning to Skip the Middle Layers of Transformers ☆17 · Aug 7, 2025 · Updated 8 months ago
- ☆13 · Apr 27, 2026 · Updated last week
- Distributed SDDMM Kernel ☆12 · Jul 8, 2022 · Updated 3 years ago
- A fluent, scalable, and easy-to-use LLM data processing framework. ☆28 · Jan 31, 2026 · Updated 3 months ago
- Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization (IEEE TPAMI 2021) ☆17 · Jun 4, 2021 · Updated 4 years ago
- ☆16 · Sep 22, 2024 · Updated last year
- Code publication to the paper "Normalized Attention Without Probability Cage" ☆17 · Nov 9, 2021 · Updated 4 years ago
- Semi-supervised spoken language understanding (SLU) via self-supervised speech and language model pretraining ☆12 · Mar 23, 2021 · Updated 5 years ago
- sigma-MoE layer ☆21 · Jan 5, 2024 · Updated 2 years ago
- Memory-efficient transformer. Work in progress. ☆19 · Sep 17, 2022 · Updated 3 years ago
- [ICML 2025] MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design ☆28 · Jul 4, 2025 · Updated 10 months ago
- Generic build server ☆65 · May 25, 2014 · Updated 11 years ago
- ☆14 · Oct 24, 2022 · Updated 3 years ago
- Fractional Spike Differential Equations Neural Network with Efficient Adjoint Parameters Training ☆16 · Aug 6, 2025 · Updated 8 months ago
- ☆13 · Jul 4, 2020 · Updated 5 years ago
- ☆52 · Jan 28, 2024 · Updated 2 years ago
- An experiment to see if ChatGPT can improve the output of the Stanford Alpaca dataset ☆12 · Mar 29, 2023 · Updated 3 years ago
- A large-scale RWKV v7 (World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states (Pseudo MoE). Easy to deploy… ☆49 · Oct 21, 2025 · Updated 6 months ago
- ☆14 · Dec 28, 2021 · Updated 4 years ago
- A sample app to debug and validate cellular modems on balena devices ☆13 · Jun 5, 2019 · Updated 6 years ago
- Code for the paper "Secure Distributed Training at Scale" (ICML 2022) ☆16 · Feb 4, 2025 · Updated last year
- Supporting code for the blog post on modular manifolds. ☆121 · Sep 26, 2025 · Updated 7 months ago
- ☆21 · Apr 16, 2024 · Updated 2 years ago
- React 0.13 with ES6, Immutable.js and Flux, Isomorphic as well ☆11 · Mar 10, 2015 · Updated 11 years ago
- ☆11 · Jun 4, 2021 · Updated 4 years ago
- ☆15 · Dec 5, 2019 · Updated 6 years ago
- ☆14 · Jun 28, 2023 · Updated 2 years ago
- [NeurIPS 2024] Physics-Informed Regularization for Domain-Agnostic Dynamical System Modeling ☆26 · Jul 10, 2025 · Updated 9 months ago
- Caching for GraphQL Resolvers ☆19 · Nov 21, 2019 · Updated 6 years ago
- Load & manage evolving datasets efficiently ☆23 · Aug 22, 2025 · Updated 8 months ago
- ☆33 · Apr 22, 2025 · Updated last year
- ☆49 · May 20, 2025 · Updated 11 months ago
- ☆11 · Feb 3, 2025 · Updated last year
- This Go package multiplexes streams over a single underlying transport io.ReadWriteCloser. ☆25 · Mar 2, 2024 · Updated 2 years ago