apd10 / universal_memory_allocation
☆15Updated 3 years ago
Alternatives and similar repositories for universal_memory_allocation
Users who are interested in universal_memory_allocation are comparing it to the libraries listed below.
- ☆14Updated 3 years ago
- [ NeurIPS '22 ] Data distillation for recommender systems. Shows equivalent performance with 2-3 orders less data.☆23Updated 2 years ago
- A study of the downstream instability of word embeddings☆12Updated 2 years ago
- A Learnable LSH Framework for Efficient NN Training☆31Updated 3 years ago
- ☆14Updated 3 years ago
- Differentiable Product Quantization for End-to-End Embedding Compression.☆62Updated 2 years ago
- Time-based Sequence Model for Personalization and Recommendation Systems☆49Updated 3 years ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012☆49Updated 3 years ago
- Hyperparameter tuning via uncertainty modeling☆47Updated last year
- [ICLR 2022] Code for paper "Exploring Extreme Parameter Compression for Pre-trained Language Models" (https://arxiv.org/abs/2205.10036)☆22Updated 2 years ago
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021)☆62Updated 3 years ago
- AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks☆42Updated 7 years ago
- Code for the paper Deformable Butterfly: A Highly Structured and Sparse Linear Transform.☆12Updated 3 years ago
- A compressed alternative to matrix multiplication using state-of-the-art compression ROBE-Z☆9Updated last year
- A supplementary code for Beyond Vector Spaces: Compact Data Representation as Differentiable Weighted Graphs.☆47Updated 5 years ago
- Successfully training approximations to full-rank matrices for efficiency in deep learning.☆17Updated 4 years ago
- AdamW optimizer for bfloat16 models in pytorch 🔥.☆33Updated last year
- Implementation of vector quantization algorithms, codes for Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner P…☆59Updated 4 years ago
- High performance pytorch modules☆18Updated 2 years ago
- Efficient LDA solution on GPUs.☆24Updated 6 years ago
- Distributed DataLoader For Pytorch Based On Ray☆24Updated 3 years ago
- Parallel SGD, done locally and remote☆14Updated 9 years ago
- ☆12Updated 4 years ago
- Triton kernels for Flux☆20Updated last week
- Code for paper 'Minimizing FLOPs to Learn Efficient Sparse Representations' published at ICLR 2020☆20Updated 5 years ago
- ☆18Updated last month
- Distributed ML Optimizer☆32Updated 3 years ago
- MLPruning, PyTorch, NLP, BERT, Structured Pruning☆20Updated 4 years ago
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing☆50Updated 3 years ago
- "Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices", official implementation☆29Updated 5 months ago