JingzhaoZhang / why-clipping-accelerates
A PyTorch implementation of the LSTM experiments in the paper "Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity"
☆47Feb 7, 2020Updated 6 years ago
Alternatives and similar repositories for why-clipping-accelerates
Users who are interested in why-clipping-accelerates are comparing it to the repositories listed below
- Official codebase for our paper "Joslim: Joint Widths and Weights Optimization for Slimmable Neural Networks"☆12Jun 30, 2021Updated 4 years ago
- ☆69Mar 4, 2020Updated 5 years ago
- ☆12Jul 7, 2021Updated 4 years ago
- [ECCV18] Constraint-Aware Deep Neural Network Compression☆12Sep 11, 2018Updated 7 years ago
- [JMLR] TRADES + random smoothing for certifiable robustness☆14Sep 13, 2020Updated 5 years ago
- Source code accompanying the ICLR2020 publication 'Massively Multilingual Sparse Word Representations' https://openreview.net/forum?id=Hy…☆12Aug 15, 2023Updated 2 years ago
- A PyTorch implementation of contrastive learning (CL) baselines.☆14Aug 29, 2022Updated 3 years ago
- Implementation of the paper "Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory", Ron Amit and Ron Meir, ICML 2018☆18Apr 13, 2021Updated 4 years ago
- Experiments from "The Generalization-Stability Tradeoff in Neural Network Pruning": https://arxiv.org/abs/1906.03728.☆14Oct 23, 2020Updated 5 years ago
- batchboost is a variation on MixUp that instead of mixing just two images, mixes many images together.☆44Jan 26, 2020Updated 6 years ago
- Source codes for our AAAI'20 paper: Adaptive Factorization Network: Learning Adaptive-Order Feature Interactions☆38Sep 22, 2020Updated 5 years ago
- Code for paper "Adversarial Support Alignment"☆23Apr 22, 2022Updated 3 years ago
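- A BERT-based implementation of pretrained language models, in two steps: pretraining and fine-tuning. Currently includes three models (BERT, Roberta, and ALbert), all of which support Whole Word Mask mode.☆18Feb 1, 2020Updated 6 years ago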
- ☆16Sep 4, 2018Updated 7 years ago
- Deep Learning using Rectified Linear Units (ReLU)☆22Aug 2, 2024Updated last year
- Code release to accompany paper "Geometry-Aware Gradient Algorithms for Neural Architecture Search."☆25Oct 7, 2020Updated 5 years ago
- Active Learning for Improved Semi Supervised Semantic Segmentation in Satellite Images☆23Mar 4, 2022Updated 3 years ago
- Official PyTorch implementation of "Meta-prediction Model for Distillation-Aware NAS on Unseen Datasets" (ICLR 2023 notable top 25%)☆26Mar 18, 2024Updated last year
- Code for Active Mixup in 2020 CVPR☆23Jan 11, 2022Updated 4 years ago
- Cheap distillation for convolutional neural networks.☆35Oct 22, 2018Updated 7 years ago
- Related material on Federated Learning☆26Apr 9, 2020Updated 5 years ago
- Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation☆27Nov 7, 2019Updated 6 years ago
- Codebase for CVPR 2020 paper "Spatio-Temporal Graph for Video Captioning with Knowledge Distillation"☆23Mar 4, 2020Updated 5 years ago
- SparseMax activation function implementation (ICML 2016) (PyTorch)☆28Nov 30, 2017Updated 8 years ago
- A PyTorch implementation of various Online & Stochastic optimization algorithms for deep learning☆27Feb 4, 2022Updated 4 years ago
- This is the official repo for the experiments in the paper "Bilevel Programming for Hyperparameter Optimization and Meta-Learning"☆29Jun 7, 2018Updated 7 years ago
- ☆27Feb 10, 2022Updated 4 years ago
- ☆28Sep 13, 2021Updated 4 years ago
- Pytorch implementation of our paper accepted by IEEE TNNLS, 2022 -- Distilling a Powerful Student Model via Online Knowledge Distillation☆31Nov 11, 2021Updated 4 years ago
- Welcome to NeuCo-Bench, a benchmarking framework for evaluating compressed embeddings on downstream tasks.☆25Updated this week
- ☆74Dec 8, 2022Updated 3 years ago
- HackerRank, LeetCode, Cracking the Coding Interview Solutions in Python/C++☆11Jan 20, 2024Updated 2 years ago
- Implementation of the article "Adapting Auxiliary Losses Using Gradient Similarity"☆33Mar 1, 2019Updated 6 years ago
- Style Transfer by Rigid Alignment in Neural Net Feature Space☆11Jan 23, 2021Updated 5 years ago
- Code for reproducing work of ICML 2019 paper: Memory-Optimal Direct Convolutions for Maximizing Classification Accuracy in Embedded Appli…☆12Jun 8, 2019Updated 6 years ago
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking☆13Apr 12, 2023Updated 2 years ago
- New word discovery / new word mining / degree of freedom / cohesion / python3☆10May 28, 2019Updated 6 years ago
- A distributed data flow and computation system that runs on transactional messaging infrastructure☆11Oct 22, 2022Updated 3 years ago
- openapi of all third-party☆10Updated this week