Codes for "Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View"
☆147 · Jun 10, 2019 · Updated 6 years ago
Alternatives and similar repositories for macaron-net
Users that are interested in macaron-net are comparing it to the libraries listed below.
- Source code for "Efficient Training of BERT by Progressively Stacking" ☆112 · Jul 3, 2019 · Updated 6 years ago
- Source code for "A Lightweight Recurrent Network for Sequence Modeling" ☆26 · Dec 7, 2022 · Updated 3 years ago
- PyTorch implementation of Transformer-based Neural Machine Translation ☆77 · Dec 14, 2022 · Updated 3 years ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ☆22 · Jan 25, 2023 · Updated 3 years ago
- [ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845 ☆120 · Jun 20, 2021 · Updated 4 years ago
- ☆20 · Feb 26, 2021 · Updated 5 years ago
- Codes for "Towards Binary-Valued Gates for Robust LSTM Training". ☆75 · Jul 22, 2018 · Updated 7 years ago
- Code for the paper "Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks" ☆580 · Aug 28, 2019 · Updated 6 years ago
- A tensorflow implementation of the NIPS 2018 paper "Variational Inference with Tail-adaptive f-Divergence" ☆20 · Jan 11, 2019 · Updated 7 years ago
- code for Explicit Sparse Transformer ☆60 · Jul 21, 2023 · Updated 2 years ago
- Code for the paper Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems (ACL19) ☆100 · Oct 17, 2022 · Updated 3 years ago
- Repository for ACL 2019 paper ☆75 · Jun 30, 2019 · Updated 6 years ago
- Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning ☆15 · Jun 28, 2025 · Updated 10 months ago
- The implementation of "Learning Deep Transformer Models for Machine Translation" ☆116 · Jul 25, 2024 · Updated last year
- Some good(maybe) papers about NMT (Neural Machine Translation). ☆85 · Jan 15, 2020 · Updated 6 years ago
- Latent Alignment and Variational Attention ☆329 · Nov 5, 2018 · Updated 7 years ago
- Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning" ☆127 · Apr 5, 2021 · Updated 5 years ago
- ☆14 · May 7, 2019 · Updated 7 years ago
- Tensorflow Source code for "Recurrently Controlled Recurrent Networks" (NIPS 2018) ☆23 · Oct 25, 2018 · Updated 7 years ago
- Transformer training code for sequential tasks ☆609 · Sep 14, 2021 · Updated 4 years ago
- ☆10 · Feb 12, 2020 · Updated 6 years ago
- Code for the article "Automatic Temperature Control for Neural Machine Translation" (EMNLP 2018) ☆14 · Apr 16, 2019 · Updated 7 years ago
- ☆10 · Jun 14, 2023 · Updated 2 years ago
- Code for our nips19 paper: You Only Propagate Once: Accelerating Adversarial Training Via Maximal Principle ☆180 · Jul 25, 2024 · Updated last year
- Code for ACL2020 "Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation" ☆39 · Jun 24, 2020 · Updated 5 years ago
- Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024) ☆16 · Jan 7, 2025 · Updated last year
- ☆14 · Nov 16, 2022 · Updated 3 years ago
- Understanding the Difficulty of Training Transformers ☆332 · May 31, 2022 · Updated 3 years ago
- ICLR2020 Downloader & Search Tool ☆18 · Oct 8, 2019 · Updated 6 years ago
- Extend bert-nmt to context-aware translation. ☆11 · May 24, 2021 · Updated 5 years ago
- Experiments with Neural ODEs and Adversarial Attacks ☆45 · Jan 13, 2019 · Updated 7 years ago
- Implementation of the Optimal Completion Distillation for Sequence Labeling ☆17 · Jul 25, 2024 · Updated last year
- Code for NIPS 2018 paper 'Frequency-Agnostic Word Representation' ☆115 · May 2, 2019 · Updated 7 years ago
- Pytorch library for fast transformer implementations ☆1,771 · Mar 23, 2023 · Updated 3 years ago
- code for paper "Improving Sequence-to-Sequence Learning via Optimal Transport" ☆68 · Jun 24, 2019 · Updated 6 years ago
- Codes for <Kernelized Bayesian Softmax for Text Generation> in NeurIPS 2019 ☆16 · Nov 20, 2019 · Updated 6 years ago
- Experiments from the paper "On Second Order Behaviour in Augmented Neural ODEs" ☆61 · Sep 30, 2024 · Updated last year
- STCN: Stochastic Temporal Convolutional Networks ☆69 · Jul 15, 2020 · Updated 5 years ago
- (Batched) advanced indexing for PyTorch. ☆54 · Dec 26, 2024 · Updated last year