Code repo for "Transformer on a Diet" paper
☆31 · Jun 22, 2020 · Updated 5 years ago
Alternatives and similar repositories for transformer-on-diet
Users that are interested in transformer-on-diet are comparing it to the libraries listed below.
- This is the source code of our paper PALT in EMNLP2022. ☆13 · Nov 19, 2022 · Updated 3 years ago
- Running massive simulations using RNNs on CPUs for building bots and all kinds of things. ☆13 · Jun 13, 2021 · Updated 4 years ago
- [NAACL 2018] Robust Sequence Labeling with Adversarial Training ☆10 · Sep 30, 2019 · Updated 6 years ago
- ☆220 · Jun 8, 2020 · Updated 5 years ago
- ☆10 · May 16, 2024 · Updated last year
- Supporting example for "A Rust SentencePiece implementation" ☆20 · Jun 7, 2020 · Updated 5 years ago
- "What is Learned in Visually Grounded Neural Syntax Acquisition", Noriyuki Kojima, Hadar Averbuch-Elor, Alexander Rush and Yoav Artzi (AC… ☆12 · Dec 30, 2021 · Updated 4 years ago
- explores Chinese language models with sub-character level visual information ☆16 · Oct 5, 2018 · Updated 7 years ago
- An easy way to start a python programming environment using GitHub Codespaces. ☆15 · Sep 9, 2020 · Updated 5 years ago
- A few models converted from Caffe to Core ML format. ☆15 · Jun 6, 2017 · Updated 8 years ago
- ☆14 · Jan 6, 2025 · Updated last year
- ☆46 · Apr 13, 2022 · Updated 4 years ago
- Transformer language model (GPT-2) with sentencepiece tokenizer ☆10 · Oct 15, 2019 · Updated 6 years ago
- General information about DEEP BERLIN's AI for Good Hackathon 2020 ☆11 · Apr 14, 2020 · Updated 6 years ago
- ☆15 · Nov 5, 2020 · Updated 5 years ago
- ☆15 · Apr 8, 2022 · Updated 4 years ago
- Weakly Supervised Learning: Introduction and Best Practices ☆12 · Jul 3, 2019 · Updated 6 years ago
- A case study of efficient training of large language models using commodity hardware. ☆67 · Aug 4, 2022 · Updated 3 years ago
- Specialization of BERT architecture both for the Spanish language and the Twitter domain ☆13 · Nov 6, 2020 · Updated 5 years ago
- Easy access to administrative boundary data with python ☆17 · Oct 4, 2022 · Updated 3 years ago
- vIPer: a new tool for IPython notebooks. ☆60 · Jan 7, 2015 · Updated 11 years ago
- ☆14 · Jun 2, 2021 · Updated 4 years ago
- 2nd place solution of ECCV 2020 workshop VIPriors Image Classification Challenge, https://arxiv.org/abs/2008.00261 ☆13 · Aug 22, 2021 · Updated 4 years ago
- Domain Adaptation of Neural Machine Translation by Lexicon Induction ☆20 · Jan 3, 2020 · Updated 6 years ago
- BERT models pretrained on the CORD-19 Kaggle dataset ☆15 · Jun 8, 2020 · Updated 5 years ago
- Implementation of a Hierarchical Mamba as described in the paper: "Hierarchical State Space Models for Continuous Sequence-to-Sequence Mo… ☆15 · Nov 11, 2024 · Updated last year
- ☆10 · Mar 18, 2021 · Updated 5 years ago
- This repository contains all code examples for my TensorFlow World talk about "Advanced model deployments with TensorFlow Serving" ☆17 · Dec 8, 2022 · Updated 3 years ago
- A text classification model with pretrained GloVe embeddings ☆14 · Dec 3, 2019 · Updated 6 years ago
- LGEB: Benchmark of Language Generation Evaluation ☆16 · Oct 21, 2022 · Updated 3 years ago
- Studying GPU Multi-tenancy ☆11 · Jan 11, 2019 · Updated 7 years ago
- The implementation of "Learning Deep Transformer Models for Machine Translation" ☆116 · Jul 25, 2024 · Updated last year
- Top-1 accuracy of 61.0% on ImageNet, with no sacrifice compared with SqueezeNet v1.1. ☆22 · Jun 30, 2017 · Updated 8 years ago
- Based on Thompson sampling with the online bootstrap (Dean Eckles, Maurits Kaptein). http://arxiv.org/abs/1410.4009 ☆11 · Dec 30, 2014 · Updated 11 years ago
- Convenient DL serving ☆72 · Sep 13, 2021 · Updated 4 years ago
- Solution for the 2nd place in Telegram Data Clustering Contest (https://contest.com/docs/data_clustering2). ☆12 · Nov 19, 2020 · Updated 5 years ago
- ☆10 · Apr 5, 2017 · Updated 9 years ago
- Code to accompany my blog post "Dissecting the Hype With Cheminformatics" ☆12 · Sep 22, 2019 · Updated 6 years ago
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters ☆21 · Mar 2, 2016 · Updated 10 years ago