rasbt / b3-basic-batchsize-benchmark
Experiments for the blog post "No, We Don't Have to Choose Batch Sizes As Powers Of 2"
☆20 Updated 2 years ago
Alternatives and similar repositories for b3-basic-batchsize-benchmark
Users who are interested in b3-basic-batchsize-benchmark are comparing it to the libraries listed below
- ☆15 Updated 3 years ago
- Unofficially Implements https://arxiv.org/abs/2112.05682 to get Linear Memory Cost on Attention for PyTorch ☆12 Updated 3 years ago
- PyTorch implementation of GLOM ☆22 Updated 3 years ago
- bumble bee transformer ☆14 Updated 4 years ago
- ☆31 Updated last month
- Implementation of "Analysing Mathematical Reasoning Abilities of Neural Models" ☆29 Updated 2 years ago
- Implementation of N-Grammer in Flax ☆17 Updated 2 years ago
- Describe the format of image/text datasets ☆11 Updated 3 years ago
- Local Attention - Flax module for Jax ☆22 Updated 4 years ago
- DiCE: The Infinitely Differentiable Monte-Carlo Estimator ☆31 Updated last year
- A collection of building blocks for building fine-tunable metric learning models ☆32 Updated 2 months ago
- High performance pytorch modules ☆18 Updated 2 years ago
- Usable implementation of Mogrifier, a circuit for enhancing LSTMs and potentially other networks, from Deepmind ☆19 Updated 11 months ago
- Simplifying parsing of large jsonline files in NLP Workflows ☆12 Updated 3 years ago
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 Updated last week
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages ☆13 Updated 2 years ago
- Load any clip model with a standardized interface ☆21 Updated last year
- A dashboard for exploring timm learning rate schedulers ☆19 Updated 6 months ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆50 Updated 3 years ago
- Bi-Directional Attention Flow for Machine Comprehensions ☆9 Updated 7 years ago
- Implementation of Insertion-deletion Denoising Diffusion Probabilistic Models ☆30 Updated 3 years ago
- codes for TokenManipulationGAN ☆7 Updated 5 years ago
- AdaCat ☆49 Updated 2 years ago
- A minimal TPU compatible Jax implementation of NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ☆13 Updated 3 years ago
- This repository hosts the code to port NumPy model weights of BiT-ResNets to TensorFlow SavedModel format. ☆14 Updated 3 years ago
- A sample pattern for running CI tests on Modal ☆18 Updated last month
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆34 Updated last year
- reproduces experiments from "Grounding inductive biases in natural images: invariance stems from variations in data" ☆17 Updated 8 months ago
- Shows how to do parameter ensembling using differential evolution. ☆10 Updated 3 years ago
- ☆15 Updated 2 years ago