rasbt / b3-basic-batchsize-benchmark
Experiments for the blog post "No, We Don't Have to Choose Batch Sizes As Powers Of 2"
☆19, updated 2 years ago
Alternatives and similar repositories for b3-basic-batchsize-benchmark:
Users interested in b3-basic-batchsize-benchmark are comparing it to the libraries listed below:
- PyTorch implementation of GLOM (☆21, updated 2 years ago)
- Official code for MIMETIC^2 (☆12, updated 2 months ago)
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… (☆34, updated last year)
- My explorations into editing the knowledge and memories of an attention network (☆34, updated 2 years ago)
- Bumble bee transformer (☆14, updated 3 years ago)
- Simplifies parsing of large JSON Lines files in NLP workflows (☆12, updated 3 years ago)
- A dashboard for exploring timm learning rate schedulers (☆19, updated 2 months ago)
- PyTorch implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" (☆23, updated last week)
- Describe the format of image/text datasets (☆11, updated 2 years ago)
- Bi-Directional Attention Flow for Machine Comprehension (☆9, updated 7 years ago)
- A collection of building blocks for building fine-tunable metric learning models (☆32, updated last month)
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… (☆25, updated 10 months ago)
- Ranking of fine-tuned HF models as base models (☆35, updated last year)
- Unofficial PyTorch implementation of https://arxiv.org/abs/2112.05682, giving linear memory cost for attention (☆12, updated 3 years ago)
- High-performance PyTorch modules (☆18, updated 2 years ago)
- A Python library for highly configurable transformers, easing model architecture search and experimentation (☆49, updated 3 years ago)
- Load any CLIP model with a standardized interface (☆21, updated 9 months ago)
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages (☆13, updated 2 years ago)
- Directed masked autoencoders (☆14, updated 2 years ago)
- Implements EvoNorms B0 and S0 as proposed in "Evolving Normalization-Activation Layers" (☆11, updated 4 years ago)
- Code for running the experiments in "Deep Subjecthood: Higher Order Grammatical Features in Multilingual BERT" (☆16, updated last year)
- Implementation of N-Grammer in Flax (☆16, updated 2 years ago)
- Machine learning model performance metrics and charts with confidence intervals, optimized with Numba for speed (☆16, updated 3 years ago)
- Code for TokenManipulationGAN (☆7, updated 4 years ago)
- Adversarial examples for the new ConvNeXt architecture (☆20, updated 3 years ago)