aws-neuron / aws-neuron-reference-for-megatron-lm
☆14Updated last year
Alternatives and similar repositories for aws-neuron-reference-for-megatron-lm
Users who are interested in aws-neuron-reference-for-megatron-lm are comparing it to the libraries listed below
- ☆17Updated 6 years ago
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification☆11Updated last year
- Source-to-Source Debuggable Derivatives in Pure Python☆15Updated last year
- Helper scripts and notes that were used while porting various nlp models☆45Updated 3 years ago
- A diff tool for language models☆43Updated last year
- Minimum Description Length probing for neural network representations☆18Updated 6 months ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.)☆32Updated last year
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"☆24Updated 2 weeks ago
- Official code release for the paper Coder Reviewer Reranking for Code Generation.☆45Updated 2 years ago
- lanmt ebm☆12Updated 5 years ago
- See https://github.com/cuda-mode/triton-index/ instead!☆11Updated last year
- ☆12Updated 3 years ago
- ☆13Updated 2 months ago
- Ranking of fine-tuned HF models as base models.☆35Updated 3 months ago
- ☆16Updated 2 years ago
- A Learnable LSH Framework for Efficient NN Training☆32Updated 4 years ago
- An attempt to merge ESBN with Transformers, to endow Transformers with the ability to emergently bind symbols☆16Updated 4 years ago
- Google Research☆46Updated 2 years ago
- A place to store reusable transformer components of my own creation or found on the interwebs☆59Updated last week
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"☆58Updated 2 years ago
- This repository contains some of the code used in the paper "Training Language Models with Language Feedback at Scale"☆27Updated 2 years ago
- Transformers at any scale☆41Updated last year
- ☆13Updated 6 years ago
- Embedding Recycling for Language models☆39Updated 2 years ago
- ☆79Updated last year
- Learning to Model Editing Processes☆26Updated this week
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…☆34Updated last year
- Evaluation suite for large-scale language models.☆127Updated 3 years ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling☆37Updated last year
- AdamW optimizer for bfloat16 models in pytorch 🔥.☆35Updated last year