kyegomez / GPT3
An implementation of the base GPT-3 model architecture from OpenAI's paper "Language Models are Few-Shot Learners".
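The core of the GPT-3 base architecture is a stack of decoder-only transformer blocks with causal self-attention: each position may attend only to itself and earlier positions. As a minimal illustrative sketch (plain Python, single head, no learned weights from the repository — all names here are hypothetical, not the repo's actual API):

```python
import math

def causal_self_attention(x, wq, wk, wv):
    """Single-head causal self-attention over a sequence of d-dim vectors.

    x: list of T vectors (each length d); wq/wk/wv: d x d projection matrices.
    Position t attends only to positions 0..t (the GPT-style causal mask).
    Illustrative sketch only, not code from the repository.
    """
    def matvec(w, v):
        # dense matrix-vector product
        return [sum(w[i][j] * v[j] for j in range(len(v))) for i in range(len(w))]

    d = len(x[0])
    q = [matvec(wq, v) for v in x]
    k = [matvec(wk, v) for v in x]
    val = [matvec(wv, v) for v in x]

    out = []
    for t in range(len(x)):
        # scaled dot-product scores against positions <= t only
        scores = [sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        # numerically stable softmax over the visible positions
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # weighted sum of value vectors
        out.append([sum(w * val[s][i] for s, w in enumerate(weights))
                    for i in range(d)])
    return out

# Tiny demo: identity projections, 2-dim embeddings, 3 tokens.
I = [[1.0, 0.0], [0.0, 1.0]]
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = causal_self_attention(seq, I, I, I)
# The first output equals the first input: token 0 can only attend to itself.
```

A full GPT-3 block would wrap this in multi-head projections, residual connections, layer norm, and a feed-forward MLP, repeated for many layers.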
☆19 · Updated 11 months ago
Alternatives and similar repositories for GPT3
Users interested in GPT3 are comparing it to the libraries listed below.
- The open-source implementation of the base model behind GPT-4 from OpenAI [Language + Multi-Modal] ☆10 · Updated last year
- We study toy models of skill learning. ☆28 · Updated 4 months ago
- Train a production-grade GPT in less than 400 lines of code. Better than Karpathy's version and GIGAGPT ☆15 · Updated last week
- Collection of autoregressive model implementations ☆85 · Updated last month
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs ☆41 · Updated last year
- Implementation of Infini-Transformer in PyTorch ☆111 · Updated 5 months ago
- A simple Torch implementation of the high-performance Multi-Query Attention ☆16 · Updated last year
- Tiny re-implementation of MDM in the style of LLaDA and the nano-gpt speedrun ☆52 · Updated 2 months ago
- Implementation of the Mamba SSM with hf_integration ☆56 · Updated 9 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging ☆36 · Updated last year
- Some personal experiments around routing tokens to different autoregressive attention blocks, akin to mixture-of-experts ☆119 · Updated 7 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆26 · Updated 7 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite ☆33 · Updated last year
- Unofficial implementation of Evolutionary Model Merging ☆38 · Updated last year
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆54 · Updated last year
- JAX Scalify: end-to-end scaled arithmetics ☆16 · Updated 7 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆37 · Updated last year
- ☆80 · Updated last year
- The open-source community's implementation of the all-new Multi-Modal Causal Attention from "DeepSpeed-VisualChat: Multi-Round Multi-Imag… ☆11 · Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆143 · Updated 8 months ago
- The code behind our practical dive into using Mamba for information extraction ☆54 · Updated last year
- A repository for research on medium-sized language models ☆76 · Updated last year
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆28 · Updated 8 months ago
- PyTorch implementation of Hinton's FF algorithm with hard-negatives sampling ☆14 · Updated 2 years ago
- Set of scripts to finetune LLMs ☆37 · Updated last year
- Implementation of the Llama architecture with RLHF + Q-learning ☆165 · Updated 4 months ago
- [NeurIPS 2024 Main Track] Code for the paper "Instruction Tuning With Loss Over Instructions" ☆37 · Updated last year
- ☆68 · Updated 10 months ago
- Implementation of a modular, high-performance, and simplistic Mamba for high-speed applications ☆35 · Updated 6 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 · Updated 8 months ago