kyegomez / GPT3
An implementation of the base GPT-3 model architecture from the OpenAI paper "Language Models are Few-Shot Learners"
☆18 · Updated last year
Alternatives and similar repositories for GPT3
Users interested in GPT3 are comparing it to the repositories listed below.
- The open-source implementation of the base model behind GPT-4 from OpenAI [Language + Multi-Modal] ☆10 · Updated last year
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆119 · Updated 9 months ago
- Collection of autoregressive model implementations ☆85 · Updated 2 months ago
- Implementation of Infini-Transformer in PyTorch ☆111 · Updated 6 months ago
- A simple PyTorch implementation of high-performance Multi-Query Attention ☆17 · Updated last year
- PyTorch implementation of MoE (mixture of experts) ☆45 · Updated 4 years ago
- Implementation of the Llama architecture with RLHF + Q-learning ☆165 · Updated 5 months ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆55 · Updated last year
- Unofficial implementation of Evolutionary Model Merging ☆39 · Updated last year
- Code from our practical deep dive into using Mamba for information extraction ☆53 · Updated last year
- A repository for research on medium-sized language models ☆77 · Updated last year
- Minimal implementation of the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv:2401.01335) ☆28 · Updated last year
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆98 · Updated 9 months ago
- ☆68 · Updated last year
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆127 · Updated 10 months ago
- Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆145 · Updated 9 months ago
- We study toy models of skill learning ☆29 · Updated 5 months ago
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind ☆177 · Updated 10 months ago
- Tiny re-implementation of MDM in the style of LLaDA and the nano-gpt speedrun ☆55 · Updated 4 months ago
- PyTorch implementation of the paper "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆25 · Updated 2 weeks ago
- Implementation of a modular, high-performance, and simple Mamba for high-speed applications ☆36 · Updated 8 months ago
- Implementation of Mind Evolution ("Evolving Deeper LLM Thinking") from DeepMind ☆55 · Updated last month
- Training small GPT-2-style models using Kolmogorov-Arnold networks ☆120 · Updated last year
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆101 · Updated 6 months ago
- Code for the NeurIPS LLM Efficiency Challenge ☆59 · Updated last year
- ☆81 · Updated last year
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆102 · Updated 2 years ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆159 · Updated 3 months ago
- [NeurIPS 2024] Low-rank memory-efficient optimizer without SVD ☆30 · Updated 2 weeks ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆84 · Updated 5 months ago