kyegomez / GPT3
An implementation of the base GPT-3 model architecture from the OpenAI paper "Language Models are Few-Shot Learners"
☆15 · Updated 4 months ago
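For orientation, here is a minimal sketch of the kind of pre-norm, causally masked decoder block a GPT-3-style model stacks. This is an illustrative PyTorch example under assumed names and sizes (`DecoderBlock`, `d_model=768`, `n_heads=12`), not this repository's actual API.

```python
# Minimal sketch of one GPT-3-style pre-norm decoder block in PyTorch.
# All module names and dimensions are illustrative assumptions, not this repo's code.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        # GPT-3 uses a 4x feed-forward expansion with GELU.
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: True entries block attention to future positions.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out                 # residual around attention
        x = x + self.mlp(self.ln2(x))    # residual around feed-forward
        return x

# Usage: stack N blocks after token + position embeddings, then project to vocab logits.
x = torch.randn(2, 16, 768)   # (batch, seq_len, d_model)
y = DecoderBlock()(x)
print(y.shape)                # torch.Size([2, 16, 768])
```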
Related projects
Alternatives and complementary repositories for GPT3
- The open-source implementation of the base model behind GPT-4 from OpenAI [Language + Multi-Modal] ☆11 · Updated last year
- A single repo with all scripts and utils to train/fine-tune the Mamba model with or without FIM ☆50 · Updated 7 months ago
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆38 · Updated 6 months ago
- A repository for research on medium-sized language models. ☆74 · Updated 6 months ago
- Collection of autoregressive model implementations ☆67 · Updated this week
- Community implementation of the paper "Multi-Head Mixture-of-Experts" in PyTorch ☆19 · Updated last week
- Implementation of the Llama architecture with RLHF + Q-learning ☆157 · Updated 11 months ago
- Implementation of the Mamba SSM with hf_integration. ☆55 · Updated 2 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated 6 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆93 · Updated last month
- Set of scripts to finetune LLMs ☆36 · Updated 7 months ago
- Train a production-grade GPT in less than 400 lines of code. Better than Karpathy's version and GIGAGPT ☆15 · Updated 2 weeks ago
- Unofficial Implementation of Evolutionary Model Merging ☆33 · Updated 7 months ago
- My fork of Allen AI's OLMo for educational purposes. ☆28 · Updated last week
- From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging ☆58 · Updated last month
- An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆23 · Updated this week
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆109 · Updated last month
- Training small GPT-2 style models using Kolmogorov-Arnold networks. ☆108 · Updated 5 months ago
- Implementation of Adept's all-new Fuyu multi-modality model in PyTorch ☆24 · Updated 2 weeks ago
- The Next Generation Multi-Modality Superintelligence ☆70 · Updated 2 months ago
- Using multiple LLMs for ensemble forecasting ☆16 · Updated 10 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆71 · Updated 9 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆130 · Updated 2 months ago
- PyTorch implementation of the paper "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆23 · Updated this week
- Implementation of a modular, high-performance, and simple Mamba for high-speed applications ☆33 · Updated 2 weeks ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code) ☆135 · Updated last month