Norod / TrainGPT2-127M-FromScratch
A trio of Google Colab notebooks (.ipynb) for training a GPT-2 (127M) model from scratch (useful for non-English languages) using gpt-2-simple
☆15 · Updated 4 years ago
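The notebooks build on the gpt-2-simple package. A minimal sketch of that package's training workflow is below; the file name "corpus.txt" and the hyperparameter values are illustrative assumptions, not taken from the notebooks (which adapt this flow to train from scratch rather than fine-tune):

```python
# Minimal gpt-2-simple workflow (TensorFlow 1.x backend).
# "corpus.txt" is a hypothetical plain-text training file, e.g. a
# non-English corpus; step counts here are placeholders.
import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="124M")      # fetch the base GPT-2 checkpoint

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="corpus.txt",        # your training text
              model_name="124M",
              steps=1000,                  # increase for real training runs
              save_every=200,
              print_every=50)

gpt2.generate(sess, prefix="Example prompt")
```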
Alternatives and similar repositories for TrainGPT2-127M-FromScratch:
Users interested in TrainGPT2-127M-FromScratch are comparing it to the libraries listed below.
- Experimental sampler to make LLMs more creative (☆30, updated last year)
- A repository re-creating the PromptBreeder evolutionary algorithm from the DeepMind paper in Python, using LMQL as the backend (☆27, updated last year)
- One-stop shop for all things CARP (☆59, updated 2 years ago)
- Using short models to classify long texts (☆21, updated last year)
- The Next Generation Multi-Modality Superintelligence (☆70, updated 5 months ago)
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… (☆48, updated 7 months ago)
- Finetune any model on HF in less than 30 seconds (☆58, updated 3 weeks ago)
- Doohickey is a Stable Diffusion tool for technical artists who want to stay up to date with the latest developments in the field (☆39, updated 2 years ago)
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset (☆14, updated 11 months ago)
- Create soft prompts for fairseq 13B dense, GPT-J-6B, and GPT-Neo-2.7B for free in a Google Colab TPU instance (☆27, updated last year)
- A public implementation of the ReLoRA pretraining method, built on Lightning AI's PyTorch Lightning suite (☆33, updated 11 months ago)
- 🚀 Automatically convert unstructured data into a high-quality 'textbook' format, optimized for fine-tuning Large Language Models (LLMs) (☆26, updated last year)
- Load any CLIP model with a standardized interface (☆21, updated 9 months ago)
- Text-writing denoising diffusion (and much more) (☆30, updated last year)
- Demonstration that finetuning a RoPE model on longer sequences than it was pre-trained on extends the model's context limit; see the sketch after this list (☆63, updated last year)
- Image-diffusion block-merging technique applied to transformer-based language models (☆54, updated last year)
- Hidden Engrams: Long Term Memory for Transformer Model Inference (☆35, updated 3 years ago)
- GPT-jax, based on the official Hugging Face library (☆13, updated 3 years ago)
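The RoPE context-extension entry above refers to a general technique (position interpolation). Below is a minimal sketch of the idea, assuming a standard rotary-embedding setup; it is an illustration, not code from the linked repository, and `rope_angles` and all values are hypothetical:

```python
# Sketch of RoPE position interpolation: compressing position indices by a
# scale factor lets a model pretrained on short contexts attend over longer
# sequences after a brief finetune.
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """Rotary-embedding angles; scale < 1 interpolates positions
    (e.g. scale = old_ctx / new_ctx) to stretch the context window."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions.float() * scale, inv_freq)

# Hypothetical numbers: pretrained on 2048 tokens, finetuned at 8192.
angles = rope_angles(torch.arange(8192), dim=64, scale=2048 / 8192)
cos, sin = angles.cos(), angles.sin()  # applied to query/key pairs as usual
```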