ckkissane / rlhf-shakespeare
Shakespeare transformer fine-tuned to generate positive sentiment samples using RLHF
☆10 Updated last year
Related projects:
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆77 Updated 9 months ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆23 Updated last year
- ☆27 Updated 5 months ago
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆52 Updated last month
- PyTorch implementation for MRL ☆17 Updated 6 months ago
- ☆29 Updated 2 weeks ago
- ☆24 Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆40 Updated 8 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆69 Updated 7 months ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆37 Updated 3 months ago
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given… ☆14 Updated 11 months ago
- Collection of autoregressive model implementations ☆62 Updated 2 weeks ago
- A repository for research on medium sized language models. ☆71 Updated 3 months ago
- Repository for "I am a Strange Dataset: Metalinguistic Tests for Language Models" ☆34 Updated 8 months ago
- Here we provide and collect many functions to generate math problems and step-by-step solutions for LLM training ☆17 Updated last year
- Repository for Skill Set Optimization ☆12 Updated last month
- Minimal implementation of the Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models paper (arXiv 2401.01335) ☆25 Updated 6 months ago
- ☆45 Updated 7 months ago
- Minimum Description Length probing for neural network representations ☆15 Updated 11 months ago
- ☆22 Updated last year
- Official implementation of Goldfish Loss: Mitigating Memorization in Generative LLMs ☆68 Updated 2 months ago
- Critique-out-Loud Reward Models ☆17 Updated 2 weeks ago
- ☆25 Updated 9 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆39 Updated 3 weeks ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆33 Updated last year
- ☆44 Updated 2 months ago
- ☆18 Updated this week
- ☆37 Updated last year
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI ☆55 Updated last week
- ☆50 Updated last month