li-plus / nanoRLHF
Train a tiny LLaMA model from scratch to repeat your words using Reinforcement Learning from Human Feedback (RLHF)
☆16 · May 23, 2024 · Updated last year
Alternatives and similar repositories for nanoRLHF
Users that are interested in nanoRLHF are comparing it to the libraries listed below
- SFT+RL boosts multimodal reasoning ☆45 · Jun 27, 2025 · Updated 7 months ago
- ☆12 · Sep 22, 2015 · Updated 10 years ago
- ☆10 · Jun 4, 2020 · Updated 5 years ago
- This repo consists of experimental results on the bdd100k dataset using different object detection algorithms (Faster-RCNN, FCOS, ATSS) ☆11 · Jun 27, 2020 · Updated 5 years ago
- ☆10 · Mar 30, 2023 · Updated 2 years ago
- Cluster paraphrases by word sense ☆12 · Jan 3, 2019 · Updated 7 years ago
- lime-ner: extending LIME for Named Entity Recognition ☆10 · Aug 15, 2018 · Updated 7 years ago
- gmsh meshing for pipes ☆10 · Oct 21, 2021 · Updated 4 years ago
- This repo collects homework code for DSCI552/INF552 @ USC, Fall 2020 semester. ☆14 · Nov 27, 2020 · Updated 5 years ago
- Implementation of Adaptive Noise Reduction and Background Noise Classification using External Microphones on iOS ☆16 · Apr 30, 2019 · Updated 6 years ago
- Some self-implemented beauty-filter and makeup algorithms ☆11 · Oct 11, 2020 · Updated 5 years ago
- content.rdf.u8.gz ☆10 · Dec 15, 2020 · Updated 5 years ago
- Enhancing Sentence Embedding with Generalized Pooling ☆11 · Jul 26, 2018 · Updated 7 years ago
- Compiler for the Tiger programming language ☆12 · Oct 27, 2018 · Updated 7 years ago
- A Haskell implementation of the Tiger compiler ☆10 · May 2, 2020 · Updated 5 years ago
- ☆10 · Apr 28, 2021 · Updated 4 years ago
- Official GitHub repository for "Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees" (NeurIPS 2024) ☆11 · Nov 30, 2025 · Updated 2 months ago
- ☆11 · Mar 5, 2024 · Updated last year
- Solving Logic Grid Puzzles with Part-of-Speech Tagging and First-Order Logic ☆11 · Dec 18, 2016 · Updated 9 years ago
- Natural Perturbation for Robust Question Answering ☆12 · Apr 7, 2020 · Updated 5 years ago
- Generation of training-optimised weather datasets from declarative syntax ☆12 · Updated this week
- A Python library for easily querying morphological inflection models trained on UniMorph ☆13 · Oct 23, 2022 · Updated 3 years ago
- Algorithm training ☆10 · Jul 21, 2020 · Updated 5 years ago
- ☆15 · Nov 8, 2021 · Updated 4 years ago
- Play with various big data technologies ☆10 · Jul 12, 2017 · Updated 8 years ago
- ☆12 · May 20, 2025 · Updated 8 months ago
- Just a helper script for invoking the kohya converter (and maybe a cheeky inferencer to check it worked okay) ☆11 · Aug 26, 2023 · Updated 2 years ago
- Social Distancing Analyzer using OpenCV and YOLO ☆10 · Aug 30, 2024 · Updated last year
- OCaml code from Writing an Interpreter in Go ☆11 · Aug 16, 2019 · Updated 6 years ago
- The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions (EMNLP 2023) ☆13 · Dec 21, 2023 · Updated 2 years ago
- llama3 inference implemented from scratch with numpy and packaged as a wrapper, with a GPU vs. CPU performance comparison and LoRA fine-tuning ☆12 · Jul 16, 2024 · Updated last year
- Code space for L4DC paper "State-wise Safe Reinforcement Learning With Pixel Observations" ☆12 · Apr 5, 2024 · Updated last year
- BEGGER DATA ☆11 · Sep 30, 2017 · Updated 8 years ago
- Run FeatureTools to automate feature engineering in a distributed fashion on Spark ☆11 · Oct 11, 2018 · Updated 7 years ago
- A small Bottle chat application ☆22 · Sep 16, 2011 · Updated 14 years ago
- PyTorch implementation of MobileDet (https://arxiv.org/abs/2004.14525v3) backbones ☆11 · Feb 12, 2024 · Updated 2 years ago
- Data and code for the paper "Reflect Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality" (EMNLP 2022) ☆11 · Nov 28, 2022 · Updated 3 years ago
- A concise Hindley-Milner type inferencer (algorithm W) implemented in Scala ☆17 · May 13, 2013 · Updated 12 years ago
- A/B testing in practice ☆12 · Sep 16, 2020 · Updated 5 years ago