huggingface / rlhf-interface
☆ 32 · Updated last year
Related projects
Alternatives and complementary repositories for rlhf-interface
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆ 65 · Updated last year
- A library for squeakily cleaning and filtering language datasets. ☆ 45 · Updated last year
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆ 34 · Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset. ☆ 92 · Updated last year
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers ☆ 58 · Updated 4 months ago
- Experiments with generating open-source language model assistants ☆ 97 · Updated last year
- A pipeline for using API calls to agnostically convert unstructured data into structured training data ☆ 28 · Updated 2 months ago
- Training and Inference Notebooks for the RedPajama (OpenLlama) models ☆ 18 · Updated last year
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆ 57 · Updated 2 years ago
- Exploring finetuning public checkpoints on filter 8K sequences on Pile ☆ 115 · Updated last year
- This repository contains code for cleaning your training data of benchmark data to help combat data snooping. ☆ 25 · Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆ 44 · Updated last year
- Tools to make language models a bit easier to use ☆ 30 · Updated this week
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆ 23 · Updated 8 months ago
- Minimal PyTorch implementation of BM25 (with sparse tensors) ☆ 90 · Updated 8 months ago
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval ☆ 27 · Updated 2 years ago
- Scripts supporting the development and serving of the Roots Search Tool - https://hf.co/spaces/bigscience-data/roots-search ☆ 10 · Updated last year
- Genalog is an open source, cross-platform Python package allowing generation of synthetic document images with custom degradations and te… ☆ 42 · Updated 10 months ago
- NLP with Rust for Python 🦀🐍 ☆ 59 · Updated 5 months ago
- A library to create and manage configuration files, especially for machine learning projects. ☆ 77 · Updated 2 years ago
- PyTorch implementation for MRL ☆ 18 · Updated 8 months ago
- Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188) ☆ 60 · Updated last year
- Supervised instruction finetuning for LLM with HF trainer and Deepspeed ☆ 34 · Updated last year
- Embedding Recycling for Language models ☆ 38 · Updated last year