IBM / unitxt
🦄 Unitxt: a Python library for getting data fired up and set for training and evaluation
☆176 Updated this week
Alternatives and similar repositories for unitxt:
Users who are interested in unitxt are comparing it to the repositories listed below:
- codebase release for EMNLP2023 paper publication ☆19 Updated 11 months ago
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆65 Updated last month
- 🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP. ☆34 Updated this week
- A package dedicated to running benchmark agreement testing ☆16 Updated 2 months ago
- ☆251 Updated 2 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆59 Updated 2 months ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ☆106 Updated last week
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆63 Updated last year
- ☆32 Updated 7 months ago
- Dataset collection and preprocessing framework for NLP extreme multitask learning ☆175 Updated last month
- ☆113 Updated 4 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆125 Updated 11 months ago
- Code for the paper "Fishing for Magikarp" ☆142 Updated last month
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆67 Updated 4 months ago
- awesome synthetic (text) datasets ☆261 Updated 3 months ago
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling. ☆55 Updated 6 months ago
- WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting. ☆37 Updated 6 months ago
- Dolomite Engine is a library for pretraining/finetuning LLMs ☆36 Updated this week
- A Lossless Compression Library for AI pipelines ☆224 Updated this week
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆203 Updated 3 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆74 Updated 5 months ago
- ☆117 Updated 4 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆100 Updated 5 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆57 Updated 11 months ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆185 Updated 4 months ago
- The repository contains generative AI analytics platform application code. ☆23 Updated 3 months ago
- Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya ☆107 Updated 2 weeks ago
- Let's build better datasets, together! ☆252 Updated 2 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆252 Updated 7 months ago
- Minimum Bayes Risk Decoding for Hugging Face Transformers ☆56 Updated 8 months ago