a set of scripts to easily convert all training data from huggingface into alpaca instruct or sharegpt format, which should allow for ease of use with any trainer
☆18Mar 14, 2025Updated 11 months ago
Alternatives and similar repositories for Dataset-Conversion-Toolkit
Users that are interested in Dataset-Conversion-Toolkit are comparing it to the libraries listed below
Sorting:
- This is the oficial repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated 2 years ago
- Project code for training LLMs to write better unit tests + code☆21May 19, 2025Updated 9 months ago
- Semantic grep powered by Jina embeddings v5 (MLX on Apple Silicon)☆123Updated this week
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"☆58Feb 29, 2024Updated 2 years ago
- Java wrapper for the Fortran L-BFGS-B algorithm☆33Dec 23, 2016Updated 9 years ago
- ☆27Aug 30, 2023Updated 2 years ago
- Understanding the correlation between different LLM benchmarks☆29Jan 11, 2024Updated 2 years ago
- YAML Parsing GitHub Action☆12Oct 3, 2024Updated last year
- This is the repo for CROssBARv2 Knowledge Graph data. CROssBARv2 is a heterogeneous general-purpose biomedical KG-based system.☆11Feb 4, 2026Updated 3 weeks ago
- Open-source Human Feedback Library☆11Oct 25, 2023Updated 2 years ago
- vault plugin for artifactory☆12Aug 30, 2024Updated last year
- Transfer Family FTP/SFTP server with password authentication sample with TypeScript CDK☆11Nov 17, 2025Updated 3 months ago
- Very minimal (and stateless) agent framework☆44Jan 12, 2025Updated last year
- Repository with which to explore k-diffusion and diffusers, and within which changes to said packages may be tested.☆55Jan 28, 2024Updated 2 years ago
- Puppet module for managing md raid arrays☆14Oct 19, 2022Updated 3 years ago
- python越南语分词器☆10Nov 14, 2019Updated 6 years ago
- A Github action to convert CSV to markdown☆13Jul 7, 2025Updated 7 months ago
- Layers, datasets and utilities for PyTorch☆10Nov 22, 2023Updated 2 years ago
- Computes and displays the visual differences between two URLs☆12Aug 17, 2022Updated 3 years ago
- Sample project to build and run Turso's SQLite fork on iOS and use vector search functionality on device☆15Jul 26, 2024Updated last year
- 练习题,python 协同过滤ALS模型实现:商品推荐 + 用户人群放大☆10Jun 4, 2020Updated 5 years ago
- ☆14Nov 11, 2024Updated last year
- Web based remote for LG TVs with WebOS running on Node.js, Websockets and React☆10Mar 30, 2016Updated 9 years ago
- ☆11Mar 23, 2025Updated 11 months ago
- Tiny evaluation of leading LLMs on competitive programming problems☆14Nov 28, 2024Updated last year
- ☆12Apr 30, 2019Updated 6 years ago
- JFC! What a hot mess. *Scream into void*☆13Sep 20, 2021Updated 4 years ago
- personal entropy reduction system☆38Updated this week
- ☆14Jan 10, 2025Updated last year
- AI Agents Workshop with Red Hat AI☆13Feb 26, 2025Updated last year
- Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚☆22Jul 14, 2025Updated 7 months ago
- SIGIR 2021: Proactive Retrieval-based Chatbots based on Relevant Knowledge and Goals☆11Jul 30, 2021Updated 4 years ago
- Tissue-specific variant annotation☆10Nov 19, 2018Updated 7 years ago
- Open-source repository for the OOPSLA'24 paper "CYCLE: Learning to Self-Refine Code Generation"☆10Mar 8, 2024Updated last year
- ☆15Apr 26, 2025Updated 10 months ago
- A daily benchmark to regression-test cloud LLMs☆17Aug 7, 2025Updated 6 months ago
- This is a fork of optimization part of RISO project (http://riso.sourceforge.net/)☆13Aug 30, 2015Updated 10 years ago
- ☆11Feb 21, 2019Updated 7 years ago
- 🚀 Sliding Window Attention Training for Efficient Large Language Models☆16Dec 8, 2025Updated 2 months ago