argilla-io / awesome-llm-datasets

👩🤝🤖 A curated list of datasets for large language models (LLMs), RLHF and related resources (continually updated)

☆23

Alternatives and similar repositories for awesome-llm-datasets

Users who are interested in awesome-llm-datasets are comparing it to the repositories listed below.

jina-ai / jerboa
LLM finetuning
☆42 · Updated last year
QuixiAI / kraken
☆66 · Updated last year
catena-labs / moa-llm
A Python library to orchestrate LLMs in a neural network-inspired structure
☆49 · Updated 9 months ago
iulia-b10 / multilingual-embedding-models
☆20 · Updated last year
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆91 · Updated 6 months ago
matthewrenze / jhu-concise-cot
The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models
☆22 · Updated 8 months ago
Alignment-Lab-AI / KnowledgeBase
never forget anything again! combine AI and intelligent tooling for a local knowledge base to track catalogue, annotate, and plan for you…
☆37 · Updated last year
Alignment-Lab-AI / Our-Projects
A repository of projects and datasets under active development by Alignment Lab AI
☆22 · Updated last year
louisbrulenaudet / ragoon
High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡
☆66 · Updated 9 months ago
kyegomez / Finetuning-Suite
Finetune any model on HF in less than 30 seconds
☆57 · Updated last week
TuanaCelik / unstructuredio-haystack
💙 Unstructured Data Connectors for Haystack 2.0
☆17 · Updated last year
kookaburracodes / investor-education-chatchain
Not financial advice.
☆28 · Updated 2 years ago
automix-llm / automix
Mixing Language Models with Self-Verification and Meta-Verification
☆105 · Updated 7 months ago
argilla-io / notus
Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app…
☆168 · Updated last year
langchain-ai / prompt-eval-recommendation
Streamlit app for recommending eval functions using prompt diffs
☆29 · Updated last year
kyegomez / Andromeda
An all-new Language Model That Processes Ultra-Long Sequences of 100,000+ Ultra-Fast
☆151 · Updated 11 months ago
deployradiant / pychatml
Chat Markup Language conversation library
☆55 · Updated last year
dair-ai / llm-evaluator
Example for Logging LLM Evaluator Prompt Responses
☆18 · Updated last year
kyegomez / forest-of-thoughts
A forest of autonomous agents.
☆19 · Updated 6 months ago
argilla-io / argilla-cookbook
Simple examples using Argilla tools to build AI
☆53 · Updated 8 months ago
andrewgcodes / FalconStreaming
Falcon40B and 7B (Instruct) with streaming, top-k, and beam search
☆40 · Updated 2 years ago
discus-labs / discus
A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ
☆63 · Updated last year
mzbac / mlx-moe
Scripts to create your own moe models using mlx
☆90 · Updated last year
LLM360 / Analysis360
Open Implementations of LLM Analyses
☆105 · Updated 9 months ago
mithril-security / blind_llama_client
Zero-trust AI APIs for easy and private consumption of open-source LLMs
☆40 · Updated last year
The-Swarm-Corporation / swarms-cloud
Deploy your autonomous agents to production grade environments with 99% Uptime Guarantee, Infinite Scalability, and self-healing.
☆42 · Updated last week
tanyuqian / cappy
NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
☆43 · Updated last year
deshwalmahesh / PHUDGE
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…
☆49 · Updated last year
weaviate-tutorials / Hurricane
Writing Blog Posts with Generative Feedback Loops!
☆50 · Updated last year
Doriandarko / OraclesGPT
☆11 · Updated last year