SmallDoges/small-datasets

SmallDoges / small-datasets

Distill thinking dataset more compactly and accurately!

☆38

Alternatives and similar repositories for small-datasets

Users who are interested in small-datasets are comparing it to the libraries listed below.

ctlllll / understanding_llm_benchmarks
Understanding the correlation between different LLM benchmarks
☆30 · Jan 11, 2024 · Updated 2 years ago
Nicolas-Yax / PhyloLM
Genetics for Language Models
☆18 · Jul 1, 2024 · Updated 2 years ago
betagouv / ComparIA
Open source LLM arena created by the French Government
☆77 · Updated this week
jwjohns / LFM2Sloth
Modular task agnostic training pipeline using LFM2 from Liquid AI with unsloth.
☆16 · Sep 13, 2025 · Updated 10 months ago
Nicolas-BZRD / llm-distillation
☆11 · Feb 3, 2025 · Updated last year
pradyGn / zoof
Zoof is a high-efficiency Small Language Model (SLM) engineered from scratch. It demonstrates how modern architectural choices and high-q…
☆47 · Jan 13, 2026 · Updated 6 months ago
RUCAIBox / EASYEP
☆28 · Apr 14, 2025 · Updated last year
QuixiAI / spectrum
☆145 · Aug 20, 2025 · Updated 11 months ago
hesamsheikh / dataset_git_commands
☆13 · Aug 5, 2024 · Updated last year
SebastianBodza / EnsembleForecasting
Using multiple LLMs for ensemble Forecasting
☆16 · Jan 17, 2024 · Updated 2 years ago
dmahan93 / lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
☆16 · Aug 23, 2023 · Updated 2 years ago
mlfoundations / evalchemy
Automatic evals for LLMs
☆600 · Feb 24, 2026 · Updated 4 months ago
darrow-labs / LegalLens
☆10 · Jul 15, 2024 · Updated 2 years ago
thisisanshgupta / Senna
Senna is an advanced AI-powered search engine designed to provide users with immediate answers to their queries by leveraging natural lan…
☆19 · Sep 5, 2024 · Updated last year
allenai / gpv2-web10k
Download Web-10K data by querying Bing Image Search
☆10 · Feb 1, 2022 · Updated 4 years ago
AnswerDotAI / fastdata
☆160 · Dec 2, 2024 · Updated last year
qagentur / texttunnel
Python package for extractive NLP using the OpenAI API
☆17 · Aug 28, 2024 · Updated last year
fblgit / model-similarity
Simple Model Similarities Analysis
☆21 · Feb 3, 2024 · Updated 2 years ago
ZbigniewTomanek / my-mcp-server
☆23 · Apr 16, 2025 · Updated last year
ernie-research / Tool-Augmented-Reward-Model
[ICLR'24 spotlight] Tool-Augmented Reward Modeling
☆54 · Jun 6, 2025 · Updated last year
antoinejeannot / jurisprudence
French Jurisprudences at your fingertips @ every 72h
☆18 · Nov 18, 2025 · Updated 8 months ago
google / lmeval
☆238 · Nov 27, 2025 · Updated 7 months ago
huggingface / cosmopedia
☆572 · Nov 20, 2024 · Updated last year
Laz4rz / RL
☆15 · Jan 26, 2025 · Updated last year
WailordHe / cv-arxiv-daily-wailord
🎓Automatically Update CV Papers Daily using Github Actions (Update Every 12th hours)
☆12 · May 17, 2026 · Updated 2 months ago
harbor-framework / harbor-index
A compact high-signal benchmark for evaluating frontier agents
☆19 · Updated this week
MikeWangWZHL / VDLM
Repo for paper: https://arxiv.org/abs/2404.06479
☆30 · Oct 3, 2024 · Updated last year
cloneofsimo / fim-llama-deepspeed
☆33 · Jan 1, 2024 · Updated 2 years ago
artefactory / deploy_stable_difusion
☆13 · Mar 21, 2025 · Updated last year
StealthyPanda / quantumcomputingsim
A library to simulate quantum computations
☆12 · Dec 30, 2023 · Updated 2 years ago
HamzaG737 / legal-code-rag
Repo for advanced RAG evaluation on french legal Code data
☆26 · Apr 7, 2024 · Updated 2 years ago
memochou1993 / memochou1993.github.io
Memo's Blog
☆27 · Jan 21, 2026 · Updated 6 months ago
RistovaIvona / Bank-Marketing
A Data-Driven Approach to Predict the Success of Bank Telemarketing
☆10 · Apr 27, 2021 · Updated 5 years ago
strongdm / attractorbench
NLSpec instruction following benchmark for https://factory.strongdm.ai/products/attractor
☆19 · Feb 26, 2026 · Updated 4 months ago
harbor-framework / terminal-bench-challenges
☆18 · Jun 18, 2026 · Updated last month
Shark-NLP / self-adaptive-ICL
self-adaptive in-context learning
☆45 · May 5, 2023 · Updated 3 years ago
cobalt-uoft / datasets
Automatically generated and up-to-date datasets for Cobalt.
☆10 · May 16, 2020 · Updated 6 years ago
asmeurer / catimg
Print an image of a cat to the iTerm2 terminal
☆14 · Feb 7, 2017 · Updated 9 years ago
agarwalishika / DELIFT
☆16 · Feb 21, 2025 · Updated last year