pldlgb/nuggets

☆89

Alternatives and similar repositories for nuggets

Users that are interested in nuggets are comparing it to the libraries listed below.

CASIA-LM / MoDS
☆153 · Apr 16, 2024 · Updated 2 years ago
tianyi-lab / Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…
☆417 · Jun 25, 2025 · Updated last year
hkust-nlp / deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
☆600 · Dec 9, 2024 · Updated last year
IronBeliever / CaR
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
☆91 · Nov 13, 2024 · Updated last year
tianyi-lab / Superfiltering
[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
☆189 · Jun 25, 2025 · Updated last year
Cohere-Labs-Community / iterative-data-selection
☆30 · Nov 5, 2024 · Updated last year
Blue-Raincoat / SelectIT
☆24 · Oct 14, 2024 · Updated last year
princeton-nlp / LESS
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
☆532 · Oct 20, 2024 · Updated last year
ZifanL / TSDS
Implementation of TSDS: Data Selection for Task-Specific Model Finetuning. An optimal-transport framework for selecting domain-specific a…
☆19 · Dec 25, 2024 · Updated last year
tianyi-lab / Reflection_Tuning
[ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
☆367 · Sep 6, 2024 · Updated last year
Cardinalere / Batch-ICL
Code for paper 'Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning'
☆18 · Apr 19, 2024 · Updated 2 years ago
2003pro / TAGCOS
This is the official implementation of TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data
☆13 · Jul 21, 2024 · Updated 2 years ago
xiaoboxia / HLC
ICCV'2023: Holistic Label Correction for Noisy Multi-Label Classification
☆13 · Oct 29, 2023 · Updated 2 years ago
xiaoboxia / CoDis
ICCV'2023: Combating Noisy Labels with Sample Selection by Mining High-Discrepancy Examples
☆12 · Oct 16, 2023 · Updated 2 years ago
Lyun0912-wu / LongAttn
LongAttn: Selecting Long-context Training Data via Token-level Attention
☆15 · Jul 16, 2025 · Updated last year
rookie-joe / automatic-lean4-compilation
☆15 · Jul 29, 2024 · Updated 2 years ago
BatsResearch / nayak-aclfindings24-code
☆22 · Jul 16, 2024 · Updated 2 years ago
Bolin97 / awesome-instruction-selector
Paper list and datasets for the paper: A Survey on Data Selection for LLM Instruction Tuning
☆48 · Jan 22, 2026 · Updated 6 months ago
gl-ybnbxb / BoNBoN
☆19 · Jun 3, 2024 · Updated 2 years ago
OFA-Sys / InsTag
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
☆288 · Aug 20, 2023 · Updated 2 years ago
zhenyuhe00 / BiPE
Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024
☆24 · Jun 26, 2024 · Updated 2 years ago
eraseai / erase
[CIKM-2024] Official code for work "ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance"
☆20 · Aug 14, 2024 · Updated last year
RainBowLuoCS / DEEM
(ICLR 2025 Spotlight) DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception.
☆51 · Jul 1, 2025 · Updated last year
yifeiwang77 / Self-Correction
☆20 · Nov 3, 2024 · Updated last year
thu-coai / ComplexBench
Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)
☆102 · Feb 20, 2025 · Updated last year
locuslab / scaling_laws_data_filtering
☆64 · Apr 9, 2024 · Updated 2 years ago
TemporaryLoRA / FreeLM
☆15 · Feb 10, 2026 · Updated 5 months ago
HarlynDN / WebCiteS
[ACL'24] WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations
☆13 · Sep 11, 2024 · Updated last year
hanxuhu / SeqIns
The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA…
☆30 · Nov 24, 2024 · Updated last year
xiaoboxia / PICMM
NeurIPS'2022: Pluralistic Image Completion with Gaussian Mixture Models
☆14 · Jan 28, 2023 · Updated 3 years ago
Zanette-Labs / SpeculativeRejection
[NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection
☆56 · Oct 29, 2024 · Updated last year
skzhang1 / IDEAL
IDEAL: Influence-Driven Selective Annotations Empower In-Context Learners in Large Language Models
☆59 · Jan 19, 2024 · Updated 2 years ago
sail-sg / sdft
[ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".
☆167 · Nov 2, 2024 · Updated last year
GAIR-NLP / ReAlign
Reformatted Alignment
☆111 · Sep 23, 2024 · Updated last year
facebookresearch / Multi-IF
The evaluation code for MultiIF multi-turn and multi-lingual instruction following
☆63 · Oct 29, 2024 · Updated last year
Justherozen / FreeAL
[EMNLP 2023] FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models
☆98 · Dec 21, 2023 · Updated 2 years ago
RoyalSkye / ATCL
[NeurIPS 2022] "Adversarial Training with Complementary Labels: On the Benefit of Gradually Informative Attacks"
☆13 · Nov 11, 2022 · Updated 3 years ago
nasosger / MuToR
[NeurIPS '25] Multi-Token Prediction Needs Registers
☆30 · Dec 14, 2025 · Updated 7 months ago
Tim-Siu / reinforcement-distillation
Code repo for "Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning"
☆33 · Jul 25, 2025 · Updated last year