ZifanL/TSDS

Implementation of TSDS: Data Selection for Task-Specific Model Finetuning. An optimal-transport framework for selecting domain-specific and task-specific training data to improve LLM finetuning and instruction tuning.

☆19

Alternatives and similar repositories for TSDS

Users who are interested in TSDS are comparing it to the libraries listed below.

vita-epfl / TAROT
Official PyTorch implementation of the ICML 2025 paper "TAROT: Targeted Data Selection via Optimal Transport"
☆31 · Dec 12, 2024 · Updated last year
gszfwsb / Data-Whisperer
Code for ACL 2025 Main paper "Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning…
☆53 · Aug 4, 2025 · Updated 11 months ago
oriyor / turning_tables
Implementation of the paper: "Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning…
☆22 · Nov 2, 2021 · Updated 4 years ago
kohpangwei / data-poisoning-journal-release
☆18 · Sep 29, 2020 · Updated 5 years ago
pldlgb / nuggets
☆89 · Dec 29, 2023 · Updated 2 years ago
jtonglet / Numerical-Hybrid-QA-Literature
A list of Numerical Multimodal reasoning papers and their implementation
☆11 · May 13, 2024 · Updated 2 years ago
yakuza8 / first-order-predicate-logic-theorem-prover
Autonomous Theorem Prover for First Order Predicate Logic
☆12 · Jun 29, 2020 · Updated 6 years ago
Bai-YT / AdaptiveSmoothing
Implementation of the paper "Improving the Accuracy-Robustness Trade-off of Classifiers via Adaptive Smoothing".
☆10 · Feb 6, 2024 · Updated 2 years ago
earth2observe / downscaling-tools
Python programs and procedures that facilitate local application of the earth2observe global water resources reanalysis
☆10 · Nov 21, 2017 · Updated 8 years ago
microsoft / SuperRL
☆15 · Sep 8, 2025 · Updated 10 months ago
JTWang2000 / NICE
NICE: Non-differentiable evaluation metric-based InfluenCe Estimation
☆16 · Jul 7, 2025 · Updated last year
keep-smile-001 / opentqa
opentqa is an open framework for textbook question answering, which includes xtqa, mcan, cmr, mfb, and mutan.
☆11 · Mar 27, 2021 · Updated 5 years ago
haizhongzheng / Coverage-centric-coreset-selection
☆42 · Sep 21, 2023 · Updated 2 years ago
alon-albalak / data-selection-survey
A Survey on Data Selection for Language Models
☆260 · Apr 29, 2025 · Updated last year
zhongwanjun / CARP
Code for the table-based open-domain question answering project, with paper title: "Reasoning over Hybrid Chain for Table-and-Text Open D…
☆12 · Sep 16, 2022 · Updated 3 years ago
mandyyyyii / east
☆19 · Aug 4, 2025 · Updated 11 months ago
THUKElab / MESED
[AAAI 2024] MESED: A Multi-modal Entity Set Expansion Dataset with Fine-grained Semantic Classes and Hard Negative Entities
☆15 · Apr 26, 2024 · Updated 2 years ago
sairin1202 / SciXGen
Dataset and model in the paper "SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation"
☆13 · Feb 14, 2022 · Updated 4 years ago
MadryLab / trak
A fast, effective data attribution method for neural networks in PyTorch
☆243 · Nov 18, 2024 · Updated last year
yilunzhao / RobuT
Data and code for ACL 2023 paper "RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations"
☆15 · Feb 8, 2024 · Updated 2 years ago
zzh-SJTU / CRT-QA
The official data and code for EMNLP 2023 main conference paper: CRT-QA: A Dataset of Complex Reasoning Question Answering over Tabular D…
☆13 · May 19, 2025 · Updated last year
zhongwanjun / ProQA
The code for the paper "ProQA: Structural Prompt-based Pre-training for Unified Question Answering"
☆11 · Feb 7, 2023 · Updated 3 years ago
alistairewj / icu-model-transfer
Evaluating methods to improve model transfer for intensive care unit models
☆16 · Jul 6, 2023 · Updated 3 years ago
adsarwate / mergetex
Script for merging LaTeX files and stripping comments, in preparation for submission to arXiv
☆10 · May 23, 2014 · Updated 12 years ago
zh1qun / aecid_incremental_clustering
Incremental log clustering algorithm for log anomaly detection
☆12 · Aug 20, 2022 · Updated 3 years ago
lezhang7 / MOQAGPT
[EMNLP'2023 Findings] MoqaGPT, for zero-shot multimodal question answering with LLMs
☆13 · Dec 28, 2024 · Updated last year
microsoft / compositional-generalization-span-level-attention
Code for the NAACL 2021 paper "Compositional Generalization for Neural Semantic Parsing via Span-level Supervised Attention" by Microsoft S…
☆12 · Apr 21, 2023 · Updated 3 years ago
NielsRogge / tapas_utils
A package containing utils for the PyTorch version of the Tapas algorithm.
☆11 · Apr 29, 2021 · Updated 5 years ago
AILab-UniFI / cte-dataset
CTE: Contextualized Table Extraction Dataset
☆17 · Feb 23, 2023 · Updated 3 years ago
strangeloopcanon / LOOP-Evals
Logical Operations On Puzzles: Simple Iterative Reasoning Tests for LLMs first through wordgrids
☆18 · Feb 19, 2025 · Updated last year
xiye17 / EvalQAExpl
Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals.
☆17 · Apr 25, 2021 · Updated 5 years ago
AndreHe02 / rewarding-unlikely-release
☆15 · Jun 10, 2025 · Updated last year
p-lambda / dsir
DSIR large-scale data selection framework for language model training
☆275 · Apr 7, 2024 · Updated 2 years ago
BIU-NLP / iFACETSUM
Corpus exploration platform using advanced tools such as interactive summarization and multi document coreference resolution
☆12 · Jun 15, 2023 · Updated 3 years ago
awslabs / durepa-hybrid-qa
☆12 · Mar 22, 2024 · Updated 2 years ago
facebookresearch / worldsense
WorldSense benchmark for grounded reasoning in language models
☆25 · Nov 28, 2023 · Updated 2 years ago
LuLuLuyi / LongHeads
[EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor
☆32 · Apr 8, 2024 · Updated 2 years ago
EliasMei / IPM
Repo - Paper "Capturing Semantics for Imputation with Pre-trained Language Models." [ICDE 2021]
☆10 · Mar 13, 2022 · Updated 4 years ago
amazon-science / wikiwiki-dataset
☆11 · May 11, 2022 · Updated 4 years ago