Implementation of TSDS: Data Selection for Task-Specific Model Finetuning. An optimal-transport framework for selecting domain-specific and task-specific training data to improve LLM finetuning and instruction tuning.
☆19Dec 25, 2024Updated last year
Alternatives and similar repositories for TSDS
Users that are interested in TSDS are comparing it to the libraries listed below.
- Code repository for EMNLP 2021 paper 'Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods'☆16Oct 13, 2022Updated 3 years ago
- The repository for paper <Evaluating Open-QA Evaluation>☆25Apr 9, 2024Updated 2 years ago
- Code for ACL 2025 Main paper "Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning…☆52Aug 4, 2025Updated 11 months ago
- This is the official implementation of TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data☆13Jul 21, 2024Updated last year
- ☆89Dec 29, 2023Updated 2 years ago
- Improving large language models with concept-aware fine-tuning (CAFT)☆29Jan 31, 2026Updated 5 months ago
- Official repository for Activation-Informed Merging (AIM) of Large Language Models☆24Feb 10, 2025Updated last year
- We systematically studied the factors that influence LLM-generated benchmarks. By using our code, you can generate high-quality QA datas…☆20May 20, 2025Updated last year
- The official implementation of "Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration" (ICML 2026)☆24Feb 4, 2026Updated 5 months ago
- Establish a geometric signal model to locate the target. Use particle swarm optimization (PSO) to solve an overdetermined equation. Use …☆10Apr 12, 2018Updated 8 years ago
- [AAAI 2024] MESED: A Multi-modal Entity Set Expansion Dataset with Fine-grained Semantic Classes and Hard Negative Entities☆15Apr 26, 2024Updated 2 years ago
- Tri-Layer Local Contrast Measure (TLLCM) for small infrared target detection☆12Oct 3, 2020Updated 5 years ago
- ☆19Aug 4, 2025Updated 10 months ago
- Python programs and procedures that facilitate local application of the earth2observe global water resources reanalysis☆10Nov 21, 2017Updated 8 years ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling☆188Jul 23, 2025Updated 11 months ago
- ☆24Oct 14, 2024Updated last year
- Less is More: High-value Data Selection for Visual Instruction Tuning☆19Jan 18, 2025Updated last year
- The code for paper "ProQA: Structural Prompt-based Pre-training for Unified Question Answering"☆11Feb 7, 2023Updated 3 years ago
- A Survey on Data Selection for Language Models☆261Apr 29, 2025Updated last year
- ☆42Sep 21, 2023Updated 2 years ago
- WorldSense benchmark for grounded reasoning in language models☆25Nov 28, 2023Updated 2 years ago
- opentqa is an open framework for textbook question answering, which includes xtqa, mcan, cmr, mfb, and mutan.☆11Mar 27, 2021Updated 5 years ago
- Just share some data☆13Dec 31, 2023Updated 2 years ago
- ☆17Jun 5, 2018Updated 8 years ago
- This is the GitHub repository for the open-source benchmark AdvancedIF; see LAMA L1387358RCRO☆36Nov 26, 2025Updated 7 months ago
- code for the table-based open domain question answering project, with paper title: "Reasoning over Hybrid Chain for Table-and-Text Open D…☆12Sep 16, 2022Updated 3 years ago
- Dataset and model in the paper "SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation"☆13Feb 14, 2022Updated 4 years ago
- Data and code for ACL 2023 paper "RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations"☆15Feb 8, 2024Updated 2 years ago
- Complete original Transformer program☆22Mar 5, 2025Updated last year
- The official data and code for EMNLP 2023 main conference paper: CRT-QA: A Dataset of Complex Reasoning Question Answering over Tabular D…☆13May 19, 2025Updated last year
- Script for merging LaTeX files and stripping comments, in preparation for submission to arXiv☆10May 23, 2014Updated 12 years ago
- code for the NAACL 2021 paper Compositional Generalization for Neural Semantic Parsing via Span-level Supervised Attention by Microsoft S…☆12Apr 21, 2023Updated 3 years ago
- A list of Numerical Multimodal reasoning papers and their implementation☆11May 13, 2024Updated 2 years ago
- [EMNLP'2023 Findings] MoqaGPT, for zero-shot multimodal question answering with LLMs☆13Dec 28, 2024Updated last year
- Official implementation of the paper "ALTER: Augmentation for Large-Table-Based Reasoning"☆15Aug 26, 2024Updated last year
- CTE: Contextualized Table Extraction Dataset☆17Feb 23, 2023Updated 3 years ago
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor☆32Apr 8, 2024Updated 2 years ago
- Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals.☆17Apr 25, 2021Updated 5 years ago
- 3D extension of a Gabor filter☆17Dec 6, 2018Updated 7 years ago