gszfwsb/Data-Whisperer

gszfwsb / Data-Whisperer

Code for ACL 2025 Main paper "Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning".

☆52

Alternatives and similar repositories for Data-Whisperer

Users who are interested in Data-Whisperer are comparing it to the repositories listed below.

ZifanL / TSDS
Implementation of TSDS: Data Selection for Task-Specific Model Finetuning. An optimal-transport framework for selecting domain-specific a…
☆19 · Dec 25, 2024 · Updated last year
Frostlinx / Socratic-Zero
Socratic-Zero is a fully autonomous framework that generates high-quality training data for mathematical reasoning
☆38 · Oct 26, 2025 · Updated 8 months ago
LiuJi-Jim / smiles
Chat emoticons, lol
☆26 · Dec 2, 2016 · Updated 9 years ago
VITA-Group / Nabla-Reasoner
[ICLR'26] "Nabla-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space" by Peihao Wang*, Ruisi Cai*, Zhen Wang, Hongyuan…
☆35 · Mar 10, 2026 · Updated 3 months ago
Illyasville / ExpertTokenRouting
☆11 · Feb 16, 2024 · Updated 2 years ago
Astarojth / AgentAuditor-ASSEBench
☆39 · May 29, 2026 · Updated last month
choosewhatulike / cluster-clip
Multi-GPU k-means clustering for cluster-clip
☆15 · Jun 3, 2024 · Updated 2 years ago
hsaest / Agent-Planning-Analysis
[NAACL'25] "Revealing the Barriers of Language Agents in Planning"
☆13 · Jun 22, 2025 · Updated last year
michaelchen-lab / caft-llm
Improving large language models with concept-aware fine-tuning (CAFT)
☆29 · Jan 31, 2026 · Updated 5 months ago
JerryYLi / svitt
Code for CVPR 2023 paper "SViTT: Temporal Learning of Sparse Video-Text Transformers"
☆21 · Jun 16, 2023 · Updated 3 years ago
adxcreative / EERCF
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
☆21 · Feb 19, 2025 · Updated last year
usccolumbia / tsdnn
Twin-deep neural network for semi-supervised learning of materials properties
☆12 · Feb 1, 2023 · Updated 3 years ago
cai-cong / MER25_personality
☆21 · Jun 26, 2025 · Updated last year
abdelfattah-lab / SplitReason
☆20 · Mar 18, 2026 · Updated 3 months ago
ZichenWen1 / EPIC
(NeurIPS 2025 🔥) Official implementation for "Efficient Multi-modal Large Language Models via Progressive Consistency Distillation"
☆50 · Feb 11, 2026 · Updated 4 months ago
Nebularaid2000 / bottleneck
PyTorch implementation of the paper "Discovering and Explaining the Representation Bottleneck of DNNs" (ICLR 2022 Oral)
☆37 · Oct 30, 2024 · Updated last year
Lilidamowang / T2VIndexer-generativeSearch
☆16 · Aug 28, 2024 · Updated last year
Lyun0912-wu / LongAttn
LongAttn: Selecting Long-context Training Data via Token-level Attention
☆15 · Jul 16, 2025 · Updated 11 months ago
GuanGui-nju / SAA
Code and model for SAA
☆12 · Sep 21, 2023 · Updated 2 years ago
ulab-uiuc / Router-R1
[NeurIPS'25] Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning
☆140 · Dec 30, 2025 · Updated 6 months ago
zs1314 / Fraesormer
[ICME 2025 Oral] Official PyTorch code for "Fraesormer: Learning Adaptive Sparse Transformer for Efficient Food Recognition"
☆13 · Mar 21, 2025 · Updated last year
suinleelab / vit-shapley
Learning to Estimate Shapley Values with Vision Transformers
☆38 · Mar 4, 2026 · Updated 4 months ago
ZhuJiwei111 / SPACE
☆20 · Dec 20, 2025 · Updated 6 months ago
pm25 / Semi-Supervised-Regression
[NeurIPS 2024] Official code for the paper 'RankUp: Boosting Semi-Supervised Regression with an Auxiliary Ranking Classifier'
☆14 · Aug 22, 2025 · Updated 10 months ago
AI45Lab / DeepSafe
All-in-One Safety Evaluation Framework
☆51 · Updated this week
QizhiPei / MathFusion
MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025)
☆37 · Jul 16, 2025 · Updated 11 months ago
InuyashaYang / AIDIY
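JoinAI is an open-source repository focused on building algorithm engineering skills, including organized notes on engineering and mathematical principles
☆11 · Apr 20, 2025 · Updated last year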
ningliu-iga / TrinityLLM
Large language models, physics-based modeling, experimental measurements: the trinity of data-scarce learning of polymer properties
☆14 · Sep 4, 2025 · Updated 10 months ago
keeganhines / snowman
☆12 · Jun 24, 2017 · Updated 9 years ago
supersupercong / MSGNN
[IJCAI-24] Explore Internal and External Similarity for Single Image Deraining with Graph Neural Networks
☆11 · Sep 2, 2024 · Updated last year
AI45Lab / IS-Bench
[AAAI 2026] Data and Code for Paper IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
☆46 · Nov 24, 2025 · Updated 7 months ago
alon-albalak / online-data-mixing
An implementation of online data mixing for the Pile dataset, based on the GPT-NeoX library.
☆14 · Jan 9, 2024 · Updated 2 years ago
AheadOFpotato / Awesome-LRM-Mechanisms
Towards a Mechanistic Understanding of Large Reasoning Models: A Survey of Training, Inference, and Failures
☆34 · Jan 29, 2026 · Updated 5 months ago
CodeLLM-Research / CodeJudge-Eval
[COLING25] CodeJudge Eval: Can Large Language Models be Good Judges in Code Understanding?
☆12 · Dec 3, 2024 · Updated last year
MobiSys25AE / SynCheck
Artifact evaluation of MobiSys25 SynCheck
☆20 · Mar 24, 2025 · Updated last year
ZichenWen1 / DART
[EMNLP 2025 main 🔥] Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More"
☆121 · Oct 12, 2025 · Updated 8 months ago
sooonwoo / CL-Baselines
A PyTorch implementation of contrastive learning (CL) baselines.
☆14 · Aug 29, 2022 · Updated 3 years ago
Murf-y / Attractors-Simulation
Multiple attractors simulation with customization
☆14 · Feb 22, 2026 · Updated 4 months ago
OpenSPG / KAG-Thinker
An interactive thinking and deep reasoning model. It provides a cognitive reasoning paradigm for complex multi-hop problems.
☆83 · Nov 14, 2025 · Updated 7 months ago