[ACL2025 Findings] Official code for MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
☆28Aug 30, 2025Updated 9 months ago
Alternatives and similar repositories for MIG
Users who are interested in MIG are comparing it to the repositories listed below.
- Official code for paper: Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language☆31Feb 28, 2025Updated last year
- arXiv 2024 | ZIP: entropy-law data selection for efficient LLM alignment.☆28Jun 10, 2026Updated last week
- PyTorch code for Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles (DANCE)☆23Nov 29, 2022Updated 3 years ago
- This is the official implementation of TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data☆13Jul 21, 2024Updated last year
- ☆29Mar 10, 2026Updated 3 months ago
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation☆91Nov 13, 2024Updated last year
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning☆284Aug 20, 2023Updated 2 years ago
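- SSM-based driving school booking management system 1, with three roles: administrator, coach, and student. Features by role: Administrator: student management, coach management, school vehicle management, booking management, booking cancellation management, announcement management. Coach: coach information lookup, booking management, booking cancellation management, registration, personal center. Student: view coach information, book a coach, cancel a coach booking, …☆13Jan 11, 2024Updated 2 years ago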
- Danmuku dataset☆12Jul 7, 2023Updated 2 years ago
- Application and blog explaining my interpretations of In-run Data Shapley☆31Jan 30, 2025Updated last year
- CVPR2022:Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency☆18Aug 10, 2022Updated 3 years ago
- This project is a system for managing driving schools and making it easy for students to book driving lessons☆15Dec 19, 2017Updated 8 years ago
- ☆18Oct 2, 2024Updated last year
- [CVPR 2021] This repository is the official implementation of "PML: Progressive Margin Loss for Long-tailed Age Classification."☆17Mar 13, 2024Updated 2 years ago
- ☆13Apr 18, 2024Updated 2 years ago
- Official repository for paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning(https://arxiv.org/abs/2406.17770).☆159Sep 27, 2024Updated last year
- Code for "In-Context Former: Lightning-fast Compressing Context for Large Language Model" (Findings of EMNLP 2024)☆21Nov 21, 2024Updated last year
- It is the implementation of paper "Multi-Modal Sarcasm Detection in Twitter with Hierarchical Fusion Model"☆18Feb 19, 2021Updated 5 years ago
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…☆414Jun 25, 2025Updated 11 months ago
- ☆33Mar 9, 2022Updated 4 years ago
- ☆21Jul 25, 2025Updated 10 months ago
- WWW 2025 | FuXi-alpha: feature-interaction enhanced Transformer for scalable generative recommendation.☆24Jun 10, 2026Updated last week
- ☆26Aug 24, 2022Updated 3 years ago
- ☆29Nov 27, 2021Updated 4 years ago
- ☆13Dec 25, 2018Updated 7 years ago
- Codes of BaiLian (POJ), Luogu, LeetCode & Course OJ☆16Dec 21, 2019Updated 6 years ago
- Official PyTorch implementation of the paper "Equivariant Image Modeling"(https://arxiv.org/abs/2503.18948)☆36Aug 1, 2025Updated 10 months ago
- Master the techniques of function-calling and structured data extraction with LLMs. Learn to enhance LLM capabilities, integrate web serv…☆12Jun 29, 2024Updated last year
- repo for paper https://arxiv.org/abs/2504.13837☆341Dec 17, 2025Updated 6 months ago
- [COLM 2025] EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees☆31Jul 11, 2025Updated 11 months ago
- [ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models☆78Nov 23, 2024Updated last year
- A retrieve and edit approach to generate sarcasm by reversing valence and adding incongruent common sense context☆32Mar 27, 2021Updated 5 years ago
- [ACL2023] Preserving Commonsense Knowledge from Pre-trained Language Models via Causal Inference☆24Dec 25, 2023Updated 2 years ago
- ☆12Dec 13, 2023Updated 2 years ago
- 2020 Language and Intelligence Challenge: relation extraction task☆10Mar 19, 2020Updated 6 years ago
- ☆11Nov 27, 2018Updated 7 years ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆39Mar 4, 2024Updated 2 years ago
- ConvGQR: Generative Query Reformulation for Conversational Search. A codebase for ACL 2023 accepted paper.☆35Mar 5, 2024Updated 2 years ago
- Informative Conversational Query Rewriting☆39Jan 29, 2024Updated 2 years ago