[ACL 2025 Main] Official Repo for Paper "Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric"
☆41Feb 10, 2026Updated 4 months ago
Alternatives and similar repositories for NovelSum
Users who are interested in NovelSum are comparing it to the libraries listed below.
- A Recipe for Building LLM Reasoners to Solve Complex Instructions☆32Oct 9, 2025Updated 8 months ago
- Complete set of English dialect transformation rules and evaluation code☆17Jun 7, 2024Updated 2 years ago
- Can Large Language Models Identify Authorship? (EMNLP 2024 Findings)☆13Feb 4, 2025Updated last year
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…☆14Aug 8, 2025Updated 10 months ago
- Split bib files for anthology bibliography for overleaf☆11Aug 25, 2024Updated last year
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation☆91Nov 13, 2024Updated last year
- ACL 2021 paper "Style is NOT a single variable: Case Studies for Cross-Style Language Understanding" by Dongyeop Kang and Eduard Hovy☆15Jul 19, 2021Updated 4 years ago
- PyTorch-based text classification for imbalanced data☆12Dec 26, 2021Updated 4 years ago
- Tools and prompt templates used to build and evaluate SWE-rebench-v2 tasks for the paper.☆63Mar 12, 2026Updated 3 months ago
- ☆30May 6, 2026Updated last month
- ☯️ AllenNLP training configurations for promising models on Named Entity Recognition. (BiLSTM-CRF, BiLSTM-CNN-CRF, BERT, BERT-CRF)☆15Nov 26, 2020Updated 5 years ago
- Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.☆22Jul 18, 2025Updated 11 months ago
- Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and …☆17Apr 12, 2024Updated 2 years ago
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning☆87Dec 14, 2023Updated 2 years ago
- An archive of the storygames forum☆13Jan 18, 2021Updated 5 years ago
- Visual Bidirectional Kernelized Network for Visual Question Answering☆11Jul 17, 2017Updated 8 years ago
- Code for the "Long Context Needs Some R&R" paper.☆12Mar 11, 2024Updated 2 years ago
- This repository contains codes for *Sem 2023 paper “Generative Data Augmentation for Aspect Sentiment Quad Prediction”.☆10May 30, 2023Updated 3 years ago
- Code for "Unsupervised Neural Single-Document Summarization of Reviews via Learning Latent Discourse Structure and its Ranking" in ACL2…☆28Jul 27, 2019Updated 6 years ago
- ☆29Mar 20, 2024Updated 2 years ago
- ☆12Jan 7, 2020Updated 6 years ago
- EANN(Pytorch)☆10Mar 12, 2022Updated 4 years ago
- ☆28Jul 18, 2025Updated 11 months ago
- https://footprints.baulab.info☆18Oct 4, 2024Updated last year
- ☆69Jun 11, 2026Updated last week
- CITE: A Corpus of Image-Text Discourse Relations☆13Apr 7, 2019Updated 7 years ago
- [COLING 2025] Official Repo for Paper "Beyond Boundaries: Learning Universal Entity Taxonomy across Datasets and Languages for Open Named…☆28Feb 5, 2026Updated 4 months ago
- Code and data for the NAACL 2021 paper: "XFORMAL: A Benchmark for Multilingual Formality Style Transfer"☆12Jun 7, 2021Updated 5 years ago
- Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.☆30Jun 12, 2023Updated 3 years ago
- STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models☆49Apr 23, 2026Updated last month
- Unsupervised diverse image generation via GANs: Partition Guided Mixture of Generative Adversarial Networks☆13Nov 3, 2021Updated 4 years ago
- We enable LLM with personalization capability☆11Nov 16, 2023Updated 2 years ago
- Download, parse, and filter data from Phil Papers. Data-ready for The-Pile.☆20Aug 28, 2023Updated 2 years ago
- Fuzzy Aggregators and Similarity Into a Logic Language☆26Sep 12, 2024Updated last year
- ☆13Jun 16, 2021Updated 5 years ago
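- Arena Competition 3: example code and baseline implementation for the large-scale pre-training and fine-tuning competition☆37Sep 27, 2022Updated 3 years ago
- Ant Financial natural language processing competition.☆10Sep 3, 2018Updated 7 years ago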
- ☆10Jun 21, 2021Updated 4 years ago
- Optimization Case Studies: Generic Time Scheduling Problem (GTSP), Resource-Constrained Project Scheduling Problem (RCPSP) with Pulse Var…☆11Nov 7, 2018Updated 7 years ago