[ACL 2025 Main] Official Repo for Paper "Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric"
☆42Feb 10, 2026Updated 3 months ago
Alternatives and similar repositories for NovelSum
Users who are interested in NovelSum are comparing it to the libraries listed below.
- Prover Agent: An Agent-Based Framework for Formal Mathematical Proofs☆24Nov 1, 2025Updated 6 months ago
- A Recipe for Building LLM Reasoners to Solve Complex Instructions☆32Oct 9, 2025Updated 7 months ago
- Code for "Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding" (EMNLP 2020).☆11May 1, 2025Updated last year
- Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods"☆13Nov 26, 2024Updated last year
- Can Large Language Models Identify Authorship? (EMNLP 2024 Findings)☆13Feb 4, 2025Updated last year
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…☆14Aug 8, 2025Updated 9 months ago
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation☆91Nov 13, 2024Updated last year
- Original code base for On Pretraining Data Diversity for Self-Supervised Learning☆14Dec 30, 2024Updated last year
- ACL 2021 paper "Style is NOT a single variable: Case Studies for Cross-Style Language Understanding" by Dongyeop Kang and Eduard Hovy☆15Jul 19, 2021Updated 4 years ago
- ☆29May 6, 2026Updated 3 weeks ago
- Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and …☆17Apr 12, 2024Updated 2 years ago
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning☆87Dec 14, 2023Updated 2 years ago
- ☆12Nov 9, 2018Updated 7 years ago
- The code for Template-GPT-2 Generation Model for Logic2Text Dataset☆18Jun 1, 2020Updated 5 years ago
- PyTorch implementation of PtrNet to solve sorting problem.☆12Dec 19, 2017Updated 8 years ago
- Exploration of automated dataset selection approaches at large scales.☆54Mar 4, 2025Updated last year
- Code for the "Long Context Needs Some R&R" paper.☆12Mar 11, 2024Updated 2 years ago
- ☆11Mar 10, 2017Updated 9 years ago
- ☆12Jan 7, 2020Updated 6 years ago
- https://footprints.baulab.info☆18Oct 4, 2024Updated last year
- CITE: A Corpus of Image-Text Discourse Relations☆13Apr 7, 2019Updated 7 years ago
- [COLING 2025] Official Repo for Paper "Beyond Boundaries: Learning Universal Entity Taxonomy across Datasets and Languages for Open Named…☆28Feb 5, 2026Updated 3 months ago
- Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.☆30Jun 12, 2023Updated 2 years ago
- Code and data for paper "Large language models can rate news outlet credibility"☆13Aug 10, 2024Updated last year
- Unsupervised diverse image generation via GANs: Partition Guided Mixture of Generative Adversarial Networks☆13Nov 3, 2021Updated 4 years ago
- We enable LLM with personalization capability☆11Nov 16, 2023Updated 2 years ago
- ☆11Mar 13, 2023Updated 3 years ago
- PyTorch study☆14Oct 16, 2017Updated 8 years ago
- ☆10Apr 24, 2022Updated 4 years ago
- Fuzzy Aggregators and Similarity Into a Logic Language☆26Sep 12, 2024Updated last year
- ☆13Jun 16, 2021Updated 4 years ago
- Ant Financial natural language processing competition.☆10Sep 3, 2018Updated 7 years ago
- ☆10Jun 21, 2021Updated 4 years ago
- A dataset and CLIP baseline for unrepresentative news thumbnail detection (ACL 2022 workshop)☆12May 26, 2022Updated 4 years ago
- Optimization Case Studies: Generic Time Scheduling Problem (GTSP), Resource-Constrained Project Scheduling Problem (RCPSP) with Pulse Var…☆11Nov 7, 2018Updated 7 years ago
- This is the source code of IJCNN 2023 paper TieFake: Title-Text Similarity and Emotion-Aware Fake News Detection (TieFake).☆16Dec 21, 2023Updated 2 years ago
- Code for Sat2Cap model (Earthvision Best Paper Award)☆17Updated this week
- [ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models☆77Nov 23, 2024Updated last year
- Entity-Aware Dual Co-Attention Network for Fake News Detection, EACL 2023 Findings☆10Jun 11, 2023Updated 2 years ago