[ACL 2025 Main] Official Repo for Paper "Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric"
☆40Feb 10, 2026Updated 2 months ago
Alternatives and similar repositories for NovelSum
Users that are interested in NovelSum are comparing it to the libraries listed below.
- Code for "Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding" (EMNLP 2020).☆11May 1, 2025Updated last year
- Can Large Language Models Identify Authorship? (EMNLP 2024 Findings)☆13Feb 4, 2025Updated last year
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…☆14Aug 8, 2025Updated 9 months ago
- Split bib files for anthology bibliography for overleaf☆11Aug 25, 2024Updated last year
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation☆91Nov 13, 2024Updated last year
- Original code base for On Pretraining Data Diversity for Self-Supervised Learning☆14Dec 30, 2024Updated last year
- ACL 2021 paper "Style is NOT a single variable: Case Studies for Cross-Style Language Understanding" by Dongyeop Kang and Eduard Hovy☆15Jul 19, 2021Updated 4 years ago
- LiteLLM model integration for Pydantic AI framework - access 100+ LLM providers through a unified interface☆22Apr 15, 2026Updated 3 weeks ago
- The repository of the ACCV 2024 paper "FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Ge…☆11Jul 28, 2025Updated 9 months ago
- Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination.☆22Jul 18, 2025Updated 9 months ago
- Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and …☆18Apr 12, 2024Updated 2 years ago
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning☆87Dec 14, 2023Updated 2 years ago
- ☆14Jul 25, 2025Updated 9 months ago
- Exploration of automated dataset selection approaches at large scales.☆54Mar 4, 2025Updated last year
- Visual Bidirectional Kernelized Network for Visual Question Answering☆11Jul 17, 2017Updated 8 years ago
- Code for the "Long Context Needs Some R&R" paper.☆12Mar 11, 2024Updated 2 years ago
- This repository contains code for the *SEM 2023 paper “Generative Data Augmentation for Aspect Sentiment Quad Prediction”.☆11May 30, 2023Updated 2 years ago
- ☆29Mar 20, 2024Updated 2 years ago
- EANN (PyTorch)☆10Mar 12, 2022Updated 4 years ago
- https://footprints.baulab.info☆18Oct 4, 2024Updated last year
- NIILC QA data☆18Nov 20, 2015Updated 10 years ago
- CITE: A Corpus of Image-Text Discourse Relations☆13Apr 7, 2019Updated 7 years ago
- Code and data for the NAACL 2021 paper: "XFORMAL: A Benchmark for Multilingual Formality Style Transfer"☆12Jun 7, 2021Updated 4 years ago
- Code and data for paper "Large language models can rate news outlet credibility"☆13Aug 10, 2024Updated last year
- STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models☆46Apr 23, 2026Updated 2 weeks ago
- Unsupervised diverse image generation via GANs: Partition Guided Mixture of Generative Adversarial Networks☆13Nov 3, 2021Updated 4 years ago
- Code for "HiChunk: Evaluating and Enhancing Retrieval-Augmented Generation with Hierarchical Chunking"☆96Nov 18, 2025Updated 5 months ago
- Download, parse, and filter data from Phil Papers. Data-ready for The-Pile.☆20Aug 28, 2023Updated 2 years ago
- ☆11Mar 13, 2023Updated 3 years ago
- PyTorch study☆14Oct 16, 2017Updated 8 years ago
- ☆10Apr 24, 2022Updated 4 years ago
- ☆13Jun 16, 2021Updated 4 years ago
- Ant Financial natural language processing competition.☆10Sep 3, 2018Updated 7 years ago
- ☆10Jun 21, 2021Updated 4 years ago
- A dataset and CLIP baseline for unrepresentative news thumbnail detection (ACL 2022 workshop)☆12May 26, 2022Updated 3 years ago
- Optimization Case Studies: Generic Time Scheduling Problem (GTSP), Resource-Constrained Project Scheduling Problem (RCPSP) with Pulse Var…☆11Nov 7, 2018Updated 7 years ago
- [ACL 2026] Code, benchmark and environment for "OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic…☆47Nov 10, 2025Updated 5 months ago
- A Controllable Model of Grounded Response Generation (AAAI 21)☆13Oct 25, 2022Updated 3 years ago
- Gaussian Processes regression and classification implementations, as well as notebook for accompanying blog. Blog static site:☆11Jul 16, 2018Updated 7 years ago