[ACL 2025 Main] Official Repo for Paper "Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric"
☆39Feb 10, 2026Updated 2 months ago
Alternatives and similar repositories for NovelSum
Users that are interested in NovelSum are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Code for "Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding" (EMNLP 2020).☆11May 1, 2025Updated 11 months ago
- Complete set of English dialect transformation rules and evaluation code☆16Jun 7, 2024Updated last year
- Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods"☆13Nov 26, 2024Updated last year
- Split bib files for anthology bibliography for overleaf☆11Aug 25, 2024Updated last year
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation☆91Nov 13, 2024Updated last year
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- Original code base for On Pretraining Data Diversity for Self-Supervised Learning☆14Dec 30, 2024Updated last year
- ACL 2021 paper "Style is NOT a single variable: Case Studies for Cross-Style Language Understanding " by Dongyeop Kang and Eduard Hovy☆15Jul 19, 2021Updated 4 years ago
- The repository of the ACCV 2024 paper "FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Ge…☆11Jul 28, 2025Updated 8 months ago
- ☆26Sep 26, 2025Updated 6 months ago
- Dialogue Act classification☆18Jan 15, 2024Updated 2 years ago
- Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and …☆18Apr 12, 2024Updated 2 years ago
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning☆87Dec 14, 2023Updated 2 years ago
- ☆12Nov 9, 2018Updated 7 years ago
- Exploration of automated dataset selection approaches at large scales.☆54Mar 4, 2025Updated last year
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- Code for the "Long Context Needs Some R&R" paper.☆12Mar 11, 2024Updated 2 years ago
- ☆29Mar 20, 2024Updated 2 years ago
- EANN(Pytorch)☆10Mar 12, 2022Updated 4 years ago
- ☆27Jul 18, 2025Updated 9 months ago
- https://footprints.baulab.info☆18Oct 4, 2024Updated last year
- [COLING 2025] Official Repo for Paper "Beyond Boundaries: Learning Universal Entity Taxonomy across Datasets and Languages for Open Named…☆28Feb 5, 2026Updated 2 months ago
- Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.☆30Jun 12, 2023Updated 2 years ago
- Code and data for paper "Large language models can rate news outlet credibility"☆13Aug 10, 2024Updated last year
- Unsupervised diverse image generation via GANs: Partition Guided Mixture of Generative Adversarial Networks☆13Nov 3, 2021Updated 4 years ago
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- Code for "HiChunk: Evaluating and Enhancing Retrieval-Augmented Generation with Hierarchical Chunking"☆94Nov 18, 2025Updated 5 months ago
- We enable LLM with personalization capability☆11Nov 16, 2023Updated 2 years ago
- Download, parse, and filter data from Phil Papers. Data-ready for The-Pile.☆19Aug 28, 2023Updated 2 years ago
- ☆11Mar 13, 2023Updated 3 years ago
- PyTorch study☆14Oct 16, 2017Updated 8 years ago
- ☆10Apr 24, 2022Updated 3 years ago
- Fuzzy Aggregators and Similarity Into a Logic Language☆26Sep 12, 2024Updated last year
- ☆13Jun 16, 2021Updated 4 years ago
- 擂台赛3-大规模预训练调优比赛的示例代码与baseline实现☆37Sep 27, 2022Updated 3 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- 一些常见攻击算法的实现☆16May 13, 2021Updated 4 years ago
- Optimization Case Studies: Generic Time Scheduling Problem (GTSP), Resource-Constrained Project Scheduling Problem (RCPSP) with Pulse Var…☆11Nov 7, 2018Updated 7 years ago
- [ACL 2026] Code, benchmark and environment for "OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic…☆45Nov 10, 2025Updated 5 months ago
- A Controllable Model of Grounded Response Generation (AAAI 21)☆13Oct 25, 2022Updated 3 years ago
- Optimizing Deeper Transformers on Small Datasets https://arxiv.org/abs/2012.15355☆16Nov 2, 2022Updated 3 years ago
- [EMNLP 2020] Collective HumAn OpinionS on Natural Language Inference Data☆40Apr 7, 2022Updated 4 years ago
- Code for Sat2Cap model (Earthvision Best Paper Award)☆17Jul 23, 2024Updated last year