OpenCoder-llm / opc_data_filtering
Heuristic filtering framework for RefineCode
★ 82 · Updated 8 months ago
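The repository applies rule-based quality filters to raw code data. As a rough sketch of what a heuristic filter of this kind can look like (the rule names and thresholds below are illustrative assumptions, not RefineCode's actual rules):

```python
# Illustrative sketch of a heuristic code-data filter.
# NOT the actual RefineCode rules; the heuristics and
# thresholds here are assumptions for illustration only.

def mean_line_length(text: str) -> float:
    """Average line length; very high values suggest minified/generated code."""
    lines = text.splitlines() or [""]
    return sum(len(line) for line in lines) / len(lines)

def alnum_fraction(text: str) -> float:
    """Fraction of alphanumeric characters; low values suggest binary-like blobs."""
    return sum(c.isalnum() for c in text) / max(len(text), 1)

def passes_filters(source: str) -> bool:
    """Keep a file only if every heuristic rule passes (thresholds assumed)."""
    rules = [
        mean_line_length(source) < 100,
        max((len(line) for line in source.splitlines()), default=0) < 1000,
        alnum_fraction(source) > 0.25,
    ]
    return all(rules)

if __name__ == "__main__":
    print(passes_filters("def add(a, b):\n    return a + b\n"))  # True
```

Frameworks of this kind typically chain many such rules and record which rule rejected each file, so the filter set can be audited and tuned per language.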
Alternatives and similar repositories for opc_data_filtering
Users interested in opc_data_filtering are comparing it to the repositories listed below.
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues · ★ 130 · Updated last year
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight) · ★ 181 · Updated 9 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning · ★ 184 · Updated 5 months ago
- ★ 316 · Updated last year
- [ICML 2024] Selecting High-Quality Data for Training Language Models · ★ 194 · Updated last year
- ★ 108 · Updated 4 months ago
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning · ★ 284 · Updated 2 years ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings · ★ 167 · Updated last year
- Fantastic Data Engineering for Large Language Models · ★ 92 · Updated 11 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs · ★ 257 · Updated 11 months ago
- [ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models · ★ 117 · Updated 6 months ago
- CFBench: A Comprehensive Constraints-Following Benchmark for LLMs · ★ 44 · Updated last year
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) · ★ 97 · Updated 9 months ago
- Code and data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models · ★ 268 · Updated last year
- Collection of papers on scalable automated alignment · ★ 94 · Updated last year
- [ACL 2024] LooGLE: Long Context Evaluation for Long-Context Language Models · ★ 192 · Updated last year
- Code implementation of synthetic continued pretraining · ★ 142 · Updated 11 months ago
- a-m-team's exploration in large language modeling · ★ 194 · Updated 6 months ago
- A repository sharing the literature on long-context large language models, including methodologies and evaluation benchmarks · ★ 269 · Updated last year
- ★ 146 · Updated last year
- [ACL 2024 Demo] Official GitHub repo for UltraEval: an open-source framework for evaluating foundation models · ★ 253 · Updated last year
- Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens" (https://arxiv.org/abs/2402.13718) · ★ 360 · Updated last year
- A Comprehensive Survey on Long Context Language Modeling · ★ 213 · Updated 2 weeks ago
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models · ★ 45 · Updated last year
- The official repo of the INF-34B models trained by INF Technology · ★ 34 · Updated last year
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior · ★ 248 · Updated 7 months ago
- ★ 213 · Updated 9 months ago
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation · ★ 90 · Updated last year
- Related works and background techniques for OpenAI o1 · ★ 221 · Updated 11 months ago
- [ACL 2025] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLM… · ★ 68 · Updated last year