GAIR-NLP / ProX
Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"
☆234Updated last month
Alternatives and similar repositories for ProX:
Users who are interested in ProX are comparing it to the libraries listed below.
- ☆265Updated 8 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs☆249Updated 3 months ago
- Reformatted Alignment☆115Updated 6 months ago
- A Comprehensive Survey on Long Context Language Modeling☆126Updated 2 weeks ago
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models☆181Updated 6 months ago
- ☆313Updated 6 months ago
- Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718☆316Updated 6 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning☆147Updated 7 months ago
- ☆93Updated 3 months ago
- ☆272Updated 3 weeks ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]☆137Updated 5 months ago
- [EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA☆122Updated 5 months ago
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context☆457Updated last year
- ☆134Updated last month
- [NeurIPS 2024] Source code for xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token☆133Updated 9 months ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.☆170Updated this week
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning☆250Updated last year
- ☆148Updated 3 months ago
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement☆181Updated last year
- ☆151Updated last week
- Scaling Deep Research via Reinforcement Learning in Real-world Environments.☆157Updated this week
- ☆142Updated 9 months ago
- Code implementation of synthetic continued pretraining☆99Updated 3 months ago
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper☆129Updated 8 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨☆196Updated 11 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆175Updated 3 weeks ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆166Updated 3 weeks ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings☆153Updated 10 months ago
- [ACL 2024] AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning☆219Updated 3 months ago
- ☆182Updated last month