DAMO-NLP-SG / CLEX
[ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models
★78 Updated last year
Alternatives and similar repositories for CLEX
Users who are interested in CLEX are comparing it to the libraries listed below.
- ★102 Updated 9 months ago
- [ICLR 2025] RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight) ★149 Updated 4 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ★155 Updated last year
- [ACL-25] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ★63 Updated 8 months ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales ★32 Updated last year
- A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to… ★56 Updated 2 years ago
- Code for paper "Patch-Level Training for Large Language Models" ★85 Updated 7 months ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)": ★38 Updated last year
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ★121 Updated 5 months ago
- [ICML 2024] Selecting High-Quality Data for Training Language Models ★178 Updated last year
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ★58 Updated last year
- [NeurIPS-2024] Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ★86 Updated 9 months ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ★138 Updated 9 months ago
- Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment" ★74 Updated last month
- [ACL 2025, Main Conference, Oral] Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process ★28 Updated 11 months ago
- Reformatted Alignment ★113 Updated 9 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ★91 Updated last month
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ★211 Updated 4 months ago
- Implementations of online merging optimizers proposed by Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment ★75 Updated last year
- One Network, Many Masks: Towards More Parameter-Efficient Transfer Learning ★40 Updated 2 years ago
- ★18 Updated 7 months ago
- An Experiment on Dynamic NTK Scaling RoPE ★64 Updated last year
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ★47 Updated 8 months ago
- Unofficial implementation of AlpaGasus ★92 Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ★160 Updated 2 weeks ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ★50 Updated last month
- The code and data for the paper JiuZhang3.0 ★47 Updated last year
- Implementation of NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ★147 Updated 3 months ago
- [ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" ★77 Updated 7 months ago
- Towards Systematic Measurement for Long Text Quality ★36 Updated 10 months ago