sail-sg / scaling-with-vocab
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
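The paper's headline claim is that the compute-optimal vocabulary size grows with model scale, with vocabulary parameters following a power law that is slower than the growth of non-vocabulary parameters. Below is a minimal illustrative sketch of that idea, not the repository's released code: the constants `K`, `GAMMA`, and `EMBED_DIM`, and the function `optimal_vocab_size`, are hypothetical placeholders rather than the paper's fitted values.

```python
# Illustrative sketch: estimate a suggested vocabulary size from the
# non-vocabulary parameter count, assuming a power-law form
# N_v = K * N_nv^GAMMA with GAMMA < 1 (vocabulary parameters grow more
# slowly than the rest of the model). Constants are placeholders, not
# the paper's fitted values.

K = 2.0           # hypothetical proportionality constant
GAMMA = 0.8       # hypothetical exponent (< 1)
EMBED_DIM = 4096  # hidden size used to convert vocab parameters to tokens


def optimal_vocab_size(n_nonvocab_params: float) -> int:
    """Map a non-vocabulary parameter count to a suggested vocab size."""
    n_vocab_params = K * n_nonvocab_params ** GAMMA
    # Vocabulary parameters = vocab_size * embedding_dim (input embedding).
    return int(n_vocab_params / EMBED_DIM)


if __name__ == "__main__":
    for n in (1e9, 7e9, 70e9):
        print(f"{n:.0e} non-vocab params -> vocab ~ {optimal_vocab_size(n):,}")
```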
Related projects:
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
- Code for the paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning"
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
- Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models"
- Directional Preference Alignment
- Official implementation for the paper *DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
- Official repository for the paper "Weak-to-Strong Extrapolation Expedites Alignment"
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs
- Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).
- Online Adaptation of Language Models with a Memory of Amortized Contexts
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)"
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal, et al.
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
- Self-Explore to avoid the pit! Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards
- Code and data for the benchmark "Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models"
- Knowledge Circuits in Pretrained Transformers