BestAnHongjun / SentenceVAE
Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context
☆41Updated last year
Alternatives and similar repositories for SentenceVAE
Users that are interested in SentenceVAE are comparing it to the libraries listed below
- Code for paper "Patch-Level Training for Large Language Models"☆96Updated last month
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆89Updated last year
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models☆56Updated 10 months ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆39Updated last year
- ☆114Updated 3 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting☆34Updated last year
- [ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization☆105Updated 6 months ago
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning☆52Updated last year
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization☆43Updated 9 months ago
- Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning"☆83Updated last year
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"☆107Updated 2 months ago
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization☆80Updated 2 months ago
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation☆118Updated 7 months ago
- [ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models☆69Updated last year
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models"☆188Updated 9 months ago
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression☆126Updated 8 months ago
- Long Context Extension and Generalization in LLMs☆62Updated last year
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models☆56Updated 6 months ago
- RL Scaling and Test-Time Scaling (ICML'25)☆112Updated 11 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models☆78Updated last year
- ☆85Updated last month
- FocusLLM: Scaling LLM’s Context by Parallel Decoding☆44Updated last year
- ☆108Updated 3 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models☆70Updated 9 months ago
- ☆53Updated 10 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025]☆180Updated 5 months ago
- ☆57Updated last year
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”☆126Updated 11 months ago
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" [EMNLP25]☆35Updated 3 months ago
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…☆60Updated 6 months ago