BestAnHongjun / SentenceVAELinks
Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context
☆41Updated last year
Alternatives and similar repositories for SentenceVAE
Users that are interested in SentenceVAE are comparing it to the libraries listed below
Sorting:
- Code for paper "Patch-Level Training for Large Language Models"☆97Updated 2 months ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization☆42Updated 11 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆89Updated last year
- The this is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆41Updated last year
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”☆126Updated last year
- [ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models☆73Updated last year
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models☆78Updated last year
- Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning"☆84Updated 2 years ago
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization☆81Updated last month
- Large Language Models Can Self-Improve in Long-context Reasoning☆72Updated last year
- ☆120Updated this week
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models☆60Updated last year
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning☆53Updated last year
- ☆53Updated last year
- RewardAnything: Generalizable Principle-Following Reward Models☆45Updated 7 months ago
- RL Scaling and Test-Time Scaling (ICML'25)☆113Updated last year
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression☆132Updated 9 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆114Updated last week
- RM-R1: Unleashing the Reasoning Potential of Reward Models☆158Updated 7 months ago
- [ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization☆108Updated 8 months ago
- [ICML 2025] Predictive Data Selection: The Data That Predicts Is the Data That Teaches☆60Updated 11 months ago
- ☆85Updated 2 months ago
- ☆53Updated 11 months ago
- Long Context Extension and Generalization in LLMs☆62Updated last year
- [ICML 2025] |TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation☆121Updated 8 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…☆56Updated 2 years ago
- [EMNLP'25 Industry] Repo for "Z1: Efficient Test-time Scaling with Code"☆68Updated 9 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025]☆180Updated 7 months ago
- [ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.☆105Updated last year
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models☆57Updated 8 months ago