Infini-AI-Lab / S2FT
☆ 16 · Updated 2 months ago
Alternatives and similar repositories for S2FT:
Users interested in S2FT are comparing it to the repositories listed below.
- Codebase for Instruction Following without Instruction Tuning ☆33 · Updated 5 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆24 · Updated 4 months ago
- ☆76 · Updated 2 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆44 · Updated last month
- Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression." ☆11 · Updated 3 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆80 · Updated 5 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆61 · Updated 3 weeks ago
- Long Context Extension and Generalization in LLMs ☆50 · Updated 6 months ago
- [NeurIPS 2024 Main Track] Code for the paper "Instruction Tuning With Loss Over Instructions" ☆35 · Updated 9 months ago
- Official repository of "Are Your LLMs Capable of Stable Reasoning?" ☆22 · Updated last week
- The official implementation of Self-Exploring Language Models (SELM) ☆62 · Updated 9 months ago
- ☆30 · Updated 2 months ago
- A repository for research on medium-sized language models ☆76 · Updated 9 months ago
- The official code repo and data hub of the top_nsigma sampling strategy for LLMs ☆23 · Updated last month
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning" ☆47 · Updated 2 weeks ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆26 · Updated 6 months ago
- ☆76 · Updated this week
- Repository for the paper "500xCompressor: Generalized Prompt Compression for Large Language Models" ☆30 · Updated 7 months ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆45 · Updated last month
- ☆73 · Updated 7 months ago
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆63 · Updated 2 weeks ago