codefuse-ai / Collinear-Constrained-Attention
☆58 · Updated 3 months ago
Related projects:
- A prototype repo for hybrid training combining pipeline parallelism and distributed data parallelism, with comments on core code snippets. Feel free to… ☆45 · Updated last year
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆121 · Updated 3 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆133 · Updated 3 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆101 · Updated 2 weeks ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆114 · Updated 2 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆72 · Updated 6 months ago
- An Experiment on Dynamic NTK Scaling RoPE ☆59 · Updated 9 months ago
- ☆32 · Updated 3 months ago
- [COLM 2024] SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning ☆21 · Updated 3 months ago
- Code for "Scaling Laws of RoPE-based Extrapolation" ☆68 · Updated 11 months ago
- ☆57 · Updated 3 weeks ago
- ☆82 · Updated 5 months ago
- Official implementation of “Training on the Benchmark Is Not All You Need” ☆18 · Updated last week
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning" ☆81 · Updated this week
- ☆52 · Updated 2 months ago
- [SIGIR'24] The official implementation code of MOELoRA ☆113 · Updated 2 months ago
- ☆87 · Updated 4 months ago
- Curated references for searching, selecting, and synthesizing high-quality data at scale for post-training LLMs ☆38 · Updated 2 weeks ago
- ☆105 · Updated last week
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆37 · Updated 6 months ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales ☆30 · Updated last year
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting ☆60 · Updated 6 months ago
- ☆75 · Updated 5 months ago
- 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training ☆79 · Updated this week
- Unofficial implementation of AlpaGasus ☆83 · Updated 11 months ago
- Inference code for the paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models" ☆17 · Updated last month
- Converting Mixtral-8x7B to Mixtral-[1~7]x7B ☆20 · Updated 6 months ago
- Fast LLM training codebase with dynamic strategy selection (DeepSpeed + Megatron + FlashAttention + CUDA fusion kernels + compiler) ☆32 · Updated 8 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆54 · Updated 6 months ago
- LongAlign: A Recipe for Long Context Alignment Encompassing Data, Training, and Evaluation ☆194 · Updated 4 months ago