codefuse-ai / Collinear-Constrained-Attention
☆62 · Updated 9 months ago
Alternatives and similar repositories for Collinear-Constrained-Attention:
Users who are interested in Collinear-Constrained-Attention are comparing it to the repositories listed below.
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆129 · Updated 9 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆76 · Updated last year
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆121 · Updated 2 months ago
- Code for Scaling Laws of RoPE-based Extrapolation ☆70 · Updated last year
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆40 · Updated last year
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning (COLM 2024) ☆30 · Updated 9 months ago
- A personal reimplementation of Google's Infini-transformer using a small 2B model. The project includes both model and train… ☆56 · Updated 11 months ago
- A prototype repo for hybrid training with pipeline parallelism and distributed data parallelism, with comments on core code snippets. Feel free to… ☆55 · Updated last year
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆87 · Updated this week
- An Experiment on Dynamic NTK Scaling RoPE ☆62 · Updated last year
- ☆98 · Updated 5 months ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales ☆32 · Updated last year
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning" ☆116 · Updated 4 months ago
- NaturalCodeBench (Findings of ACL 2024) ☆62 · Updated 5 months ago
- ☆45 · Updated 9 months ago
- Fast LLM training codebase with dynamic strategy selection [DeepSpeed + Megatron + FlashAttention + CUDA fusion kernels + compiler] ☆36 · Updated last year
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆45 · Updated 2 months ago
- ☆81 · Updated 11 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs ☆246 · Updated 3 months ago
- Code implementation of synthetic continued pretraining ☆95 · Updated 2 months ago
- ☆100 · Updated 11 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆152 · Updated 9 months ago
- [ICLR 2025] PEARL: Parallel speculative decoding with adaptive draft length ☆59 · Updated last week
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆40 · Updated 4 months ago
- Repository of the LV-Eval benchmark ☆59 · Updated 6 months ago
- Unofficial implementation of AlpaGasus ☆90 · Updated last year
- [ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models ☆34 · Updated 4 months ago