BBuf / RWKV-World-HF-Tokenizer
☆33 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for RWKV-World-HF-Tokenizer
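RWKV-World-HF-Tokenizer packages the RWKV "World" tokenizer so it can be loaded through Hugging Face's `AutoTokenizer` interface. A minimal sketch of that loading pattern, assuming a published checkpoint that bundles the custom tokenizer code (the model id below is illustrative, not taken from the repo):

```python
# Minimal sketch (not from the repo): load an RWKV "World" tokenizer via the
# standard Hugging Face AutoTokenizer interface.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "RWKV/rwkv-4-world-169m",  # illustrative/assumed model id; substitute your checkpoint
    trust_remote_code=True,    # the World tokenizer ships as custom tokenizer code
)

ids = tokenizer("Hello, RWKV world!")["input_ids"]
print(ids)
print(tokenizer.decode(ids))
```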
- FuseAI Project ☆76 · Updated 2 months ago
- A fast RWKV Tokenizer written in Rust ☆36 · Updated 2 months ago
- ☆26 · Updated 4 months ago
- Reinforcement Learning Toolkit for RWKV. Distillation, SFT, RLHF (DPO, ORPO), infinite context training, aligning. Let's boost the model's int… ☆18 · Updated this week
- A repository for research on medium-sized language models. ☆74 · Updated 5 months ago
- Evaluating LLMs with Dynamic Data ☆68 · Updated this week
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆91 · Updated last month
- DPO, but faster 🚀 ☆20 · Updated last week
- ☆33 · Updated 5 months ago
- A toolkit that enhances PyTorch with specialized functions for low-bit quantized neural networks. ☆28 · Updated 4 months ago
- RWKV infctx trainer, for training arbitrary context sizes, to 10k and beyond! ☆133 · Updated 2 months ago
- ☆52 · Updated 5 months ago
- QuIP quantization ☆46 · Updated 7 months ago
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆51 · Updated this week
- ☆62 · Updated last month
- Reproduction of the paper "Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction" ☆21 · Updated 5 months ago
- Copies the MLP of Llama 3 eight times as 8 experts, creates a router with random initialization, and adds a load-balancing loss to construct an 8x8b Mo… ☆25 · Updated 4 months ago
- ☆34 · Updated 8 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated 8 months ago
- ☆44 · Updated 2 months ago
- Data preparation code for CrystalCoder 7B LLM ☆42 · Updated 6 months ago
- Demonstration that finetuning a RoPE model on longer sequences than it was pre-trained on extends the model's context limit ☆63 · Updated last year
- ☆17 · Updated 7 months ago
- My fork of Allen AI's OLMo for educational purposes. ☆28 · Updated 6 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code) ☆133 · Updated last month
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆116 · Updated 4 months ago
- Contextual Position Encoding, but with some custom CUDA kernels (https://arxiv.org/abs/2405.18719) ☆19 · Updated 5 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆129 · Updated last month
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context ☆16 · Updated 2 months ago
- A pipeline-parallel training script for LLMs. ☆83 · Updated 3 weeks ago