BBuf / RWKV-World-HF-Tokenizer
☆34 · Updated 11 months ago
Alternatives and similar repositories for RWKV-World-HF-Tokenizer
Users interested in RWKV-World-HF-Tokenizer are comparing it to the repositories listed below.
- A fast RWKV tokenizer written in Rust ☆46 · Updated 2 months ago
- RWKV-7: Surpassing GPT ☆91 · Updated 7 months ago
- A repository for research on medium-sized language models. ☆76 · Updated last year
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28 · Updated last month
- ☆18 · Updated 5 months ago
- A large-scale RWKV v6, v7 (World, PRWKV, Hybrid-RWKV) inference engine. Capable of inference by combining multiple states (pseudo-MoE). Easy to de… ☆38 · Updated 3 weeks ago
- ☆36 · Updated last month
- Contextual Position Encoding, with some custom CUDA kernels https://arxiv.org/abs/2405.18719 ☆22 · Updated last year
- DPO, but faster 🚀 ☆43 · Updated 6 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆98 · Updated 8 months ago
- My fork of Allen AI's OLMo for educational purposes. ☆30 · Updated 6 months ago
- ☆35 · Updated last year
- Reinforcement learning toolkit for RWKV (v6, v7, ARWKV): distillation, SFT, RLHF (DPO, ORPO), infinite-context training, aligning. Exploring the… ☆44 · Updated last month
- GoldFinch and other hybrid transformer components ☆45 · Updated 11 months ago
- Here we will test various linear attention designs. ☆59 · Updated last year
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆55 · Updated 3 weeks ago
- Evaluating LLMs with Dynamic Data ☆93 · Updated last month
- My implementation of Q-Sparse: All Large Language Models Can Be Fully Sparsely-Activated ☆32 · Updated 10 months ago
- ☆17 · Updated last year
- Linear Attention Sequence Parallelism (LASP) ☆84 · Updated last year
- A project for real-time training of the RWKV model. ☆49 · Updated last year
- Direct Preference Optimization for RWKV, targeting RWKV-5 and 6. ☆11 · Updated last year
- The official code repo and data hub for the top_nsigma sampling strategy for LLMs. ☆26 · Updated 4 months ago
- A specialized RWKV-7 model for Othello (a.k.a. Reversi) that predicts legal moves, evaluates positions, and performs in-context search. It… ☆41 · Updated 5 months ago
- Implementation of the Mamba SSM with hf_integration. ☆56 · Updated 9 months ago
- RWKV, in easy-to-read code ☆72 · Updated 2 months ago
- Reproduction of the paper "Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction" ☆22 · Updated last year
- RWKV-LM-V7 (https://github.com/BlinkDL/RWKV-LM) under the Lightning framework ☆28 · Updated last week
- ☆129 · Updated last week
- ☆56 · Updated 3 months ago