pabloiyu / mini-language-model
Implements a Mamba SSM in a mini language model and trains it on the public-domain works of Sherlock Holmes. Also includes an implementation of parallel adapters in a transformer, and code to run a quantized version of Mistral-7B.
☆9 · Updated 10 months ago
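As a rough illustration of the quantized-inference part of this repo, below is a minimal sketch of loading a 4-bit quantized Mistral-7B with Hugging Face transformers and bitsandbytes. This is not the repository's actual code; the checkpoint id, quantization settings, and prompt are assumptions for demonstration.

```python
# Minimal sketch: 4-bit quantized Mistral-7B inference via transformers +
# bitsandbytes. Checkpoint id, config values, and prompt are illustrative
# assumptions, not taken from the repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # assumed checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4 bits
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # place layers on available devices
)

prompt = "Sherlock Holmes deduced that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```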
Alternatives and similar repositories for mini-language-model:
Users interested in mini-language-model are comparing it to the repositories listed below
- An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆27 · Updated this week
- Nexusflow function call, tool use, and agent benchmarks. ☆19 · Updated last month
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆50 · Updated 9 months ago
- Here we will test various linear attention designs. ☆58 · Updated 9 months ago
- Lottery Ticket Adaptation ☆37 · Updated 2 months ago
- Implementation of a modular, high-performance, and simple Mamba for high-speed applications ☆33 · Updated 2 months ago
- ☆44 · Updated 6 months ago
- A repository for research on medium-sized language models. ☆76 · Updated 8 months ago
- RWKV6 in native PyTorch and Triton :) ☆11 · Updated 5 months ago
- A reproduction of the paper "Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction" ☆22 · Updated 8 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆38 · Updated 11 months ago
- Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters" ☆17 · Updated 3 weeks ago
- Course Project for COMP4471 on RWKV ☆17 · Updated 11 months ago
- Training hybrid models for dummies. ☆18 · Updated 2 weeks ago
- Structural Pruning for LLaMA ☆54 · Updated last year
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated last year
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated 10 months ago
- Fast LLM training codebase with dynamic strategy choosing [DeepSpeed+Megatron+FlashAttention+CudaFusionKernel+Compiler] ☆36 · Updated last year
- ☆32 · Updated last year
- Run ONNX RWKV-v4 models with GPU acceleration using DirectML [Windows], or just on CPU [Windows AND Linux]; Limited to 430M model at this… ☆20 · Updated last year
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ☆15 · Updated 2 months ago
- CursorCore: Assist Programming through Aligning Anything ☆80 · Updated 3 months ago
- Here we collect trick questions and failed tasks for open source LLMs to improve them. ☆32 · Updated last year
- Efficient and Scalable Estimation of Tool Representations in Vector Space ☆18 · Updated 4 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆96 · Updated 4 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆42 · Updated 6 months ago
- RWKV-7: Surpassing GPT ☆73 · Updated 2 months ago
- A large-scale RWKV v6, v7 inference. Capable of inference by combining multiple states (Pseudo MoE). Easy to deploy on Docker. Supports tr… ☆25 · Updated last week
- ☆27 · Updated 5 months ago