devvrit / matformer
MatFormer repo
☆31Updated 6 months ago
Alternatives and similar repositories for matformer
Users that are interested in matformer are comparing it to the libraries listed below
- ☆51Updated 7 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging.☆36Updated last year
- Lightweight toolkit package to train and fine-tune 1.58-bit language models☆78Updated last month
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆57Updated 9 months ago
- ☆60Updated 3 weeks ago
- ☆56Updated last month
- This repo is based on https://github.com/jiaweizzhao/GaLore☆28Updated 9 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276☆28Updated last month
- The code repository for the CURLoRA research paper. Stable LLM continual fine-tuning and catastrophic forgetting mitigation.☆44Updated 9 months ago
- A repository for research on medium-sized language models.☆76Updated last year
- ☆47Updated 9 months ago
- Lego for GRPO☆28Updated 3 weeks ago
- ☆25Updated last year
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…☆45Updated 2 months ago
- Collection of autoregressive model implementations☆85Updated 2 months ago
- ☆35Updated last year
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family.☆29Updated 2 months ago
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given…☆14Updated last year
- ☆47Updated 4 months ago
- ☆68Updated 11 months ago
- ☆32Updated 5 months ago
- Using FlexAttention to compute attention with different masking patterns☆44Updated 9 months ago
- ☆49Updated last year
- Train, tune, and run inference with the Bamba model☆127Updated 2 weeks ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆69Updated last week
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …☆59Updated 8 months ago
- ☆79Updated 10 months ago
- ☆79Updated 7 months ago
- Triton Implementation of HyperAttention Algorithm☆48Updated last year
- The evaluation framework for training-free sparse attention in LLMs☆69Updated this week