Aleph-Alpha / trigrams

☆54

Alternatives and similar repositories for trigrams:

Users that are interested in trigrams are comparing it to the libraries listed below

SeunghyunSEO / optimized_hf_llama_class_for_training
☆47Updated 8 months ago
Knowledgator / FlashDeBERTa
Trully flash implementation of DeBERTa disentangled attention mechanism.
☆46Updated 3 weeks ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆55Updated 8 months ago
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆58Updated 3 months ago
RobertCsordas / moeut
☆78Updated 8 months ago
hamishivi / EasyLM
Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…
☆72Updated 8 months ago
goncalorafaria / qalign
QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods.
☆23Updated 3 weeks ago
arcee-ai / DAM
☆48Updated 5 months ago
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆76Updated 11 months ago
AnswerDotAI / ModernBERT-Instruct-mini-cookbook
☆43Updated 2 months ago
Zyphra / Zyda_processing
☆33Updated 10 months ago
nahidalam / maya
Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya
☆108Updated 2 months ago
kevinwu23 / StanfordFineTuneBench
☆28Updated 5 months ago
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆91Updated 2 weeks ago
tyler-romero / microR1
Simple repository for training small reasoning models
☆27Updated 2 months ago
Hannibal046 / nanoColBERT
Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832).
☆80Updated last year
NathanGodey / qfilters
Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)
☆30Updated last month
joshuacnf / Ctrl-G
☆80Updated 3 months ago
sanyalsunny111 / LLM-Inheritune
This is the official repository for Inheritune.
☆111Updated 2 months ago
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆55Updated 2 weeks ago
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆37Updated last year
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆43Updated 7 months ago
MinishLab / tokenlearn
Pre-train Static Word Embeddings
☆58Updated 3 weeks ago
EleutherAI / improved-t5
Experiments for efforts to train a new and improved t5
☆77Updated last year
jkallini / mrt5
Code repository for the paper "MrT5: Dynamic Token Merging for Efficient Byte-level Language Models."
☆40Updated 3 weeks ago
AnswerDotAI / fastkmeans
☆35Updated 2 weeks ago
huggingface / peft-pytorch-conference
Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given…
☆14Updated last year
google-deepmind / asyncdiloco
☆43Updated last year
facebookresearch / mexma
MEXMA: Token-level objectives improve sentence representations
☆41Updated 4 months ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆97Updated 7 months ago