ayaka14732 / bart-base-jax

JAX implementation of the bart-base model

☆34

Alternatives and similar repositories for bart-base-jax

Users who are interested in bart-base-jax are comparing it to the libraries listed below.

ayaka14732 / TrAVis
TrAVis: Visualise BERT attention in your browser
☆58Updated 2 years ago
kyleliang919 / Long-context-transformers
Exploring fine-tuning public checkpoints on filtered 8K sequences on the Pile
☆116Updated 2 years ago
bigscience-workshop / multilingual-modeling
BLOOM+1: Adapting BLOOM model to support a new unseen language
☆74Updated last year
chandar-lab / NeoBERT
☆92Updated 5 months ago
huggingface / olm-training
Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset.
☆96Updated 2 years ago
LegallyCoder / mamba-hf
Implementation of the Mamba SSM with hf_integration.
☆56Updated last year
facebookresearch / lss_eval
This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…
☆31Updated 2 years ago
google-research-datasets / QAmeleon
QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…
☆35Updated 2 years ago
cimeister / typical-sampling
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
☆81Updated 3 years ago
NathanGodey / headless-lm
Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…
☆28Updated last year
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆57Updated last week
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆87Updated 3 years ago
eth-easl / fmengine
Utilities for Training Very Large Models
☆58Updated last year
lucidrains / CoLT5-attention
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
☆230Updated last year
deep-spin / infinite-former
☆67Updated last year
srush / LLM-Talk
☆52Updated last year
lucidrains / memory-editable-transformer
My explorations into editing the knowledge and memories of an attention network
☆35Updated 2 years ago
AlexWan0 / infini-gram
An unofficial implementation of the Infini-gram model proposed by Liu et al. (2024)
☆33Updated last year
Rallio67 / language-model-agents
Experiments with generating opensource language model assistants
☆97Updated 2 years ago
Knowledgator / TurboT5
Truly flash T5 realization!
☆71Updated last year
ltgoslo / gpt-bert
Official implementation of "GPT or BERT: why not both?"
☆62Updated 4 months ago
warner-benjamin / optimi
Fast, Modern, and Low Precision PyTorch Optimizers
☆116Updated 2 months ago
Lightning-Universe / lightning-ColossalAI
Large Scale Distributed Model Training strategy with Colossal AI and Lightning AI
☆56Updated 2 years ago
gsarti / t5-flax-gcp
Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP
☆58Updated 3 years ago
ZurichNLP / mbr
Minimum Bayes Risk Decoding for Hugging Face Transformers
☆60Updated last year
catie-aq / flashT5
A fast implementation of T5/UL2 in PyTorch using Flash Attention
☆112Updated last month
zsc / llama_infer
Inference script for Meta's LLaMA models using Hugging Face wrapper
☆110Updated 2 years ago
Zyphra / Zyda_processing
☆39Updated last year
babylm / evaluation-pipeline-2023
Evaluation pipeline for the BabyLM Challenge 2023.
☆77Updated 2 years ago
HomebrewML / Olmax
HomebrewNLP in JAX flavour for maintainable TPU-Training
☆51Updated last year