erfanzar / FJFormer

paralleled/unparalleled computational with FJFormer

☆24

Alternatives and similar repositories for FJFormer:

Users that are interested in FJFormer are comparing it to the libraries listed below

erfanzar / jax-flash-attn2
Flash Attention Implementation with Multiple Backend Support and Sharding This module provides a flexible implementation of Flash Attenti…
☆20Updated last month
erfanzar / EasyDeL
Accelerate, Optimize performance with streamlined training and serving options with JAX.
☆218Updated this week
Instinct-AI / Xerxes
Xerxes, a highly advanced Persian AI assistant developed by InstinctAI, a cutting-edge AI startup. primary function is to assist users wi…
☆11Updated 8 months ago
erfanzar / AgentX
AgentX is an Open-source library that help people use LLMs on their own computers or help them to serve LLMs as easy as possible that sup…
☆15Updated 7 months ago
erfanzar / InstinctiveDiffuse
A cutting-edge text-to-image generator model that leverages state-of-the-art Stable Diffusion Model Type to produce high-quality, realist…
☆14Updated 10 months ago
erfanzar / OST-OpenSourceTransformers
OST Collection: An AI-powered suite of models that predict the next word matches with remarkable accuracy (Text Generative Models). OST C…
☆15Updated last year
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆28Updated 2 months ago
yixiaoer / tpux
A set of Python scripts that makes your experience on TPU better
☆44Updated 6 months ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆67Updated 2 months ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆121Updated 9 months ago
ClashLuke / tpucare
Automatically take good care of your preemptible TPUs
☆34Updated last year
HomebrewNLP / Olmax
HomebrewNLP in JAX flavour for maintable TPU-Training
☆47Updated last year
yixiaoer / mistral-v0.2-jax
JAX implementation of the Mistral 7b v0.2 model
☆35Updated 6 months ago
cat-state / tinypar
☆20Updated last year
berlino / seq_icl
☆51Updated 8 months ago
davisyoshida / qax
If it quacks like a tensor...
☆55Updated 2 months ago
aniquetahir / JORA
JORA: JAX Tensor-Parallel LoRA Library (ACL 2024)
☆32Updated 8 months ago
davisyoshida / lorax
LoRA for arbitrary JAX models and functions
☆135Updated 10 months ago
xrsrke / pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
☆81Updated last year
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆78Updated 2 years ago
cloneofsimo / min-fsdp
☆75Updated 6 months ago
dvruette / barrel-rec-pytorch
☆53Updated last year
catie-aq / flashT5
A fast implementation of T5/UL2 in PyTorch using Flash Attention
☆75Updated this week
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆52Updated last year
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆46Updated last year
cloneofsimo / ezmup
Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only, dont use it for Adam
☆73Updated 5 months ago
epfml / DenseFormer
☆78Updated 9 months ago
Sea-Snell / JAX_llama
Inference code for LLaMA models in JAX
☆115Updated 7 months ago
ayaka14732 / jax-smi
JAX Synergistic Memory Inspector
☆164Updated 6 months ago