ayaka14732 / llama-jax

JAX implementation of LLaMA, aiming to train LLaMA on Google Cloud TPU

☆14

Related projects

Alternatives and complementary repositories for llama-jax

yixiaoer / mistral-v0.2-jax
JAX implementation of the Mistral 7B v0.2 model
☆33 Updated 4 months ago
cat-state / tinypar
☆20 Updated last year
yixiaoer / tpux
A set of Python scripts that makes your experience on TPU better
☆40 Updated 4 months ago
young-geng / mlxu
Machine Learning eXperiment Utilities
☆45 Updated 5 months ago
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆36 Updated last year
Sea-Snell / JAX_llama
Inference code for LLaMA models in JAX
☆113 Updated 6 months ago
scottlogic-alex / prm800k-denorm
Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format
☆27 Updated last year
mag- / gpu_benchmark
GPU benchmark
☆43 Updated last month
codekansas / rwkv
RWKV model implementation
☆38 Updated last year
VITA-Group / WeLore
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,…
☆43 Updated 4 months ago
srush / LLM-Talk
☆47 Updated 9 months ago
kernelmachine / cbtm
Code repository for the c-BTM paper
☆105 Updated last year
yixiaoer / einshard
Einsum-like high-level array sharding API for JAX
☆32 Updated 4 months ago
berlino / seq_icl
☆50 Updated 6 months ago
Birch-san / booru-embed
[WIP] Transformer to embed Danbooru labelsets
☆13 Updated 7 months ago
ayaka14732 / llama-2-jax
JAX implementation of the Llama 2 model
☆210 Updated 9 months ago
prateeky2806 / ComPEFT
☆25 Updated last year
EleutherAI / semantic-memorization
☆46 Updated last week
yixiaoer / tpu-training-example
☆13 Updated 4 months ago
ScalingIntelligence / large_language_monkeys
☆55 Updated last month
luchris429 / DiscoPOP
Code for Discovering Preference Optimization Algorithms with and for Large Language Models
☆51 Updated 5 months ago
geov-ai / geov
The GeoV model is a large language model designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER).…
☆122 Updated last year
google-deepmind / mishax
☆101 Updated 3 months ago
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆95 Updated 6 months ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆46 Updated 2 months ago
srush / mamba-primer
☆35 Updated 7 months ago
kyo-takano / chinchilla
A toolkit for scaling law research ⚖
☆43 Updated 8 months ago
tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆50 Updated 7 months ago
r-three / RAD
Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
☆41 Updated 10 months ago
Edward-Sun / gpt-accelera
Simple and efficient PyTorch-native transformer training and inference (batched)
☆61 Updated 7 months ago