ayaka14732 / llama-jax
JAX implementation of LLaMA, aiming to train LLaMA on Google Cloud TPU
☆14 Updated last year
Related projects
Alternatives and complementary repositories for llama-jax
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆25 Updated last year
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆37 Updated 5 months ago
- ☆20 Updated last year
- JAX implementation of the Llama 2 model ☆210 Updated 9 months ago
- ☆76 Updated 6 months ago
- A set of Python scripts that makes your experience on TPU better ☆40 Updated 4 months ago
- ☆53 Updated 9 months ago
- ☆55 Updated 11 months ago
- Collection of autoregressive model implementation ☆66 Updated last week
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 Updated 11 months ago
- JAX implementation of the Mistral 7b v0.2 model ☆33 Updated 4 months ago
- Inference code for LLaMA models in JAX ☆112 Updated 5 months ago
- Machine Learning eXperiment Utilities ☆45 Updated 5 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆95 Updated 6 months ago
- Code repository for the c-BTM paper ☆105 Updated last year
- ☆50 Updated 5 months ago
- ☆63 Updated 4 months ago
- [ICML 24 NGSM workshop] Associative Recurrent Memory Transformer implementation and scripts for training and evaluating ☆31 Updated this week
- ☆39 Updated 9 months ago
- Evaluating LLMs with CommonGen-Lite ☆84 Updated 7 months ago
- ☆24 Updated 2 months ago
- Understand and test language model architectures on synthetic tasks. ☆161 Updated 6 months ago
- Simple and efficient pytorch-native transformer training and inference (batched) ☆61 Updated 7 months ago
- ☆14 Updated 7 months ago
- ☆40 Updated last week
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given… ☆14 Updated last year
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆49 Updated 7 months ago
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆13 Updated 2 weeks ago
- ☆100 Updated 3 months ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆30 Updated 2 months ago