huggingface / optimum-tpuLinks

Google TPU optimizations for transformers models

☆117

Alternatives and similar repositories for optimum-tpu

Users that are interested in optimum-tpu are comparing it to the libraries listed below

Sorting:

huggingface / kernels
Load compute kernels from the Hub
☆220Updated this week
QuixiAI / grokadamw
☆134Updated 11 months ago
QuixiAI / spectrum
☆128Updated 3 months ago
joey00072 / ohara
Collection of autoregressive model implementation
☆86Updated 3 months ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆198Updated last year
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆81Updated 2 months ago
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆103Updated 4 months ago
huggingface / llm-swarm
Manage scalable open LLM inference endpoints in Slurm clusters
☆268Updated last year
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆265Updated 9 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆127Updated 8 months ago
Pleias / Various-Finetuning
Set of scripts to finetune LLMs
☆37Updated last year
astramind-ai / BitMat
An efficent implementation of the method proposed in "The Era of 1-bit LLMs"
☆154Updated 9 months ago
huggingface / kernel-builder
👷 Build compute kernels
☆87Updated this week
google-deepmind / asyncdiloco
☆45Updated last year
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆94Updated 8 months ago
Aleph-Alpha-Research / scaling
Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr…
☆64Updated 9 months ago
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆277Updated last week
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆103Updated 3 months ago
AI-Hypercomputer / maxdiffusion
☆238Updated this week
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆244Updated 6 months ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆184Updated 6 months ago
ServiceNow / Fast-LLM
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
☆217Updated this week
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆121Updated this week
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆61Updated 9 months ago
facebookresearch / matrix
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…
☆78Updated last week
warner-benjamin / optimi
Fast, Modern, and Low Precision PyTorch Optimizers
☆103Updated this week
AI-Hypercomputer / jetstream-pytorch
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference"
☆66Updated 4 months ago
jxmorris12 / cde
code for training & evaluating Contextual Document Embedding models
☆196Updated 2 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆55Updated 6 months ago
xrsrke / pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
☆86Updated last year