octoml / Apple-M1-BERT

3X speedup over Apple’s TensorFlow plugin by using Apache TVM on M1

☆136

Alternatives and similar repositories for Apple-M1-BERT

Users interested in Apple-M1-BERT are comparing it to the libraries listed below.


nod-ai / SRT
Nod.ai 🦈 version of 👻. You probably want to start at https://github.com/nod-ai/shark for the product and the upstream IREE repository …
☆106 · Updated 7 months ago
nod-ai / transformer-benchmarks
benchmarking some transformer deployments
☆26 · Updated 2 years ago
sdpython / onnxcustom
Tutorial on how to convert machine learned models into ONNX
☆16 · Updated 2 years ago
graphcore / examples
Example code and applications for machine learning on Graphcore IPUs
☆325 · Updated last year
huggingface / tune
☆87 · Updated 3 years ago
pytorch / multipy
torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i…
☆180 · Updated 3 weeks ago
graphcore / tutorials
Training material for IPU users: tutorials, feature examples, simple applications
☆86 · Updated 2 years ago
pytorch / torchdistx
Torch Distributed Experimental
☆117 · Updated last year
yandex-research / DeDLOC
Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021)
☆117 · Updated 3 years ago
marsupialtail / sparsednn
Fast sparse deep learning on CPUs
☆54 · Updated 2 years ago
pytorch / ort
Accelerate PyTorch models with ONNX Runtime
☆364 · Updated 5 months ago
EleutherAI / DeeperSpeed
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
☆168 · Updated 2 weeks ago
facebookresearch / diffq
DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight …
☆236 · Updated 2 years ago
DeMoriarty / custom_matmul_kernels
Customized matrix multiplication kernels
☆56 · Updated 3 years ago
sdpython / mlprodict
Productionize machine learning predictions, with ONNX or without
☆65 · Updated last year
facebookresearch / FBTT-Embedding
This is a Tensor Train based compression library to compress sparse embedding tables used in large-scale machine learning models such as …
☆194 · Updated 3 years ago
graphcore / poptorch
PyTorch interface for the IPU
☆180 · Updated last year
explosion / thinc-apple-ops
🍏 Make Thinc faster on macOS by calling into Apple's native Accelerate library
☆99 · Updated last month
RobertRiachi / ANE-Optimized-Whisper-OpenAI
☆55 · Updated 2 years ago
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆273 · Updated 3 years ago
huggingface / optimum-graphcore
Blazing fast training of 🤗 Transformers on Graphcore IPUs
☆85 · Updated last year
kingoflolz / swarm-jax
Swarm training framework using Haiku + JAX + Ray for layer parallel transformer language models on unreliable, heterogeneous nodes
☆241 · Updated 2 years ago
apple / ml-quant
Research publication code for "Least Squares Binary Quantization of Neural Networks"
☆83 · Updated 2 years ago
pytorch-tpu / examples
This repository contains example code to build models on TPUs
☆30 · Updated 2 years ago
lucidrains / PaLM-jax
Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in Jax (Equinox framework)
☆187 · Updated 3 years ago
shawwn / tpunicorn
Babysit your preemptible TPUs
☆86 · Updated 2 years ago
EleutherAI / openwebtext2
☆90 · Updated 3 years ago
tlkh / m1-cpu-benchmarks
☆52 · Updated 3 years ago
HomebrewML / HomebrewNLP-torch
A case study of efficient training of large language models using commodity hardware.
☆68 · Updated 3 years ago
abetlen / ggml-python
Python bindings for ggml
☆142 · Updated 11 months ago