huggingface / optimum-furiosa

Accelerated inference of 🤗 models using FuriosaAI NPU chips.

☆26

Alternatives and similar repositories for optimum-furiosa

Users who are interested in optimum-furiosa are comparing it to the libraries listed below.

huggingface / api-inference-community
☆170 Updated 8 months ago
huggingface / optimum-graphcore
Blazing fast training of 🤗 Transformers on Graphcore IPUs
☆85 Updated last year
huggingface / fuego
[WIP] A 🔥 interface for running code in the cloud
☆85 Updated 2 years ago
huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆121 Updated 9 months ago
huggingface / zapier
Hugging Face's Zapier Integration 🤗⚡️
☆48 Updated 2 years ago
huggingface / hffs
**ARCHIVED** Filesystem interface to 🤗 Hub
☆58 Updated 2 years ago
GreenBitAI / low_bit_llama
Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs
☆110 Updated last year
nebuly-ai / learning-hub
☆29 Updated 2 years ago
AI-Hypercomputer / maxdiffusion
☆269 Updated last week
OFA-Sys / diffusion-deploy
☆51 Updated 2 years ago
abetlen / ggml-python
Python bindings for ggml
☆146 Updated last year
huggingface / frp
FRP Fork
☆175 Updated 6 months ago
TheBlokeAI / AIScripts
Some simple scripts that I use day-to-day when working with LLMs and Huggingface Hub
☆160 Updated 2 years ago
huggingface / discord-bots
☆50 Updated 2 years ago
huggingface / optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…
☆318 Updated last month
huggingface / optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
☆199 Updated last week
huggingface / hf-endpoints-documentation
☆20 Updated last week
jllllll / exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
☆63 Updated 2 years ago
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆277 Updated last year
graphcore-research / out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
☆45 Updated last year
hamelsmu / llama-inference
experiments with inference on llama
☆103 Updated last year
AlpinDale / sparsegpt-for-LLaMA
Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" with LLaMA implementation.
☆70 Updated 2 years ago
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆266 Updated last year
huggingface / bloom-jax-inference
☆66 Updated 3 years ago
cchan / nanoGPT-fp8
☆13 Updated 2 years ago
rasbt / pytorch-memory-optim
This code repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po…
☆91 Updated 2 years ago
dropbox / aana_sdk
Aana SDK is a powerful framework for building AI enabled multimodal applications.
☆53 Updated 2 months ago
qwopqwop200 / gptqlora
GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ
☆102 Updated 2 years ago
huggingface / data-measurements-tool
Developing tools to automatically analyze datasets
☆75 Updated last year
CarperAI / treasure_trove
☆22 Updated 2 years ago