quic / efficient-transformers
This library empowers users to seamlessly port pretrained models and checkpoints from the Hugging Face (HF) Hub (developed using the HF transformers library) into inference-ready formats that run efficiently on Qualcomm Cloud AI 100 accelerators.
☆65 Updated this week
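As a rough illustration of that port-and-run workflow, the sketch below follows the HF-style interface the library exposes. The class name `QEFFAutoModelForCausalLM`, the import path, and the `compile`/`generate` parameters are assumptions based on the project's documentation rather than guarantees; treat the repository's README as authoritative.

```python
# Hypothetical sketch of the porting workflow described above.
# Class/method names and parameters are assumptions modelled on the
# library's Hugging Face-style interface; consult the repo's README
# for the actual API.
from transformers import AutoTokenizer
from QEfficient import QEFFAutoModelForCausalLM  # assumed import path

model_name = "gpt2"

# Load an HF checkpoint through the library's wrapper class.
model = QEFFAutoModelForCausalLM.from_pretrained(model_name)

# Export/compile the model into an inference-ready binary for the
# Cloud AI 100 accelerator (num_cores is an assumed example argument).
model.compile(num_cores=14)

# Run generation on the device using a standard HF tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.generate(prompts=["My name is"], tokenizer=tokenizer)
```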
Alternatives and similar repositories for efficient-transformers
Users interested in efficient-transformers are comparing it to the libraries listed below.
- ☆31 Updated 10 months ago
- Qualcomm Cloud AI SDK (Platform and Apps) enable high performance deep learning inference on Qualcomm Cloud AI platforms delivering high … ☆60 Updated 6 months ago
- A block-oriented training approach for inference-time optimization. ☆33 Updated 8 months ago
- Open Source Projects from Pallas Lab ☆20 Updated 3 years ago
- Model compression for ONNX ☆92 Updated 5 months ago
- Nsight Systems In Docker ☆20 Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) ☆64 Updated last year
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆41 Updated last year
- ☆146 Updated 2 years ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆124 Updated this week
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆25 Updated 10 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆108 Updated 8 months ago
- ☆29 Updated last year
- ☆204 Updated 3 years ago
- ☆68 Updated 3 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆106 Updated 6 months ago
- ☆26 Updated last year
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆13 Updated 4 months ago
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆37 Updated this week
- Code for studying the super weight in LLM ☆100 Updated 5 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆194 Updated this week
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆185 Updated 11 months ago
- Arch-Net: Model Distillation for Architecture Agnostic Model Deployment ☆22 Updated 3 years ago
- llama INT4 cuda inference with AWQ ☆54 Updated 3 months ago
- [ECCV 2022] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning ☆20 Updated 2 years ago
- Flexible simulator for mixed precision and format simulation of LLMs and vision transformers. ☆49 Updated last year
- ☆65 Updated 6 months ago
- Benchmark code for the "Online normalizer calculation for softmax" paper (see the sketch after this list) ☆91 Updated 6 years ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 Updated 9 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆44 Updated last week
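For context on the "Online normalizer calculation for softmax" entry above, the following minimal NumPy sketch illustrates the single-pass technique that paper benchmarks: the running maximum and the normalizer are updated together, so the input only needs to be read once. This is an illustrative reimplementation, not code from the listed repository.

```python
import numpy as np

def online_softmax(x):
    """Single-pass softmax using the online normalizer update
    (Milakov & Gimelshein, 2018). Illustrative only."""
    m = -np.inf   # running maximum
    d = 0.0       # running normalizer: sum of exp(x_i - m)
    for xi in x:
        m_new = max(m, xi)
        # Rescale the old normalizer to the new maximum, then add the new term.
        d = d * np.exp(m - m_new) + np.exp(xi - m_new)
        m = m_new
    return np.exp(np.asarray(x) - m) / d

print(online_softmax([1.0, 2.0, 3.0]))  # matches a standard two-pass softmax
```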