OpenNMT / CTranslate2
Fast inference engine for Transformer models
★ 4,154 · Updated last week
Alternatives and similar repositories for CTranslate2
Users interested in CTranslate2 are comparing it to the libraries listed below.
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ★ 3,987 · Updated 10 months ago
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… ★ 3,188 · Updated 2 weeks ago
- Python bindings for the Transformer models implemented in C/C++ using the GGML library. ★ 1,877 · Updated last year
- Silero VAD: pre-trained enterprise-grade Voice Activity Detector ★ 7,450 · Updated last week
- A fast inference library for running LLMs locally on modern consumer-class GPUs ★ 4,372 · Updated 3 months ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ★ 2,905 · Updated 2 years ago
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ★ 2,167 · Updated last year
- Whisper realtime streaming for long speech-to-text transcription and translation ★ 3,466 · Updated 2 weeks ago
- Accessible large language models via k-bit quantization for PyTorch. ★ 7,790 · Updated this week
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ★ 4,992 · Updated 7 months ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ★ 2,276 · Updated 6 months ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ★ 2,079 · Updated 5 months ago
- Tensor library for machine learning ★ 13,617 · Updated last week
- A nearly-live implementation of OpenAI's Whisper. ★ 3,632 · Updated 2 months ago
- JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU. ★ 4,645 · Updated last year
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". ★ 2,221 · Updated last year
- Large Language Model Text Generation Inference ★ 10,664 · Updated last week
- Simple, safe way to store and distribute tensors ★ 3,528 · Updated last week
- Python bindings for llama.cpp ★ 9,786 · Updated 3 months ago
- Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker… ★ 8,761 · Updated this week
- Transformer-related optimization, including BERT, GPT ★ 6,355 · Updated last year
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ★ 3,362 · Updated 4 months ago
- Open neural machine translation models and web services ★ 747 · Updated last week
- Large-scale LLM inference engine ★ 1,600 · Updated last week
- 4-bit quantization of LLaMA using GPTQ ★ 3,079 · Updated last year
- Multilingual Automatic Speech Recognition with word-level timestamps and confidence ★ 2,681 · Updated 2 months ago
- Unified-Modal Speech-Text Pre-Training for Spoken Language Processing ★ 1,414 · Updated last year
- INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model ★ 1,554 · Updated 8 months ago
- Tools for merging pretrained large language models. ★ 6,494 · Updated this week
- Whisper command line client compatible with the original OpenAI client, based on CTranslate2. ★ 1,148 · Updated last week