OpenNMT / CTranslate2Links

Fast inference engine for Transformer models

☆4,056

Alternatives and similar repositories for CTranslate2

Users that are interested in CTranslate2 are comparing it to the libraries listed below

Sorting:

huggingface / distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
☆3,958Updated 9 months ago
turboderp-org / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
☆4,341Updated 2 months ago
huggingface / optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization…
☆3,115Updated last week
marella / ctransformers
Python bindings for the Transformer models implemented in C/C++ using GGML library.
☆1,876Updated last year
huggingface / text-generation-inference
Large Language Model Text Generation Inference
☆10,566Updated last month
turboderp / exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
☆2,903Updated 2 years ago
casper-hansen / AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
☆2,256Updated 5 months ago
AutoGPTQ / AutoGPTQ
An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
☆4,965Updated 6 months ago
intel / intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl…
☆2,163Updated last year
deepspeedai / DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
☆2,067Updated 3 months ago
ufal / whisper_streaming
Whisper realtime streaming for long speech-to-text transcription and translation
☆3,388Updated last month
bitsandbytes-foundation / bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
☆7,647Updated 2 weeks ago
snakers4 / silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
☆7,066Updated last week
sanchit-gandhi / whisper-jax
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
☆4,635Updated last year
RWKV / rwkv.cpp
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
☆1,547Updated 6 months ago
mit-han-lab / llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
☆3,305Updated 3 months ago
SYSTRAN / faster-whisper
Faster Whisper transcription with CTranslate2
☆18,512Updated last week
pyannote / pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker…
☆8,505Updated last week
aphrodite-engine / aphrodite-engine
Large-scale LLM inference engine
☆1,567Updated last week
IST-DASLab / gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
☆2,196Updated last year
Softcatala / whisper-ctranslate2
Whisper command line client compatible with original OpenAI client based on CTranslate2.
☆1,121Updated 2 months ago
collabora / WhisperLive
A nearly-live implementation of OpenAI's Whisper.
☆3,488Updated 3 weeks ago
m-bain / whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
☆18,157Updated this week
qwopqwop200 / GPTQ-for-LLaMa
4 bits quantization of LLaMA using GPTQ
☆3,073Updated last year
ModelTC / LightLLM
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili…
☆3,650Updated this week
linto-ai / whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
☆2,627Updated last month
Helsinki-NLP / Opus-MT
Open neural machine translation models and web services
☆735Updated 4 months ago
S-LoRA / S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
☆1,856Updated last year
michaelfeil / infinity
Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali
☆2,500Updated last week
arcee-ai / mergekit
Tools for merging pretrained large language models.
☆6,378Updated last month