michaelfeil / candle-flash-attn-v3
☆13 · Updated last week
Alternatives and similar repositories for candle-flash-attn-v3
Users who are interested in candle-flash-attn-v3 are comparing it to the libraries listed below.
- implement llava using candle ☆15 · Updated last year
- 👷 Build compute kernels ☆195 · Updated this week
- Simple high-throughput inference library ☆153 · Updated 7 months ago
- GPU based FFT written in Rust and CubeCL ☆25 · Updated 6 months ago
- This repository has code for fine-tuning LLMs with GRPO specifically for Rust Programming using cargo as feedback ☆112 · Updated 9 months ago
- Rust crate for some audio utilities ☆25 · Updated 9 months ago
- ☆12 · Updated last year
- ☆21 · Updated 9 months ago
- ☆19 · Updated last year
- Make triton easier ☆49 · Updated last year
- Inference Llama 2 in one file of zero-dependency, zero-unsafe Rust ☆39 · Updated 2 years ago
- Proof of concept for running moshi/hibiki using webrtc ☆19 · Updated 10 months ago
- Inference engine for GLiNER models, in Rust ☆81 · Updated last month
- A high-performance constrained decoding engine based on context free grammar in Rust ☆56 · Updated 7 months ago
- High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas… ☆223 · Updated 3 weeks ago
- Rust Implementation of micrograd ☆53 · Updated last year
- vLLM adapter for a TGIS-compatible gRPC server. ☆46 · Updated this week
- ☆135 · Updated last year
- Minimalist vLLM implementation in Rust ☆84 · Updated this week
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 · Updated 3 months ago
- Cray-LM unified training and inference stack. ☆22 · Updated 10 months ago
- A collection of optimisers for use with candle ☆44 · Updated 3 weeks ago
- Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and… ☆39 · Updated 2 months ago
- ☆90 · Updated 5 months ago
- A collection of reproducible inference engine benchmarks ☆38 · Updated 8 months ago
- Automatically derive Python dunder methods for your Rust code ☆20 · Updated 8 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆73 · Updated 8 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆64 · Updated last week
- ☆30 · Updated 8 months ago
- Efficient non-uniform quantization with GPTQ for GGUF ☆57 · Updated 3 months ago