michaelfeil / candle-flash-attn-v3
☆12 · Updated 8 months ago
Alternatives and similar repositories for candle-flash-attn-v3
Users who are interested in candle-flash-attn-v3 compare it to the libraries listed below.
- implement llava using candle ☆15 · Updated last year
- Build compute kernels ☆163 · Updated last week
- This repository has code for fine-tuning LLMs with GRPO specifically for Rust Programming using cargo as feedback ☆107 · Updated 7 months ago
- Rust crate for some audio utilities ☆25 · Updated 7 months ago
- Simple high-throughput inference library ☆147 · Updated 5 months ago
- Proof of concept for running moshi/hibiki using webrtc ☆19 · Updated 7 months ago
- GPU based FFT written in Rust and CubeCL ☆24 · Updated 4 months ago
- A high-performance constrained decoding engine based on context free grammar in Rust ☆55 · Updated 5 months ago
- Inference Llama 2 in one file of zero-dependency, zero-unsafe Rust ☆39 · Updated 2 years ago
- Cray-LM unified training and inference stack. ☆22 · Updated 8 months ago
- Rust Implementation of micrograd ☆53 · Updated last year
- ☆12 · Updated last year
- ☆21 · Updated 7 months ago
- A collection of optimisers for use with candle ☆43 · Updated 2 months ago
- vLLM adapter for a TGIS-compatible gRPC server. ☆41 · Updated this week
- CLI utility to inspect and explore .safetensors and .gguf files ☆32 · Updated 2 months ago
- A small rust-based data loader ☆31 · Updated 4 months ago
- High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas… ☆210 · Updated 3 weeks ago
- RWKV-7: Surpassing GPT ☆98 · Updated 11 months ago
- Fast serverless LLM inference, in Rust. ☆94 · Updated 7 months ago
- Make triton easier ☆48 · Updated last year
- ☆134 · Updated last year
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆33 · Updated last month
- ☆24 · Updated 6 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆61 · Updated last month
- Read and write tensorboard data using Rust ☆23 · Updated last year
- PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP ☆133 · Updated last month
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 · Updated last year
- A collection of reproducible inference engine benchmarks ☆34 · Updated 6 months ago
- DPO, but faster ☆45 · Updated 10 months ago