michaelfeil / candle-flash-attn-v3
☆12 Updated 7 months ago
Alternatives and similar repositories for candle-flash-attn-v3
Users who are interested in candle-flash-attn-v3 are comparing it to the libraries listed below.
- implement llava using candle ☆15 Updated last year
- Build compute kernels ☆136 Updated this week
- Simple high-throughput inference library ☆127 Updated 4 months ago
- This repository has code for fine-tuning LLMs with GRPO specifically for Rust Programming using cargo as feedback ☆103 Updated 6 months ago
- Proof of concept for running moshi/hibiki using webrtc ☆20 Updated 6 months ago
- GPU based FFT written in Rust and CubeCL ☆23 Updated 3 months ago
- A high-performance constrained decoding engine based on context free grammar in Rust ☆56 Updated 3 months ago
- vLLM adapter for a TGIS-compatible gRPC server. ☆39 Updated this week
- ☆21 Updated 6 months ago
- Read and write tensorboard data using Rust ☆23 Updated last year
- ☆132 Updated last year
- ☆12 Updated last year
- Cray-LM unified training and inference stack. ☆22 Updated 7 months ago
- CLI utility to inspect and explore .safetensors and .gguf files ☆28 Updated last month
- Inference Llama 2 in one file of zero-dependency, zero-unsafe Rust ☆39 Updated 2 years ago
- Make triton easier ☆47 Updated last year
- High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas… ☆201 Updated last month
- Fast serverless LLM inference, in Rust. ☆91 Updated 6 months ago
- ☆20 Updated 11 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7) ☆66 Updated 5 months ago
- Load compute kernels from the Hub ☆271 Updated this week
- A collection of optimisers for use with candle ☆40 Updated last month
- Inference engine for GLiNER models, in Rust ☆66 Updated 2 months ago
- Rust Implementation of micrograd ☆52 Updated last year
- Inference of Mamba models in pure C ☆191 Updated last year
- A collection of reproducible inference engine benchmarks ☆32 Updated 4 months ago
- ☆33 Updated 9 months ago
- PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP ☆120 Updated this week
- Experimental compiler for deep learning models ☆66 Updated 3 months ago
- Collection of autoregressive model implementation ☆86 Updated 4 months ago