michaelfeil / candle-flash-attn-v3
☆13 · Updated 10 months ago
Alternatives and similar repositories for candle-flash-attn-v3
Users interested in candle-flash-attn-v3 are comparing it to the libraries listed below.
- implement llava using candle (☆15 · Updated last year)
- Build compute kernels (☆190 · Updated this week)
- Code for fine-tuning LLMs with GRPO, specifically for Rust programming, using cargo as feedback (☆112 · Updated 8 months ago)
- Rust crate for some audio utilities (☆25 · Updated 8 months ago)
- GPU-based FFT written in Rust and CubeCL (☆24 · Updated 5 months ago)
- Proof of concept for running moshi/hibiki using WebRTC (☆19 · Updated 9 months ago)
- A high-performance constrained decoding engine based on context-free grammar, in Rust (☆56 · Updated 6 months ago)
- vLLM adapter for a TGIS-compatible gRPC server (☆45 · Updated this week)
- Efficient non-uniform quantization with GPTQ for GGUF (☆53 · Updated 2 months ago)
- Simple high-throughput inference library (☆150 · Updated 6 months ago)
- Make Triton easier (☆49 · Updated last year)
- ☆135 · Updated last year
- ☆12 · Updated last year
- Inference Llama 2 in one file of zero-dependency, zero-unsafe Rust (☆39 · Updated 2 years ago)
- PTX tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7) (☆66 · Updated 8 months ago)
- High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas… (☆217 · Updated 2 months ago)
- ☆19 · Updated last year
- Automatically derive Python dunder methods for your Rust code (☆20 · Updated 7 months ago)
- ☆21 · Updated 9 months ago
- Fast serverless LLM inference, in Rust (☆108 · Updated last month)
- Inference engine for GLiNER models, in Rust (☆79 · Updated 2 weeks ago)
- Inference of Mamba models in pure C (☆194 · Updated last year)
- Inference Llama 2 with a model compiled to native code by TorchInductor (☆14 · Updated last year)
- Collection of autoregressive model implementations (☆86 · Updated 7 months ago)
- Rust implementation of micrograd (☆53 · Updated last year)
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… (☆146 · Updated last year)
- Optimizing causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna (☆59 · Updated last month)
- A collection of optimisers for use with candle (☆44 · Updated this week)
- ☆111 · Updated 2 weeks ago
- PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP (☆140 · Updated 2 months ago)