michaelfeil / candle-flash-attn-v3
★13 · Updated 3 weeks ago
Alternatives and similar repositories for candle-flash-attn-v3
Users interested in candle-flash-attn-v3 are comparing it to the libraries listed below.
- implement llava using candle ★15 · Updated last year
- 👷 Build compute kernels ★213 · Updated this week
- This repository has code for fine-tuning LLMs with GRPO specifically for Rust Programming using cargo as feedback ★114 · Updated 10 months ago
- Rust crate for some audio utilities ★26 · Updated 10 months ago
- vLLM adapter for a TGIS-compatible gRPC server. ★47 · Updated this week
- A high-performance constrained decoding engine based on context free grammar in Rust ★58 · Updated 7 months ago
- GPU based FFT written in Rust and CubeCL ★28 · Updated 3 weeks ago
- High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas… ★225 · Updated last week
- Simple high-throughput inference library ★155 · Updated 8 months ago
- Minimalist vLLM implementation in Rust ★93 · Updated this week
- A collection of optimisers for use with candle ★45 · Updated 3 weeks ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ★32 · Updated 4 months ago
- Automatically derive Python dunder methods for your Rust code ★20 · Updated 9 months ago
- Proof of concept for running moshi/hibiki using webrtc ★19 · Updated 10 months ago
- Cray-LM unified training and inference stack. ★22 · Updated 11 months ago
- Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables ★21 · Updated 8 months ago
- 🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime ★112 · Updated 3 weeks ago
- ★12 · Updated 2 years ago
- Fast serverless LLM inference, in Rust. ★108 · Updated 2 months ago
- Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and… ★40 · Updated 3 months ago
- ★21 · Updated 10 months ago
- ★12 · Updated last year
- Your one stop CLI for ONNX model analysis. ★47 · Updated 3 years ago
- ★19 · Updated 2 weeks ago
- TensorRT-LLM server with Structured Outputs (JSON) built with Rust ★65 · Updated 8 months ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ★23 · Updated last year
- Make triton easier ★50 · Updated last year
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ★66 · Updated 9 months ago
- Rust Implementation of micrograd ★53 · Updated last year
- Inference engine for GLiNER models, in Rust ★81 · Updated last week