michaelfeil / candle-flash-attn-v3
★12 · Updated 6 months ago
Alternatives and similar repositories for candle-flash-attn-v3
Users interested in candle-flash-attn-v3 are comparing it to the libraries listed below.
- Build compute kernels · ★87 · Updated this week
- Implement LLaVA using candle · ★15 · Updated last year
- Proof of concept for running moshi/hibiki using WebRTC · ★20 · Updated 5 months ago
- GPU-based FFT written in Rust and CubeCL · ★23 · Updated last month
- Load compute kernels from the Hub · ★220 · Updated this week
- ★12 · Updated last year
- Simple high-throughput inference library · ★125 · Updated 2 months ago
- CLI utility to inspect and explore .safetensors and .gguf files · ★24 · Updated last week
- High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas… · ★192 · Updated 2 weeks ago
- Code for fine-tuning LLMs with GRPO specifically for Rust programming, using cargo as feedback · ★100 · Updated 4 months ago
- Rust crate for some audio utilities · ★26 · Updated 4 months ago
- A high-performance constrained-decoding engine based on context-free grammars, in Rust · ★54 · Updated 2 months ago
- ★21 · Updated 5 months ago
- vLLM adapter for a TGIS-compatible gRPC server · ★33 · Updated this week
- Inference Llama 2 in one file of zero-dependency, zero-unsafe Rust · ★38 · Updated 2 years ago
- Cray-LM unified training and inference stack · ★22 · Updated 6 months ago
- ★20 · Updated 10 months ago
- Fast serverless LLM inference, in Rust · ★88 · Updated 5 months ago
- PTX tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7) · ★66 · Updated 4 months ago
- Xet client tech, used in huggingface_hub · ★148 · Updated this week
- Performance comparison of MLOps engines, frameworks, and languages on mainstream AI models · ★137 · Updated last year
- Inference Llama 2 with a model compiled to native code by TorchInductor · ★14 · Updated last year
- A Python wrapper around Hugging Face's TGI (text-generation-inference) and TEI (text-embedding-inference) servers · ★33 · Updated 2 months ago
- Make Triton easier · ★47 · Updated last year
- A collection of optimisers for use with candle · ★37 · Updated last week
- A collection of reproducible inference-engine benchmarks · ★32 · Updated 3 months ago
- SGLang is a fast serving framework for large language models and vision-language models · ★24 · Updated this week
- Storing long contexts in tiny caches with self-study · ★121 · Updated this week
- TensorRT-LLM server with structured outputs (JSON), built with Rust · ★57 · Updated 3 months ago
- Your one-stop CLI for ONNX model analysis · ★47 · Updated 2 years ago
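One entry above offers a MinHash implementation for similarity estimation and deduplication. As background on the technique itself (not that crate's API — all names and parameters below are illustrative), here is a minimal Python sketch: each of `num_hashes` seeded hash functions keeps its minimum value over a token set, and the fraction of matching signature slots estimates Jaccard similarity.

```python
import hashlib
import random


def minhash_signature(tokens, num_hashes=64, seed=0):
    """Compute a MinHash signature for a set of tokens.

    For each of num_hashes salted hash functions, keep the minimum
    hash value observed across the token set.
    """
    rng = random.Random(seed)
    # Random 64-bit salts stand in for independent hash functions.
    salts = [rng.getrandbits(64) for _ in range(num_hashes)]
    signature = []
    for salt in salts:
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{salt}:{t}".encode(), digest_size=8).digest(),
                "big",
            )
            for t in tokens
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching slots estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Because a signature is a fixed-size vector regardless of set size, comparing two documents costs O(num_hashes) instead of intersecting their full token sets, which is what makes MinHash attractive for deduplicating large corpora.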