michaelfeil / candle-flash-attn-v3
☆11 · Updated 3 months ago
Alternatives and similar repositories for candle-flash-attn-v3
Users interested in candle-flash-attn-v3 are comparing it to the libraries listed below.
- GPU-based FFT written in Rust and CubeCL ☆22 · Updated 2 months ago
- Implement LLaVA using candle ☆14 · Updated 11 months ago
- Code for fine-tuning LLMs with GRPO, specifically for Rust programming, using cargo as feedback ☆86 · Updated 2 months ago
- 👷 Build compute kernels ☆38 · Updated this week
- Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread ☆18 · Updated last year
- Proof of concept for running moshi/hibiki using WebRTC ☆18 · Updated 2 months ago
- Rust crate for some audio utilities ☆23 · Updated 2 months ago
- ☆12 · Updated last year
- A high-performance constrained decoding engine based on context-free grammar, in Rust ☆51 · Updated 4 months ago
- 🤝 Trade any tensors over the network ☆30 · Updated last year
- Read and write TensorBoard data using Rust ☆21 · Updated last year
- Inference Llama 2 in one file of zero-dependency, zero-unsafe Rust ☆38 · Updated last year
- Rust implementation of micrograd ☆51 · Updated 10 months ago
- A Python wrapper around Hugging Face's TGI (text-generation-inference) and TEI (text-embedding-inference) servers ☆34 · Updated last week
- A collection of optimisers for use with candle ☆35 · Updated last week
- Cray-LM unified training and inference stack ☆22 · Updated 3 months ago
- Manage ML configuration with pydantic ☆16 · Updated 5 months ago
- A transformers-like interface for interacting with local LLMs in Rust. This crate aims to provide the simplest interface to interact with… ☆14 · Updated last week
- Load compute kernels from the Hub ☆119 · Updated last week
- ☆29 · Updated 5 months ago
- ☆23 · Updated last month
- ☆39 · Updated 2 years ago
- Inference engine for GLiNER models, in Rust ☆58 · Updated last month
- A small Python library to run iterators in a separate process ☆10 · Updated last year
- A small Rust-based data loader ☆24 · Updated 5 months ago
- ☆129 · Updated last year
- Your one-stop CLI for ONNX model analysis ☆47 · Updated 2 years ago
- Fast serverless LLM inference, in Rust ☆70 · Updated 2 months ago
- Make Triton easier ☆47 · Updated 11 months ago
- 8-bit floating point types for Rust ☆47 · Updated 2 months ago