michaelfeil / candle-flash-attn-v3
☆11 · Updated 2 months ago
Alternatives and similar repositories for candle-flash-attn-v3:
Users interested in candle-flash-attn-v3 are comparing it to the libraries listed below.
- implement llava using candle · ☆14 · Updated 10 months ago
- GPU based FFT written in Rust and CubeCL · ☆21 · Updated last month
- A collection of optimisers for use with candle · ☆34 · Updated 5 months ago
- 👷 Build compute kernels · ☆35 · Updated this week
- Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and… · ☆19 · Updated last month
- (no description) · ☆12 · Updated last year
- vLLM adapter for a TGIS-compatible gRPC server. · ☆26 · Updated this week
- Rust crate for some audio utilities · ☆22 · Updated last month
- (no description) · ☆39 · Updated 2 years ago
- Read and write tensorboard data using Rust · ☆20 · Updated last year
- Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread · ☆18 · Updated last year
- Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and visi… · ☆23 · Updated 2 months ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts · ☆24 · Updated last year
- (no description) · ☆28 · Updated 5 months ago
- Proof of concept for running moshi/hibiki using webrtc · ☆18 · Updated last month
- Inference Llama 2 in one file of zero-dependency, zero-unsafe Rust · ☆38 · Updated last year
- A small python library to run iterators in a separate process · ☆10 · Updated last year
- A high-performance constrained decoding engine based on context-free grammar in Rust · ☆50 · Updated 3 months ago
- This repository has code for fine-tuning LLMs with GRPO specifically for Rust programming, using cargo as feedback · ☆80 · Updated last month
- Cray-LM unified training and inference stack. · ☆22 · Updated 2 months ago
- Modular Rust transformer/LLM library using Candle · ☆36 · Updated 11 months ago
- Make triton easier · ☆47 · Updated 10 months ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. · ☆34 · Updated 4 months ago
- NanoGPT (124M) quality in 2.67B tokens · ☆28 · Updated this week
- Rust implementation of micrograd · ☆51 · Updated 9 months ago
- Inference engine for GLiNER models, in Rust · ☆45 · Updated 3 weeks ago
- JAX bindings for the flash-attention3 kernels · ☆11 · Updated 8 months ago
- Train, tune, and infer Bamba model · ☆88 · Updated this week
- (no description) · ☆13 · Updated last year
- PostText is a QA system for querying your text data. When appropriate structured views are in place, PostText is good at answering querie… · ☆32 · Updated last year