tairov / lamatune
Benchmarking framework for Llama implementations
☆12 · Updated last year
Alternatives and similar repositories for lamatune:
Users who are interested in lamatune are comparing it to the repositories listed below:
- Proof of concept for a generative AI application framework powered by WebAssembly and Extism ☆14 · Updated last year
- Iterate quickly with llama.cpp hot reloading. Use the llama.cpp bindings with bun.sh ☆47 · Updated last year
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆24 · Updated 3 months ago
- Run Llama 2 using MLX on macOS ☆33 · Updated last year
- ☆25 · Updated 2 months ago
- GPU accelerated client-side embeddings for vector search, RAG etc. ☆65 · Updated last year
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching using MLX. ☆70 · Updated 2 months ago
- Nexusflow function call, tool use, and agent benchmarks. ☆19 · Updated 2 months ago
- Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread ☆18 · Updated 10 months ago
- Inference Llama 2 in one file of zero-dependency, zero-unsafe Rust ☆37 · Updated last year
- ☆31 · Updated last year
- A Python command-line tool to download and manage MLX AI models from Hugging Face. ☆17 · Updated 5 months ago
- ☆15 · Updated 11 months ago
- Alternative way of calculating self-attention ☆18 · Updated 8 months ago
- Radiantloom Email Assist 7B is an email-assistant large language model fine-tuned from Zephyr-7B-Beta, over a custom-curated dataset of 1… ☆14 · Updated last year
- ☆15 · Updated last year
- A clone of OpenAI's Tokenizer page for HuggingFace Models ☆44 · Updated last year
- Training hybrid models for dummies. ☆20 · Updated last month
- ☆22 · Updated 4 months ago
- LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth… ☆28 · Updated 3 weeks ago
- Rust Implementation of micrograd ☆51 · Updated 7 months ago
- ANE accelerated embedding models! ☆15 · Updated 2 months ago
- ☆10 · Updated last year
- Run embedding models using ONNX ☆30 · Updated last year
- A simple MLX implementation for pretraining LLMs on Apple Silicon. ☆28 · Updated 3 weeks ago
- 360M model running in the browser on WebGPU ☆21 · Updated 5 months ago
- GGML implementation of BERT model with Python bindings and quantization. ☆53 · Updated 11 months ago
- Rust bindings for CTranslate2 ☆14 · Updated last year
- FalkorDB-Browser is a visualization UI for FalkorDB. ☆26 · Updated this week
- Light WebUI for lm.rs ☆23 · Updated 4 months ago