distantmagic / paddler
Stateful load balancer custom-tailored for llama.cpp
☆518 · Updated this week
Related projects:
- Visualize the intermediate output of Mistral 7B ☆300 · Updated 7 months ago
- LLM Analytics ☆593 · Updated last month
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ☆459 · Updated this week
- Finetune llama2-70b and codellama on MacBook Air without quantization ☆443 · Updated 5 months ago
- Replace OpenAI with Llama.cpp Automagically. ☆276 · Updated 3 months ago
- Action library for AI Agent ☆187 · Updated this week
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … ☆467 · Updated last month
- Like grep but for natural language questions. Based on Mistral 7B or Mixtral 8x7B. ☆373 · Updated 6 months ago
- Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit ☆677 · Updated last month
- Multi-node production AI stack. Run the best of open source AI easily on your own servers. Create your own AI by fine-tuning open source … ☆319 · Updated this week
- This project collects GPU benchmarks from various cloud providers and compares them to fixed per token costs. Use our tool for efficient … ☆196 · Updated 2 weeks ago
- An implementation of bucketMul LLM inference ☆212 · Updated 2 months ago
- Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more. ☆260 · Updated last month
- WebAssembly binding for llama.cpp - Enabling in-browser LLM inference ☆342 · Updated last week
- Things you can do with the token embeddings of an LLM ☆730 · Updated this week
- Chat with any codebase with 2 commands ☆546 · Updated this week
- Bayesian Optimization as a Coverage Tool for Evaluating LLMs. Accurate evaluation (benchmarking) that's 10 times faster with just a few l… ☆269 · Updated last month
- Mistral7B playing DOOM ☆117 · Updated 2 months ago
- A framework for building, experimenting, deploying, and continuously iterating on your LLM application ☆290 · Updated this week
- An extremely fast implementation of whisper optimized for Apple Silicon using MLX. ☆519 · Updated 4 months ago
- Optimizing inference proxy for LLMs ☆406 · Updated this week
- Fast, SQL powered, in-process vector search for any language with an SQLite driver ☆225 · Updated this week
- Build and query dynamic, temporally-aware Knowledge Graphs ☆572 · Updated this week
- A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for vario… ☆920 · Updated this week
- Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale. ☆821 · Updated 8 months ago
- GGUF implementation in C as a library and a tools CLI program ☆239 · Updated 2 months ago
- Simple Python library/structure to ablate features in LLMs which are supported by TransformerLens ☆287 · Updated 3 months ago
- Felafax is building AI infra for non-NVIDIA GPUs ☆282 · Updated this week