danielgross / ggml-k8s
Run GGML models with Kubernetes.
☆174 · Updated last year
Alternatives and similar repositories for ggml-k8s:
Users who are interested in ggml-k8s are comparing it to the libraries listed below:
- Simple embedding -> text model trained on a small subset of Wikipedia sentences. ☆153 · Updated last year
- A curated list of amazingly awesome Modal applications, demos, and shiny things. Inspired by awesome-php. ☆116 · Updated this week
- A collection of LLM services you can self-host via Docker or Modal Labs to support your application development ☆186 · Updated 9 months ago
- ☆136 · Updated last year
- run embeddings in MLX ☆82 · Updated 4 months ago
- run paligemma in real time ☆130 · Updated 9 months ago
- GPU accelerated client-side embeddings for vector search, RAG etc. ☆65 · Updated last year
- A feed of trending repos/models from GitHub, Replicate, HuggingFace, and Reddit. ☆120 · Updated 5 months ago
- an implementation of Self-Extend, to expand the context window via grouped attention ☆118 · Updated last year
- Fluid Database ☆114 · Updated 5 months ago
- Command-line script for inferencing from models such as MPT-7B-Chat ☆101 · Updated last year
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI ☆222 · Updated 9 months ago
- Turing machines, Rule 110, and A::B reversal using Claude 3 Opus. ☆59 · Updated 9 months ago
- AI sends pull requests for features you request in natural language ☆113 · Updated last year
- On-device intelligence. ☆237 · Updated 5 months ago
- An MLX project to train a base model on your WhatsApp chats using (Q)LoRA fine-tuning ☆162 · Updated last year
- Some of the scripts I use for scribepod @ https://scribepod.substack.com/, an automated AI podcast ☆172 · Updated last year
- An HTTP serving framework by Banana ☆98 · Updated last year
- llm-consortium orchestrates multiple LLMs, iteratively refines & achieves consensus. ☆159 · Updated 2 weeks ago
- Demo of an AI chatbot that predicts the user's message to generate a response quickly. ☆102 · Updated 11 months ago
- Chat Markup Language conversation library ☆55 · Updated last year
- ☆111 · Updated 2 months ago
- Efficient vector database for hundreds of millions of embeddings. ☆206 · Updated 9 months ago
- Mistral7B playing DOOM ☆127 · Updated 7 months ago
- ☆194 · Updated 9 months ago
- Simple Transformer in Jax ☆136 · Updated 7 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆121 · Updated 2 months ago
- Foyle is a copilot to help developers deploy and operate their applications. ☆121 · Updated 2 weeks ago
- Command-line script for inferencing from models such as falcon-7b-instruct ☆76 · Updated last year
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆63 · Updated 3 months ago