danielgross / ggml-k8s
Run GGML models with Kubernetes.
☆173 Updated 11 months ago
Related projects
Alternatives and complementary repositories for ggml-k8s
- ☆136 Updated 11 months ago
- An MLX project to train a base model on your WhatsApp chats using (Q)LoRA fine-tuning ☆159 Updated 10 months ago
- Run PaliGemma in real time ☆122 Updated 6 months ago
- On-device intelligence. ☆192 Updated 2 months ago
- A collection of LLM services you can self-host via Docker or Modal Labs to support your application development ☆181 Updated 6 months ago
- ☆104 Updated 8 months ago
- A curated list of amazingly awesome Modal applications, demos, and shiny things. Inspired by awesome-php. ☆86 Updated last week
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI ☆221 Updated 6 months ago
- Foyle is a copilot to help developers deploy and operate their applications. ☆108 Updated this week
- ☆84 Updated last month
- an implementation of Self-Extend, to expand the context window via grouped attention ☆118 Updated 10 months ago
- Efficient vector database for hundreds of millions of embeddings. ☆200 Updated 6 months ago
- Simple embedding -> text model trained on a small subset of Wikipedia sentences. ☆152 Updated last year
- ☆182 Updated 6 months ago
- Solving data for LLMs - Create quality synthetic datasets! ☆137 Updated last month
- Mistral7B playing DOOM ☆122 Updated 4 months ago
- A feed of trending repos/models from GitHub, Replicate, HuggingFace, and Reddit. ☆108 Updated 2 months ago
- Simple Transformer in JAX ☆119 Updated 4 months ago
- Routing on Random Forest (RoRF) ☆84 Updated last month
- Full finetuning of large language models without large memory requirements ☆93 Updated 10 months ago
- Cerule - A Tiny Mighty Vision Model ☆67 Updated 2 months ago
- GRDN.AI app for garden optimization ☆69 Updated 9 months ago
- run embeddings in MLX ☆73 Updated last month
- LLaVA server (llama.cpp). ☆177 Updated last year
- Logging and caching superpowers for the openai sdk ☆100 Updated 8 months ago
- Command-line script for inferencing from models such as MPT-7B-Chat ☆102 Updated last year
- Fast parallel LLM inference for MLX ☆149 Updated 4 months ago
- MLX port of xjdr's entropix sampler (mimics the JAX implementation) ☆55 Updated 2 weeks ago
- The Moshi speech-to-speech model, deployed to Modal with a realtime CLI chat ☆56 Updated last month
- Video+code lecture on building nanoGPT from scratch ☆64 Updated 5 months ago