pacman100 / mlc-llm
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
☆11 Updated last year
Related projects
Alternatives and complementary repositories for mlc-llm
- HTTP proxy for on-demand model loading with llama.cpp (or other OpenAI-compatible backends) ☆41 Updated this week
- Extends the original llama.cpp repo to support the RedPajama model. ☆117 Updated 2 months ago
- LlamaTor: Decentralized AI model sharing via BitTorrent for efficient, user-friendly distribution and collaboration. ☆40 Updated 5 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆196 Updated 6 months ago
- Simple, hackable and fast implementation for training/finetuning medium-sized LLaMA-based models ☆153 Updated this week
- Review/Check GGUF files and estimate the memory usage and maximum tokens per second. ☆47 Updated this week
- ☆53 Updated 5 months ago
- GRadient-INformed MoE ☆259 Updated last month
- A benchmark for emotional intelligence in large language models ☆197 Updated 3 months ago
- A pipeline parallel training script for LLMs. ☆83 Updated this week
- Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe… ☆142 Updated 9 months ago
- An unsupervised model merging algorithm for Transformers-based language models. ☆100 Updated 6 months ago
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a… ☆40 Updated last month
- ☆128 Updated this week
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training ☆317 Updated last month
- idea: https://github.com/nyxkrage/ebook-groupchat/ ☆82 Updated 3 months ago
- Run PaliGemma in real time ☆123 Updated 6 months ago
- ☆39 Updated 9 months ago
- Low-rank adapter extraction for fine-tuned transformer models ☆162 Updated 6 months ago
- Self-hosted LLM chatbot arena, with yourself as the only judge ☆36 Updated 9 months ago
- Something similar to Apple Intelligence? ☆57 Updated 4 months ago
- A Python application that routes incoming prompts to an LLM by category, and can support a single incoming connection from a front end to… ☆171 Updated this week
- Training notebooks similar to the original script used to train TinyMistral. ☆19 Updated 11 months ago
- Use safetensors with ONNX 🤗 ☆26 Updated 2 months ago
- Official repo for "Make Your LLM Fully Utilize the Context" ☆243 Updated 6 months ago
- Experimental LLM Inference UX to aid in creative writing ☆106 Updated 4 months ago
- ☆38 Updated 8 months ago
- A fast batching API to serve LLM models ☆172 Updated 6 months ago
- 4-bit bitsandbytes quants of the best 7B VLMs ☆21 Updated last month
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆173 Updated 4 months ago