pacman100 / mlc-llm

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.

☆11

Related projects

Alternatives and complementary repositories for mlc-llm

mostlygeek / llama-swap
HTTP proxy for on-demand model loading with llama.cpp (or other OpenAI compatible backends)
☆41 Updated this week
togethercomputer / redpajama.cpp
Extend the original llama.cpp repo to support the redpajama model.
☆117 Updated 2 months ago
Nondzu / LlamaTor
LlamaTor: Decentralized AI model sharing via BitTorrent for efficient, user-friendly distribution and collaboration.
☆40 Updated 5 months ago
arcee-ai / PruneMe
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
☆196 Updated 6 months ago
chrisociepa / allamo
Simple, hackable and fast implementation for training/finetuning medium-sized LLaMA-based models
☆153 Updated this week
gpustack / gguf-parser-go
Review/Check GGUF files and estimate the memory usage and maximum tokens per second.
☆47 Updated this week
nyunAI / PruneGPT
☆53 Updated 5 months ago
microsoft / GRIN-MoE
GRadient-INformed MoE
☆259 Updated last month
EQ-bench / EQ-Bench
A benchmark for emotional intelligence in large language models
☆197 Updated 3 months ago
tdrussell / qlora-pipe
A pipeline parallel training script for LLMs.
☆83 Updated this week
uukuguy / multi_loras
Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe…
☆142 Updated 9 months ago
Gryphe / MergeMonster
An unsupervised model merging algorithm for Transformers-based language models.
☆100 Updated 6 months ago
janhq / cortex.tensorrt-llm
Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a…
☆40 Updated last month
matteoserva / GraphLLM
☆128 Updated this week
PrimeIntellect-ai / OpenDiloco
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
☆317 Updated last month
statchamber / ebook-to-chatml-conversion
idea: https://github.com/nyxkrage/ebook-groupchat/
☆82 Updated 3 months ago
sumo43 / loopvlm
run paligemma in real time
☆123 Updated 6 months ago
multiplexerai / Complex-to-Simple-RAG
☆39 Updated 9 months ago
thomasgauthier / LoRD
Low-Rank adapter extraction for fine-tuned transformers models
☆162 Updated 6 months ago
Contextualist / lone-arena
Self-hosted LLM chatbot arena, with yourself as the only judge
☆36 Updated 9 months ago
beratcmn / local-intelligence
Something similar to Apple Intelligence?
☆57 Updated 4 months ago
SomeOddCodeGuy / WilmerAI
A python application that routes incoming prompts to an LLM by category, and can support a single incoming connection from a front end to…
☆171 Updated this week
Locutusque / TinyMistral-train-eval
The training notebooks that were similar to the original script used to train TinyMistral.
☆19 Updated 11 months ago
justinchuby / onnx-safetensors
Use safetensors with ONNX 🤗
☆26 Updated 2 months ago
microsoft / FILM
Official repo for "Make Your LLM Fully Utilize the Context"
☆243 Updated 6 months ago
the-crypt-keeper / LLooM
Experimental LLM Inference UX to aid in creative writing
☆106 Updated 4 months ago
shirley-wu / cot_decoding
☆38 Updated 8 months ago
epolewski / EricLLM
A fast batching API to serve LLM models
☆172 Updated 6 months ago
cyan2k / molmo-7b-bnb-4bit
4bit bitsandbytes quants of the best 7B vlms
☆21 Updated last month
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆173 Updated 4 months ago