mlc-ai / web-llm-assistant
AI Assistant running within your browser.
☆40 · Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for web-llm-assistant
- ☆61 · Updated last week
- KV cache compression for high-throughput LLM inference ☆82 · Updated last week
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated 5 months ago
- Fast Inference of MoE Models with CPU-GPU Orchestration ☆170 · Updated 2 weeks ago
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆70 · Updated last week
- Modular and structured prompt caching for low-latency LLM inference ☆65 · Updated this week
- Deploy your autonomous agents to production-grade environments with a 99% uptime guarantee, infinite scalability, and self-healing. ☆27 · Updated this week
- ☆114 · Updated 6 months ago
- Repo hosting code and materials for speeding up LLM inference using token merging. ☆29 · Updated 6 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆89 · Updated this week
- A toolkit for fine-tuning, inference, and evaluation of GreenBitAI's LLMs. ☆73 · Updated 3 weeks ago
- Implementation of Nougat focused on processing PDFs locally. ☆73 · Updated 6 months ago
- A collection of pre-built wrappers over common RAG systems like ChromaDB, Weaviate, Pinecone, and others. ☆20 · Updated this week
- Structured inference with Llama 2 in your browser ☆51 · Updated last week
- DSPy program/pipeline inspector widget for Jupyter/VSCode notebooks. ☆28 · Updated 8 months ago
- ☆43 · Updated 3 months ago
- Never forget anything again! Combines AI and intelligent tooling into a local knowledge base to track, catalogue, annotate, and plan for you… ☆32 · Updated 5 months ago
- ☆40 · Updated 6 months ago
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho… ☆99 · Updated last week
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆36 · Updated last year
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆184 · Updated last month
- Inference code for LLaMA models ☆38 · Updated last year
- Apps that run on modal.com ☆12 · Updated 5 months ago
- A collection of all available inference solutions for LLMs ☆72 · Updated last month
- ☆96 · Updated last month
- A function to do all ☆35 · Updated 6 months ago
- QuIP quantization ☆46 · Updated 7 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆85 · Updated 3 weeks ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆110 · Updated 4 months ago