☆100 Aug 30, 2024 Updated last year
Alternatives and similar repositories for How_Much_VRAM
Users that are interested in How_Much_VRAM are comparing it to the libraries listed below
- A tiny, didactical implementation of LLAMA 3 ☆42 Dec 2, 2024 Updated last year
- RAG example using DSPy, Gradio, FastAPI ☆92 Apr 11, 2024 Updated last year
- Model compression for ONNX ☆100 Feb 19, 2026 Updated last week
- tiny_fnc_engine is a minimal python library that provides a flexible engine for calling functions extracted from a LLM. ☆38 Sep 11, 2024 Updated last year
- A simple LLaMA implementation using MLX. ☆15 Apr 22, 2024 Updated last year
- An RAG (retrieval augmented generation) app which iterates through a PDF document and can answer user's questions based on the document u… ☆16 Mar 23, 2025 Updated 11 months ago
- Run AI models anywhere. https://muna.ai/explore ☆83 Updated this week
- Resources regarding evML (edge verified machine learning) ☆22 Jan 4, 2025 Updated last year
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆19 Updated this week
- Quickly switch between different Claude Code providers ☆57 Feb 6, 2026 Updated 3 weeks ago
- ChatGPT research repository ☆17 Feb 18, 2023 Updated 3 years ago
- Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation" ☆21 Jul 31, 2023 Updated 2 years ago
- List of papers on Self-Correction of LLMs. ☆80 Dec 28, 2024 Updated last year
- Example code using the DSPy framework. ☆20 May 30, 2024 Updated last year
- This is a simple Notion Page Assistant that uses OpenAI's functions to create a Notion page. ☆19 Jul 28, 2023 Updated 2 years ago
- A pipeline that accurately simulates high quality publicly cancer genomes (VCFs, CNAs and SVs). ☆35 Feb 6, 2026 Updated 3 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 Dec 4, 2025 Updated 2 months ago
- automatically quant GGUF models ☆220 Dec 23, 2025 Updated 2 months ago
- A python package for serving LLM on OpenAI-compatible API endpoints with prompt caching using MLX. ☆100 Jun 29, 2025 Updated 8 months ago
- AI Multi-agent system for real-time, adaptive supply chain coordination and optimization leveraging responsive AI clusters. ☆36 Mar 28, 2024 Updated last year
- ☆23 Jul 10, 2023 Updated 2 years ago
- FastAPI wrapper around DSPy ☆292 Mar 11, 2024 Updated last year
- ☆209 Feb 7, 2025 Updated last year
- ☆21 Apr 17, 2025 Updated 10 months ago
- Examples and Demos using the Cohere APIs ☆23 Nov 3, 2023 Updated 2 years ago
- Complete automated setup guide for Qwen3-Coder-480B-A35B-Instruct model installation on Ubuntu with NVIDIA GPUs ☆44 Aug 3, 2025 Updated 6 months ago
- Python Server for C3 AI app. A project that brings the power of Large Language Models (LLM) and Retrieval-Augmented Generation (RAG) with… ☆24 Jan 7, 2024 Updated 2 years ago
- ☆28 Apr 14, 2024 Updated last year
- ☆119 Dec 18, 2024 Updated last year
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs. ☆61 Oct 3, 2024 Updated last year
- Revision of official yolov7-pose to support custom dataset for keypoint detection ☆11 Nov 12, 2023 Updated 2 years ago
- ☆67 Mar 30, 2025 Updated 11 months ago
- ☆29 Jul 4, 2025 Updated 7 months ago
- llama3.cuda is a pure C/CUDA implementation for Llama 3 model. ☆350 Apr 27, 2025 Updated 10 months ago
- ⛓️ build cognitive systems, pythonic ☆339 Nov 19, 2024 Updated last year
- Large Language Models (LLMs) applications and tools running on Apple Silicon in real-time with Apple MLX. ☆459 Jan 29, 2025 Updated last year
- ☆24 Nov 27, 2024 Updated last year
- RAG Chatbot React and TypeScript base boilerplate ☆32 Apr 14, 2024 Updated last year
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆3,108 Updated this week