An end-to-end pipeline to optimize and host an LLM for 100K parallel queries
☆36 · Jul 6, 2025 · Updated 9 months ago
Alternatives and similar repositories for llm-scale-deploy-guide
Users that are interested in llm-scale-deploy-guide are comparing it to the libraries listed below.
- Implementation of 12 AI agent evaluation techniques ☆43 · Jul 31, 2025 · Updated 9 months ago
- A straightforward explanation of how DeepSeek R1 works ☆18 · Feb 7, 2025 · Updated last year
- A Step-by-Step Implementation of RAPTOR-based RAG ☆40 · Sep 1, 2025 · Updated 8 months ago
- Official implementation of CoNSAL for analytical Lyapunov function discovery ☆12 · Jun 26, 2024 · Updated last year
- Code for the paper: Probabilistic Forecasting with Stochastic Interpolants and Follmer Processes (generative AI for scientific applicatio… ☆19 · Aug 18, 2024 · Updated last year
- An LLM-based Multi-Agent Framework for Financial Crime & Suspicious Matter Reporting ☆13 · Apr 28, 2024 · Updated 2 years ago
- Python CFFI Binding around SuiteSparse:GraphBLAS ☆24 · Updated this week
- Understanding Large Language Transformer Architecture like a child ☆31 · Apr 3, 2024 · Updated 2 years ago
- The Python Implementation of CRISP: Clustering Multi-Vector Representations for Denoising and Pruning ☆27 · Jul 27, 2025 · Updated 9 months ago
- Language model toolkits with hierarchical softmax settings ☆17 · Mar 23, 2018 · Updated 8 years ago
- Car Damage Detection: A computer vision project using YOLOv8 and Faster R-CNN to identify and localize car body defects like scratches, d… ☆20 · Jul 23, 2025 · Updated 9 months ago
- Constrained Decoding of Diffusion LLMs with Context-Free Grammars. ☆48 · Dec 17, 2025 · Updated 4 months ago
- A detailed implementation of handling long-term memory in Agentic AI ☆46 · Oct 9, 2025 · Updated 6 months ago
- Implementation of a contextual engineering pipeline with LangChain and LangGraph Agents ☆89 · Jul 29, 2025 · Updated 9 months ago
- Optimizing Dynamic Knowledge Base Using AI Agent ☆89 · Aug 13, 2025 · Updated 8 months ago
- Notes about the video on the Variational Autoencoder ☆14 · Jun 7, 2023 · Updated 2 years ago
- A Step-by-Step Implementation of Google Veo 3 Architecture from Scratch ☆83 · Jun 16, 2025 · Updated 10 months ago
- ☆19 · Apr 26, 2024 · Updated 2 years ago
- Reasoning-based Evaluation and Ranking of Translations. ☆20 · Jul 18, 2025 · Updated 9 months ago
- [ICML'25] Official code of paper "Fast Large Language Model Collaborative Decoding via Speculation" ☆30 · Jun 23, 2025 · Updated 10 months ago
- ☆21 · Jul 23, 2025 · Updated 9 months ago
- A lightweight, type-safe workflow engine for TypeScript that helps you create flexible, graph-based execution flows ☆27 · Jun 24, 2025 · Updated 10 months ago
- Phase field model for material science applications. ☆30 · Mar 4, 2019 · Updated 7 years ago
- Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper. ☆29 · Mar 15, 2025 · Updated last year
- Embedding language models in probability space via log-likelihood vectors ☆17 · Apr 22, 2026 · Updated last week
- Jupyter Notebook with GPU and Code Server! ☆22 · Feb 25, 2024 · Updated 2 years ago
- ☆77 · Dec 3, 2024 · Updated last year
- 📏 Rule-based linter for structured Markdown documents ☆30 · Updated this week
- Manthan for Boolean function synthesis ☆35 · Feb 16, 2026 · Updated 2 months ago
- Kafka With Python ☆29 · Mar 19, 2022 · Updated 4 years ago
- ☆30 · Jun 5, 2025 · Updated 10 months ago
- Tutorials for Triton, a language for writing GPU kernels ☆79 · Aug 23, 2023 · Updated 2 years ago
- Copy My Writing is a command-line tool for generating content based on your personal writing style. ☆11 · Oct 12, 2025 · Updated 6 months ago
- Measuring Thinking Efficiency in Reasoning Models - Research Repository ☆39 · Dec 2, 2025 · Updated 5 months ago
- Flow control nodes for ComfyUI, allowing for more diverse workflows ☆13 · Apr 3, 2025 · Updated last year
- LLM that can be trained on 1 or more GPUs for research. ☆42 · Apr 5, 2026 · Updated 3 weeks ago
- dsxkline: supports basic features (scrolling, zooming, swiping, pagination, and real-time refresh); supports indicators such as MA, BOLL, VOL, KDJ, MACD, RSI, WR, CCI, BIAS, and PSY; supports web, H5, iOS, Android, Flutter, C#, and more ☆14 · Apr 1, 2023 · Updated 3 years ago
- ☆49 · Jan 8, 2026 · Updated 3 months ago
- A Python package for advanced audio processing, enabling users to mix, master, and apply sound engineering techniques with ease ☆18 · Apr 22, 2026 · Updated last week