In this repository, I'm going to implement increasingly complex LLM inference optimizations.
☆84 · May 22, 2025 · Updated 10 months ago
Alternatives and similar repositories for llm-inference-optimizations-explained
Users that are interested in llm-inference-optimizations-explained are comparing it to the libraries listed below.
- Python module to make coding hassle free! ☆10 · Jun 1, 2021 · Updated 4 years ago
- Custom triton kernels for training Karpathy's nanoGPT. ☆19 · Oct 21, 2024 · Updated last year
- ☆15 · Jan 26, 2025 · Updated last year
- ☆15 · Apr 26, 2025 · Updated 11 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆60 · Oct 18, 2025 · Updated 5 months ago
- ☆14 · Dec 21, 2025 · Updated 3 months ago
- Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]; ☆40 · Jan 4, 2024 · Updated 2 years ago
- ☆18 · May 15, 2025 · Updated 10 months ago
- A tiny easily hackable implementation of a feature dashboard. ☆16 · Oct 21, 2025 · Updated 5 months ago
- Project code for training LLMs to write better unit tests + code ☆21 · May 19, 2025 · Updated 10 months ago
- Following Karpathy with GPT-2 implementation and training, writing lots of comments cause I have memory of a goldfish ☆171 · Jul 31, 2024 · Updated last year
- Collection of resources for RL and Reasoning ☆27 · Feb 3, 2025 · Updated last year
- Re-implementation of local descriptor HardNet training in fasta2+kornia ☆21 · Apr 6, 2020 · Updated 6 years ago
- ☆12 · Sep 25, 2024 · Updated last year
- Stream of my favorite papers and links ☆44 · Feb 15, 2026 · Updated 2 months ago
- NanoGPT (124M) in 5 minutes ☆15 · Feb 14, 2025 · Updated last year
- Jupyter notebooks showing how to implement statistical functions. ☆14 · Jun 14, 2020 · Updated 5 years ago
- ☆12 · Jan 19, 2024 · Updated 2 years ago
- Voila! A smart automatic pet feeder using Arduino Uno + RTC time module for scheduling + multiple sensors. ☆10 · Jun 4, 2024 · Updated last year
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆51 · Jul 4, 2025 · Updated 9 months ago
- Retrieve the source code for any model made available on replicate.com! ☆36 · Jan 22, 2024 · Updated 2 years ago
- ☆79 · Nov 26, 2024 · Updated last year
- All the content of my youtube channel : https://youtube.com/@florenzerstling?si=7t10PBr6MDha74PO ☆14 · May 28, 2025 · Updated 10 months ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆470 · Mar 10, 2025 · Updated last year
- Open sourced result for The Agent Company ☆21 · Nov 11, 2025 · Updated 5 months ago
- Simulation of job offers and CVs with real-time processing, classification, and analytics using Kafka, Ray, Spark, and Databricks. Includ… ☆14 · Dec 25, 2024 · Updated last year
- A light tensor library in zig. ☆77 · Feb 9, 2025 · Updated last year
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆61 · Apr 8, 2024 · Updated 2 years ago
- ☆29 · Nov 9, 2025 · Updated 5 months ago
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆62 · Nov 4, 2024 · Updated last year
- ☆10 · Oct 22, 2024 · Updated last year
- ☆16 · Apr 29, 2025 · Updated 11 months ago
- hakken is a coding agent which needs hell lot of context ☆31 · Dec 4, 2025 · Updated 4 months ago
- Example for exposing MCP servers to Pydantic Agents ☆18 · Mar 16, 2025 · Updated last year
- Solutions for Stanford CS224n, Winter 2020. ☆12 · Jun 5, 2021 · Updated 4 years ago
- ☆80 · Jun 5, 2024 · Updated last year
- Manage your ever-growing list of research papers ☆13 · Nov 19, 2023 · Updated 2 years ago
- Mention any three favourite things and get recommendations in the form of a flow chart by Claude Haiku. ☆14 · Apr 6, 2024 · Updated 2 years ago
- Test equality between a black-box LLM API and a reference distribution ☆13 · Oct 29, 2024 · Updated last year