Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using ipex-llm
☆171May 4, 2026Updated last month
Alternatives and similar repositories for ipex-llm-tutorial
Users that are interested in ipex-llm-tutorial are comparing it to the libraries listed below.
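- This project uses the PaddlePaddle platform to build an innovative multi-model collaborative system that converts PDF files to Markdown files efficiently and accurately.☆28Mar 25, 2025Updated last year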
- This sample shows how to use the oneAPI Video Processing Library (oneVPL) to perform a single and multi-source video decode and preproces…☆15Jun 15, 2023Updated 2 years ago
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools☆595Updated this week
- 📚 Jupyter notebook tutorials for OpenVINO™☆3,156Jun 1, 2026Updated last week
- This is Microsoft-Phi-3-NvidiaNIMWorkshop☆22Aug 16, 2024Updated last year
- An abstraction library for building domain-specific intelligent agents based on Large Language Models (LLMs). LLMAgent provides a core ar…☆27Feb 5, 2026Updated 4 months ago
- Multi-account management for Xiaohongshu (小红书)☆13Jul 24, 2025Updated 10 months ago
- Another ChatGLM2 implementation for GPTQ quantization☆55Oct 15, 2023Updated 2 years ago
- Multi-Agent LLM System for Digital Scam Protection☆15Dec 19, 2024Updated last year
- An OpenAI API compatible images server to generate or manipulate images.☆18Feb 2, 2025Updated last year
- ☆20Feb 10, 2025Updated last year
- Image Search Engine with HuggingFace Sentence Transformer☆12Aug 31, 2023Updated 2 years ago
- ☆16Dec 16, 2024Updated last year
- Code Repository for Blog - How to Productionize Large Language Models (LLMs)☆12Mar 27, 2024Updated 2 years ago
- My implementation (PyTorch) for the paper SST: Single-Stream Temporal Action Proposals (http://vision.stanford.edu/pdf/buch2017cvpr.pdf).☆10Dec 8, 2022Updated 3 years ago
- pytorch code examples for measuring the performance of collective communication calls in AI workloads☆21Sep 18, 2025Updated 8 months ago
- ☆14Apr 22, 2024Updated 2 years ago
- Modified Beam Search with periodical restart☆12Sep 12, 2024Updated last year
- Turn PostgreSQL into your search engine in a Pythonic way.☆52Aug 29, 2025Updated 9 months ago
- A Python package for extending the official PyTorch that can easily obtain performance on Intel platform☆2,014Mar 30, 2026Updated 2 months ago
- ☆20Feb 18, 2025Updated last year
- Intel® Tensor Processing Primitives extension for Pytorch*☆19May 29, 2026Updated 2 weeks ago
- Kexplain is an interactive kubectl explain☆12Oct 23, 2023Updated 2 years ago
- One-Click to deploy your own GPT web UI.☆10Sep 16, 2024Updated last year
- A simple no-install web UI for Ollama and OAI-Compatible APIs!☆31Jan 30, 2025Updated last year
- Synthetic data for fine tuning LLM☆27Dec 26, 2024Updated last year
- Official implementation "ChipNet: Budget-Aware Pruning with Heaviside Continuous Approximations"☆21Oct 29, 2022Updated 3 years ago
- My Interview recording repo.☆11Mar 22, 2023Updated 3 years ago
- Building reliable Retrieval Augmented Generation(RAG) AI Architecture☆13Jul 30, 2024Updated last year
- A tool to help you to copy an AMI from your Worldwide AWS account to China account.☆11Sep 16, 2023Updated 2 years ago
- Cyclone Jet Rocket is a DDoS tool for System Security Technology course☆11Jun 5, 2017Updated 9 years ago
- ProxQuant: Quantized Neural Networks via Proximal Operators☆30Feb 19, 2019Updated 7 years ago
- KoboldCpp Smart Launcher with GPU Layer and Tensor Override Tuning☆30May 18, 2025Updated last year
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.☆14Mar 30, 2024Updated 2 years ago
- SIGCOMM 2021 artifact☆12Jul 27, 2024Updated last year
- U-PG-RAG, a campus Q&A backend system built on PostgreSQL (PG).☆10Oct 15, 2024Updated last year
- ☆14Mar 30, 2026Updated 2 months ago
- An Erlang ingester for GreptimeDB, which is compatible with GreptimeDB protocol and lightweight.☆16Apr 17, 2026Updated last month
- TensorRT-in-Action is a GitHub repository that provides code examples for using TensorRT, with accompanying Jupyter Notebooks.☆15Jun 1, 2023Updated 3 years ago