h9-tec/Qwen_MOE_C

h9-tec / Qwen_MOE_C

☆43

Alternatives and similar repositories for Qwen_MOE_C

Users who are interested in Qwen_MOE_C are comparing it to the repositories listed below.

gigit0000 / qwen3.c
Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing.
☆25 • Sep 1, 2025 • Updated 10 months ago
freebasic / fbbindings
Scripts for fbfrog-based FreeBASIC bindings
☆16 • Mar 16, 2025 • Updated last year
reinterpretcat / qwen3-rs
An educational Rust project for exporting and running inference on Qwen3 LLM family
☆44 • Aug 3, 2025 • Updated 11 months ago
gigit0000 / qwen3.cu
Single-file, pure CUDA C implementation for running inference on Qwen3 0.6B GGUF. No Dependencies.
☆24 • Nov 26, 2025 • Updated 8 months ago
bsamud / openfoundry-agentic-framework
Multi-agent orchestration framework for AI applications - build, deploy, and manage AI agents across the full lifecycle with Forge, Conve…
☆33 • Mar 28, 2026 • Updated 4 months ago
adriancable / qwen3.c
Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies.
☆184 • Jul 5, 2025 • Updated last year
Zhayr1 / bitmamba.cpp
Ultra-lightweight C++ inference engine for BitMamba-2 (1.58-bit SSM). Runs 1B models on consumer CPUs at 50+ tok/s using <700MB RAM. No h…
☆21 • Jun 2, 2026 • Updated last month
pierrel55 / llama_st
Load and run Llama from safetensors files in C
☆15 • Oct 24, 2024 • Updated last year
DSTCyber / safe-deobs
A static deobfuscator for JavaScript Malware
☆13 • May 6, 2020 • Updated 6 years ago
zhebrak / agtap
Zero-instrumentation LLM API and MCP tracer for your agents powered by eBPF — latency, tokens, and tool use in realtime
☆18 • Mar 16, 2026 • Updated 4 months ago
Belluxx / Perplex
Inspect LLM's logprobs and perplexity over a piece of text, or compare two LLMs (like a git diff)
☆20 • Mar 23, 2026 • Updated 4 months ago
core-stack / snipet
☆18 • Dec 1, 2025 • Updated 7 months ago
mscheong01 / speculative_decoding.c
Minimal C implementation of speculative decoding based on llama2.c
☆30 • Jul 15, 2024 • Updated 2 years ago
yassa9 / frokenizer
A zero-allocation, header-only C++ BPE tokenizer for Qwen, built for maximum inference throughput.
☆22 • Apr 3, 2026 • Updated 3 months ago
dspasyuk / llama.cui
Llama.cui is a small llama.cpp-based chat application for Node.js
☆19 • Jul 10, 2025 • Updated last year
AI21Labs / ai21-typescript
AI21 TypeScript SDK
☆13 • Dec 18, 2025 • Updated 7 months ago
eauchs / mlx-dflash
3.34× faster inference on Apple Silicon — native MLX port of DFlash speculative decoding
☆19 • Apr 11, 2026 • Updated 3 months ago
Danmoreng / qwen3-tts.cpp
☆37 • Jul 13, 2026 • Updated 2 weeks ago
fajrmn / kokoro-on-browser
☆16 • Feb 1, 2025 • Updated last year
ZihaoFU245 / lmstudio-toolpack
An MCP stdio toolpack for local LLMs
☆33 • Apr 6, 2026 • Updated 3 months ago
ariannamethod / nanollama
Train Llama 3 models from scratch. Any scale, any personality. By Arianna Method.
☆49 • May 4, 2026 • Updated 2 months ago
filippostanghellini / DocFinder
DocFinder is a local-first tool for indexing and searching documents using semantic embeddings stored in SQLite. Everything runs on your machine, …
☆26 • Updated this week
WuKongAI-CMU / Kimi-K2-Mini
A miniaturized version of the Kimi-K2 model optimized for deployment on single H100 GPUs.
☆35 • Jul 16, 2025 • Updated last year
Felix-Zhenghao / flash-attention-v2-minimal
Implement FlashAttention v2 with minimal code to learn.
☆19 • Jun 12, 2024 • Updated 2 years ago
abhisheknair10 / llama3.cu
Lightweight Llama 3 8B Inference Engine in CUDA C
☆53 • Mar 21, 2025 • Updated last year
TAR-ALEX / llm-html
☆20 • Jul 4, 2025 • Updated last year
yvonwin / qwen2.cpp
Qwen2 and Llama 3 C++ implementation
☆50 • Jun 7, 2024 • Updated 2 years ago
parsakhaz / fashn-tryon-extension
A Chrome extension that enables virtual fashion try-on and model swap using FASHN AI. Hover over fashion images on any website to: (1) tr…
☆22 • Aug 14, 2025 • Updated 11 months ago
RhinoDevel / mt_llm
Pure C wrapper library that makes using llama.cpp on Linux and Windows as simple as possible.
☆15 • Updated this week
NachiketGadekar1 / browserllama
Browser extension that lets you summarize and chat with any webpage using a local LLM of your choice.
☆23 • Oct 24, 2024 • Updated last year
phildougherty / qwen2.5_omni_chat
Service for testing out the new Qwen2.5 omni model
☆62 • Apr 30, 2025 • Updated last year
rastandy / xwiki-ansible-playbook
Ansible playbook to install XWiki with PostgreSQL and Tomcat
☆12 • Oct 10, 2017 • Updated 8 years ago
intelligencedev / eternal
Eternal is an experimental platform for machine learning models and workflows.
☆70 • Mar 9, 2025 • Updated last year
Brymir7 / PhaeroOS
AI Based "Happiness Optimizer"
☆12 • Oct 20, 2024 • Updated last year
cztomsik / ggml-js
JavaScript bindings for the ggml-js library
☆44 • Nov 10, 2025 • Updated 8 months ago
hyparam / hyllama
llama.cpp GGUF file parser for JavaScript
☆50 • Dec 11, 2024 • Updated last year
wizzard0 / llama2.ts
Llama2 inference in one TypeScript file
☆20 • May 29, 2025 • Updated last year
Laszlobeer / Dungeo_ai_lan_play
A dungeon AI that runs locally in the terminal and uses your LLM, for 2 to 5 players
☆17 • Jan 25, 2026 • Updated 6 months ago
ahkohd / yagami
A local-first web search agent
☆29 • Jun 20, 2026 • Updated last month