gigit0000 / qwen3.cu
Single-file, pure CUDA C implementation for running inference on Qwen3 0.6B GGUF. No Dependencies.
☆22 Updated last month
Alternatives and similar repositories for qwen3.cu
Users who are interested in qwen3.cu are comparing it to the libraries listed below.
- Simple LLM inference server ☆20 Updated last year
- OpenPipe Reinforcement Learning Experiments ☆32 Updated 9 months ago
- ☆50 Updated last year
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization. ☆17 Updated last year
- Trying to deconstruct RWKV in understandable terms ☆14 Updated 2 years ago
- ☆62 Updated 5 months ago
- Who needs o1 anyways. Add CoT to any OpenAI compatible endpoint. ☆44 Updated last year
- A library for incremental loading of large PyTorch checkpoints ☆56 Updated 2 years ago
- Demo of an "always-on" AI assistant. ☆24 Updated last year
- Very minimal (and stateless) agent framework ☆44 Updated 11 months ago
- 🚀 Scale your RAG pipeline using Ragswift: A scalable centralized embeddings management platform ☆38 Updated last year
- Modified Beam Search with periodical restart ☆12 Updated last year
- A Python library to orchestrate LLMs in a neural network-inspired structure ☆52 Updated last year
- Python examples using the bigcode/tiny_starcoder_py 159M model to generate code ☆45 Updated 2 years ago
- Experimental sampler to make LLMs more creative ☆31 Updated 2 years ago
- Pivotal Token Search ☆141 Updated last week
- Editor with LLM generation tree exploration ☆80 Updated 10 months ago
- GGML implementation of BERT model with Python bindings and quantization. ☆58 Updated last year
- Clue inspired puzzles for testing LLM deduction abilities ☆45 Updated 9 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours ☆66 Updated last year
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆59 Updated 2 months ago
- Local LLM inference & management server with built-in OpenAI API ☆31 Updated last year
- BlinkDL's RWKV-v4 running in the browser ☆47 Updated 2 years ago
- Attend - to what matters. ☆17 Updated 10 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models ☆22 Updated last year
- Aana SDK is a powerful framework for building AI enabled multimodal applications. ☆55 Updated 4 months ago
- ☆24 Updated 11 months ago
- Generate a llama-quantize command to copy the quantization parameters of any GGUF ☆28 Updated 4 months ago
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch ☆103 Updated last year
- Yet Another (LLM) Web UI, made with Gemini ☆12 Updated last year