WebGPU LLM inference tuned by hand
☆150 · Jun 24, 2023 · Updated 3 years ago
Alternatives and similar repositories for token-hawk
Users that are interested in token-hawk are comparing it to the libraries listed below.
- Official code for ACL 2023 (short, findings) paper "Recursion of Thought: A Divide and Conquer Approach to Multi-Context Reasoning with L… ☆45 · Jun 13, 2023 · Updated 3 years ago
- Deploy your GGML models to HuggingFace Spaces with Docker and gradio ☆38 · Jun 6, 2023 · Updated 3 years ago
- Inference Llama 2 in one file of pure JavaScript (HTML) ☆36 · May 20, 2025 · Updated last year
- A guidance language for controlling large language models. ☆43 · Jun 9, 2023 · Updated 3 years ago
- A minimal metal application ☆14 · Mar 24, 2021 · Updated 5 years ago
- A text-based, 5e-compatible RPG with an AI Dungeon Master that rolls real dice, tracks real stats, and plays by the rules. Built on the S… ☆39 · Jun 17, 2026 · Updated 2 weeks ago
- ggml implementation of BERT ☆501 · Feb 23, 2024 · Updated 2 years ago
- Erudito: Easy API/CLI to ask questions about your documentation ☆98 · Nov 6, 2023 · Updated 2 years ago
- A torchless C++ RWKV implementation using 8-bit quantization, written in CUDA/HIP/Vulkan for maximum compatibility and minimum dependenci… ☆312 · Jan 31, 2024 · Updated 2 years ago
- Inference Llama 2 in one file of zero-dependency, zero-unsafe Rust ☆40 · Aug 2, 2023 · Updated 2 years ago
- Suno AI's Bark model in C/C++ for fast text-to-speech generation ☆866 · Nov 16, 2024 · Updated last year
- ☆11 · Oct 11, 2023 · Updated 2 years ago
- Makes llama.cpp easy to use. ☆12 · May 14, 2025 · Updated last year
- A Next.js chat app to use Llama locally using node-llama-cpp ☆12 · Oct 27, 2024 · Updated last year
- C++ implementation for 💫StarCoder ☆458 · Sep 9, 2023 · Updated 2 years ago
- Various experiments for scaling inference time compute with small reasoning models ☆17 · Jan 16, 2025 · Updated last year
- ☆34 · May 28, 2023 · Updated 3 years ago
- A fork of textgen that kept some things like Exllama and old GPTQ. ☆22 · Aug 20, 2024 · Updated last year
- Minimal diffusion transformer in PyTorch. ☆17 · Oct 6, 2024 · Updated last year
- A lightweight Python utility that aggregates and exports comprehensive system information to JSON, specifically designed for feeding syst… ☆13 · Apr 13, 2025 · Updated last year
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes… ☆144 · Oct 17, 2023 · Updated 2 years ago
- Python bindings for the Transformer models implemented in C/C++ using GGML library. ☆1,886 · Jan 28, 2024 · Updated 2 years ago
- Trying to make WebGPU a bit easier to use ☆19 · Jan 9, 2024 · Updated 2 years ago
- Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe… ☆161 · Feb 9, 2024 · Updated 2 years ago
- INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model ☆1,574 · Mar 23, 2025 · Updated last year
- Efficient 3-bit/4-bit quantization of LLaMA models ☆18 · May 18, 2023 · Updated 3 years ago
- ☆457 · Oct 15, 2023 · Updated 2 years ago
- LLM-based code completion engine ☆195 · Jan 23, 2025 · Updated last year
- A Swift package for interacting with Selenium and undetected-chromedriver through Python by using PythonKit. ☆13 · Jun 21, 2025 · Updated last year
- SparseGPT + GPTQ Compression of LLMs like LLaMA, OPT, Pythia ☆42 · Mar 13, 2023 · Updated 3 years ago
- Emacs package for LLM-assisted code/text completion ☆43 · May 22, 2026 · Updated last month
- ☆16 · Dec 16, 2024 · Updated last year
- A working and hopefully fast SUBLEQ emulator to run DawnOS ☆11 · Sep 19, 2019 · Updated 6 years ago
- A repo to hold some simple experiments ☆14 · May 4, 2022 · Updated 4 years ago
- ☆14 · May 25, 2023 · Updated 3 years ago
- Landmark Attention: Random-Access Infinite Context Length for Transformers ☆426 · Dec 20, 2023 · Updated 2 years ago
- A JavaScript library (with TypeScript types) to parse metadata of GGML-based GGUF files. ☆52 · Jul 30, 2024 · Updated last year
- Download full or partial git-lfs repos without temporarily using 2x disk space ☆32 · Oct 13, 2023 · Updated 2 years ago
- A cross-platform browser ML framework. ☆767 · May 26, 2026 · Updated last month