WebGPU LLM inference tuned by hand
☆150 · Jun 24, 2023 · Updated 2 years ago
Alternatives and similar repositories for token-hawk
Users that are interested in token-hawk are comparing it to the libraries listed below.
- Official code for ACL 2023 (short, findings) paper "Recursion of Thought: A Divide and Conquer Approach to Multi-Context Reasoning with L… ☆45 · Jun 13, 2023 · Updated 2 years ago
- Deploy your GGML models to HuggingFace Spaces with Docker and gradio ☆38 · Jun 6, 2023 · Updated 2 years ago
- A guidance language for controlling large language models. ☆43 · Jun 9, 2023 · Updated 2 years ago
- A minimal metal application ☆14 · Mar 24, 2021 · Updated 5 years ago
- A text-based, 5e-compatible RPG with an AI Dungeon Master that rolls real dice, tracks real stats, and plays by the rules. Built on the S… ☆32 · May 18, 2026 · Updated last week
- ggml implementation of BERT ☆500 · Feb 23, 2024 · Updated 2 years ago
- Erudito: Easy API/CLI to ask questions about your documentation ☆98 · Nov 6, 2023 · Updated 2 years ago
- A torchless, c++ rwkv implementation using 8bit quantization, written in cuda/hip/vulkan for maximum compatibility and minimum dependenci… ☆312 · Jan 31, 2024 · Updated 2 years ago
- Inference Llama 2 in one file of zero-dependency, zero-unsafe Rust ☆40 · Aug 2, 2023 · Updated 2 years ago
- ☆11 · Oct 11, 2023 · Updated 2 years ago
- Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML) ☆572 · Aug 8, 2023 · Updated 2 years ago
- Makes llama.cpp easy to use. ☆12 · May 14, 2025 · Updated last year
- A Next.js chat app to use Llama 2 locally using node-llama-cpp ☆12 · Oct 27, 2024 · Updated last year
- C++ implementation for 💫StarCoder ☆458 · Sep 9, 2023 · Updated 2 years ago
- Train llama with lora on one 4090 and merge weight of lora to work as stanford alpaca. ☆52 · Jun 16, 2023 · Updated 2 years ago
- various experiments for scaling inference time compute with small reasoning models ☆17 · Jan 16, 2025 · Updated last year
- ☆34 · May 28, 2023 · Updated 2 years ago
- A Clojure library to automatically generate public API namespaces by wrapping and exposing functions and macros from implementation names… ☆13 · Mar 14, 2025 · Updated last year
- Simple script to re-rank images using OpenAI's CLIP https://github.com/openai/CLIP. ☆15 · May 3, 2021 · Updated 5 years ago
- A fork of textgen that kept some things like Exllama and old GPTQ. ☆22 · Aug 20, 2024 · Updated last year
- Fast Clojure interpreter and template engine ☆14 · Oct 15, 2021 · Updated 4 years ago
- minimal diffusion transformer in pytorch. ☆17 · Oct 6, 2024 · Updated last year
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes… ☆145 · Oct 17, 2023 · Updated 2 years ago
- Python bindings for the Transformer models implemented in C/C++ using GGML library. ☆1,887 · Jan 28, 2024 · Updated 2 years ago
- Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe… ☆159 · Feb 9, 2024 · Updated 2 years ago
- Automated testing of browser rendering engines using Clojure, generative (property-based) testing, formal grammars (EBNF), and a consensu… ☆12 · Nov 18, 2019 · Updated 6 years ago
- INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model ☆1,569 · Mar 23, 2025 · Updated last year
- Natural Language Processing Plugin for Unreal Engine 4 using Tensorflow ☆10 · Jun 16, 2019 · Updated 6 years ago
- A minimal Python re-implementation of the A* with seed heuristic for exact global alignment (edit distance) in near-linear time ☆22 · Nov 30, 2024 · Updated last year
- LLM-based code completion engine ☆192 · Jan 23, 2025 · Updated last year
- SparseGPT + GPTQ Compression of LLMs like LLaMa, OPT, Pythia ☆42 · Mar 13, 2023 · Updated 3 years ago
- Plug n Play GBNF Compiler for llama.cpp ☆32 · Nov 8, 2023 · Updated 2 years ago
- Emacs package for LLM-assisted code/text completion ☆42 · Nov 12, 2025 · Updated 6 months ago
- A working and hopefully fast SUBLEQ emulator to run DawnOS ☆11 · Sep 19, 2019 · Updated 6 years ago
- ☆14 · May 25, 2023 · Updated 2 years ago
- Landmark Attention: Random-Access Infinite Context Length for Transformers ☆425 · Dec 20, 2023 · Updated 2 years ago
- A cross-platform browser ML framework. ☆759 · Apr 2, 2026 · Updated last month
- Download full or partial git-lfs repos without temporarily using 2x disk space ☆32 · Oct 13, 2023 · Updated 2 years ago
- Builds Dawn on Linux and macOS as one single easier-to-use library ☆29 · Dec 5, 2021 · Updated 4 years ago