WebGPU LLM inference tuned by hand
☆150 · Jun 24, 2023 · Updated 2 years ago
Alternatives and similar repositories for token-hawk
Users that are interested in token-hawk are comparing it to the libraries listed below.
- Official code for ACL 2023 (short, findings) paper "Recursion of Thought: A Divide and Conquer Approach to Multi-Context Reasoning with L… ☆45 · Jun 13, 2023 · Updated 2 years ago
- Deploy your GGML models to HuggingFace Spaces with Docker and gradio ☆38 · Jun 6, 2023 · Updated 2 years ago
- Inference Llama 2 in one file of pure JavaScript(HTML) ☆36 · May 20, 2025 · Updated 10 months ago
- A guidance language for controlling large language models. ☆43 · Jun 9, 2023 · Updated 2 years ago
- ggml implementation of BERT ☆500 · Feb 23, 2024 · Updated 2 years ago
- Launch a full-fledged D&D 5e text adventure in seconds. Generate a unique, procedurally crafted world—complete with kingdoms, guilds, and… ☆27 · Updated this week
- A minimal Python re-implementation of the A* with seed heuristic for exact global alignment (edit distance) in near-linear time ☆22 · Nov 30, 2024 · Updated last year
- Erudito: Easy API/CLI to ask questions about your documentation ☆99 · Nov 6, 2023 · Updated 2 years ago
- A torchless, c++ rwkv implementation using 8bit quantization, written in cuda/hip/vulkan for maximum compatibility and minimum dependenci… ☆312 · Jan 31, 2024 · Updated 2 years ago
- Inference Llama 2 in one file of zero-dependency, zero-unsafe Rust ☆40 · Aug 2, 2023 · Updated 2 years ago
- ☆11 · Oct 11, 2023 · Updated 2 years ago
- Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML) ☆569 · Aug 8, 2023 · Updated 2 years ago
- ☆13 · May 7, 2023 · Updated 2 years ago
- A Next.js chat app to use Llama 2 locally using node-llama-cpp ☆12 · Oct 27, 2024 · Updated last year
- C++ implementation for 💫StarCoder ☆458 · Sep 9, 2023 · Updated 2 years ago
- various experiments for scaling inference time compute with small reasoning models ☆17 · Jan 16, 2025 · Updated last year
- Python bindings for the Transformer models implemented in C/C++ using GGML library. ☆1,885 · Jan 28, 2024 · Updated 2 years ago
- Simple script to re-rank images using OpenAI's CLIP https://github.com/openai/CLIP. ☆15 · May 3, 2021 · Updated 4 years ago
- A fork of textgen that kept some things like Exllama and old GPTQ. ☆22 · Aug 20, 2024 · Updated last year
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes… ☆145 · Oct 17, 2023 · Updated 2 years ago
- INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model ☆1,567 · Mar 23, 2025 · Updated last year
- trying to make WebGPU a bit easier to use ☆19 · Jan 9, 2024 · Updated 2 years ago
- Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe… ☆159 · Feb 9, 2024 · Updated 2 years ago
- Minimal C# bindings for llama.cpp + .NET core library with API host/client. ☆74 · Dec 20, 2024 · Updated last year
- LLM-based code completion engine ☆192 · Jan 23, 2025 · Updated last year
- A Swift package for interacting with selenium and undetected-chromedriver through python by using PythonKit. ☆13 · Jun 21, 2025 · Updated 9 months ago
- A repo to hold some simple experiments ☆14 · May 4, 2022 · Updated 3 years ago
- ☆13 · May 25, 2023 · Updated 2 years ago
- ☆15 · Jun 5, 2023 · Updated 2 years ago
- Landmark Attention: Random-Access Infinite Context Length for Transformers ☆426 · Dec 20, 2023 · Updated 2 years ago
- A Javascript library (with Typescript types) to parse metadata of GGML based GGUF files. ☆52 · Jul 30, 2024 · Updated last year
- A cross-platform browser ML framework. ☆756 · Apr 2, 2026 · Updated last week
- Download full or partial git-lfs repos without temporarily using 2x disk space ☆31 · Oct 13, 2023 · Updated 2 years ago
- A Next.js chatbot app demonstrating seamless integration with window.ai. ☆15 · Jun 25, 2023 · Updated 2 years ago
- Falcon LLM ggml framework with CPU and GPU support ☆248 · Jan 22, 2024 · Updated 2 years ago
- A fast batching API to serve LLM models ☆189 · Apr 26, 2024 · Updated last year
- Optimization algorithm which fits a ResNet to CIFAR-10 5x faster than SGD / Adam (with terrible generalization) ☆14 · Oct 20, 2023 · Updated 2 years ago
- Embeddings focused small version of Llama NLP model ☆108 · Apr 27, 2023 · Updated 2 years ago
- Newman reporter allowing to decorate pull request with postman collection results. ☆10 · Nov 2, 2023 · Updated 2 years ago