An implementation of bucketMul LLM inference
☆228Jul 1, 2024Updated last year
Alternatives and similar repositories for effort
Users that are interested in effort are comparing it to the libraries listed below.
- ☆23May 14, 2026Updated 2 weeks ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".☆280Nov 3, 2023Updated 2 years ago
- Explore training for quantized models☆26Jul 12, 2025Updated 10 months ago
- A GPU Accelerated Binary Vector Store☆47Feb 17, 2025Updated last year
- Experimental method to use reference video to drive motion in generations without training in ComfyUI.☆37Apr 9, 2024Updated 2 years ago
- An implementation of the FAST feature detector that utilizes the ESP32-S3's SIMD instructions.☆47Jun 24, 2024Updated last year
- SPLAA is an AI assistant framework that utilizes voice recognition, text-to-speech, and tool-calling capabilities to provide a conversati…☆29May 6, 2025Updated last year
- An extension for oobabooga's text-generation-webui that adds syntax highlighting to code snippets☆69Jun 4, 2024Updated last year
- Algebraic enhancements for GEMM & AI accelerators☆293Feb 28, 2025Updated last year
- LLM plugin for asking questions of LLM's own documentation, and related packages☆30May 5, 2025Updated last year
- LLM-powered lossless compression tool☆310Jan 2, 2026Updated 4 months ago
- Serving multiple LoRA finetuned LLM as one☆1,160May 8, 2024Updated 2 years ago
- ☆253Mar 20, 2024Updated 2 years ago
- Plugin for LLM adding support for Google's PaLM 2 model☆14Oct 4, 2023Updated 2 years ago
- [TMLR'24] This repository includes the official implementation of our paper "FedConv: Enhancing Convolutional Neural Networks for Handling D…☆25Apr 30, 2024Updated 2 years ago
- Cybernaut is deprecated in favor of pageobject.js.org☆15Nov 5, 2017Updated 8 years ago
- An implementation of the Llama architecture, to instruct and delight☆21May 31, 2025Updated 11 months ago
- GRadient-INformed MoE☆263Sep 25, 2024Updated last year
- Yet another frontend for LLM, written using .NET and WinUI 3☆11Sep 14, 2025Updated 8 months ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings☆23Jun 30, 2025Updated 11 months ago
- Package of useful sampling algorithms written in MLX.☆17Feb 27, 2024Updated 2 years ago
- ☆29Jan 23, 2024Updated 2 years ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"☆254Jun 6, 2025Updated 11 months ago
- ☆24Dec 11, 2024Updated last year
- XmodelLM☆38Nov 19, 2024Updated last year
- ☆24Jan 22, 2025Updated last year
- A CLI to manage, install, and configure llama inference implementation in multiple languages☆64Jan 4, 2024Updated 2 years ago
- FlashAttention (Metal Port)☆603Sep 22, 2024Updated last year
- Go ahead and axolotl questions☆11,964Updated this week
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models☆25Nov 25, 2024Updated last year
- the small distributed language model toolkit; fine-tune state-of-the-art LLMs anywhere, rapidly☆32Oct 19, 2024Updated last year
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp…☆226Sep 18, 2025Updated 8 months ago
- A series of top performing Text to SQL LLMs☆865Feb 12, 2024Updated 2 years ago
- PB-LLM: Partially Binarized Large Language Models☆156Nov 20, 2023Updated 2 years ago
- Slipstream provides a data-flow model to simplify development of stateful streaming applications.☆39Feb 19, 2026Updated 3 months ago
- A basic ls replacement, written in rust, using cursor ai and Geoffrey Huntley's techniques☆32Mar 3, 2025Updated last year
- Zeta implementation of a reusable and plug in and play feedforward from the paper "Exponentially Faster Language Modeling"☆16Nov 11, 2024Updated last year
- This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?☆1,549Nov 13, 2025Updated 6 months ago
- ☆23May 12, 2026Updated 2 weeks ago