groq / groqflow
GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing those programs on GroqChip™ processors.
☆109 Updated 2 months ago
Alternatives and similar repositories for groqflow
Users who are interested in groqflow are comparing it to the libraries listed below.
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 Updated 11 months ago
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆39 Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆52 Updated last year
- ☆89 Updated 7 months ago
- AskIt: Unified programming interface for programming with LLMs (GPT-3.5, GPT-4, Gemini, Claude, Cohere, Llama 2) ☆79 Updated 4 months ago
- AI Assistant running within your browser. ☆65 Updated 5 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆86 Updated this week
- Tiny ASIC implementation for "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" matrix multiplication unit ☆145 Updated last year
- 1.58-bit LLaMa model ☆81 Updated last year
- A Python library to orchestrate LLMs in a neural network-inspired structure ☆47 Updated 7 months ago
- A tree-based prefix cache library that allows rapid creation of looms: hierarchical branching pathways of LLM generations. ☆68 Updated 3 months ago
- ☆33 Updated 2 months ago
- Distributed Inference for MLX LLM ☆91 Updated 9 months ago
- Fast parallel LLM inference for MLX ☆187 Updated 10 months ago
- Transformer GPU VRAM estimator ☆61 Updated last year
- ☆66 Updated 11 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆33 Updated this week
- A GPT with self-similar nested properties ☆20 Updated last year
- LLM inference in C/C++ ☆76 Updated this week
- ☆113 Updated 4 months ago
- A fast minimalistic implementation of guided generation on Apple Silicon using Outlines and MLX ☆53 Updated last year
- ☆118 Updated 9 months ago
- Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient" ☆140 Updated last year
- A guidance compatibility layer for llama-cpp-python ☆34 Updated last year
- Tutorial to get started with SkyPilot! ☆57 Updated 11 months ago
- API Server for Transformer Lab ☆61 Updated this week
- A repository of prompts and Python scripts for intelligent transformation of raw text into diverse formats. ☆30 Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆44 Updated last week
- Editor with LLM generation tree exploration ☆66 Updated 3 months ago
- Custom PTX Instruction Benchmark ☆123 Updated 2 months ago