dmatora / LLM-inference-speed-benchmarks
☆20 · Updated last year
Alternatives and similar repositories for LLM-inference-speed-benchmarks
Users interested in LLM-inference-speed-benchmarks are comparing it to the repositories listed below.
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated 2 years ago
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆86 · Updated last week
- AirLLM 70B inference with single 4GB GPU ☆14 · Updated 6 months ago
- Experimental sampler to make LLMs more creative ☆31 · Updated 2 years ago
- Modified Beam Search with periodical restart ☆12 · Updated last year
- This repository is about implementing The Personality Cores Conversation System originally developed by Aperture Science, Inc. in the Por… ☆25 · Updated last year
- Run ollama & gguf easily with a single command ☆52 · Updated last year
- ☆24 · Updated 11 months ago
- ☆74 · Updated 2 years ago
- An unsupervised model merging algorithm for Transformers-based language models. ☆108 · Updated last year
- Senna is an advanced AI-powered search engine designed to provide users with immediate answers to their queries by leveraging natural lan… ☆19 · Updated last year
- ☆27 · Updated 2 years ago
- A Windows tool to query various LLM AIs. Supports branched conversations, history, and summaries, among others. ☆33 · Updated 2 months ago
- Simple, fast, parallel Huggingface GGML model downloader written in Python ☆24 · Updated 2 years ago
- Controllable Language Model Interactions in TypeScript ☆10 · Updated last year
- OpenPipe Reinforcement Learning Experiments ☆32 · Updated 9 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Updated last year
- 4-bit quantization of SantaCoder using GPTQ ☆51 · Updated 2 years ago
- ☆62 · Updated 6 months ago
- ☆51 · Updated last year
- Experiments with BitNet inference on CPU ☆55 · Updated last year
- llama.cpp to PyTorch Converter ☆35 · Updated last year
- AI-based "Happiness Optimizer" ☆12 · Updated last year
- Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models ☆70 · Updated 2 years ago
- Transformer GPU VRAM estimator ☆67 · Updated last year
- Lightweight continuous-batching OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper. ☆29 · Updated 9 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated last year
- Accepts a Hugging Face model URL, then automatically downloads and quantizes it using Bits and Bytes. ☆38 · Updated last year
- JAX implementations of RWKV ☆19 · Updated 2 years ago
- Who needs o1 anyway? Add CoT to any OpenAI-compatible endpoint. ☆44 · Updated last year