NimbleEdge / sparse_transformers
Sparse inferencing for transformer-based LLMs
☆187Updated this week
Alternatives and similar repositories for sparse_transformers
Users that are interested in sparse_transformers are comparing it to the libraries listed below
- ☆130Updated last month
- InferX is an Inference Function as a Service Platform☆114Updated this week
- ☆78Updated this week
- LLM Inference on consumer devices☆119Updated 3 months ago
- Testing LLM reasoning abilities with family relationship quizzes.☆62Updated 5 months ago
- DFloat11: Lossless LLM Compression for Efficient GPU Inference☆436Updated last month
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2.☆154Updated last year
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining.☆31Updated 2 months ago
- AI management tool☆117Updated 7 months ago
- ☆155Updated 2 months ago
- ☆79Updated last week
- klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs☆77Updated 9 months ago
- ☆145Updated last month
- Easily view and modify JSON datasets for large language models☆76Updated last month
- Dia-JAX: A JAX port of Dia, the text-to-speech model for generating realistic dialogue from text with emotion and tone control.☆27Updated last month
- Lightweight Inference server for OpenVINO☆187Updated 2 weeks ago
- Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)☆86Updated 2 weeks ago
- ☆79Updated 4 months ago
- Orpheus Chat WebUI☆66Updated 3 months ago
- Pivotal Token Search☆107Updated last month
- Super simple python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run!☆25Updated last month
- Live-bending a foundation model’s output at neural network level.☆261Updated 2 months ago
- ☆38Updated last week
- ☆104Updated last month
- Samples of good AI generated CUDA kernels☆83Updated last month
- Glyphs, acting as collaboratively defined symbols linking related concepts, add a layer of multidimensional semantic richness to user-AI …☆49Updated 4 months ago
- llama.cpp fork with additional SOTA quants and improved performance☆608Updated this week
- A web application that converts speech to speech, 100% private☆71Updated 3 weeks ago
- SLOP Detector and analyzer based on dictionary for shareGPT JSON and text☆70Updated 7 months ago
- Open source LLM UI, compatible with all local LLM providers.☆175Updated 9 months ago