NimbleEdge / sparse_transformers
Sparse Inferencing for transformer based LLMs
☆201 · Updated 3 months ago
Alternatives and similar repositories for sparse_transformers
Users who are interested in sparse_transformers are comparing it to the libraries listed below.
- LLM Inference on consumer devices ☆125 · Updated 7 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆44 · Updated 2 weeks ago
- ☆62 · Updated 4 months ago
- Lightweight toolkit package to train and fine-tune 1.58-bit language models ☆98 · Updated 5 months ago
- InferX: Inference as a Service Platform ☆138 · Updated this week
- ☆173 · Updated 3 months ago
- AI management tool ☆121 · Updated last year
- ☆158 · Updated 6 months ago
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆559 · Updated 2 months ago
- Enhancing LLMs with LoRA ☆174 · Updated 3 weeks ago
- ☆62 · Updated 4 months ago
- ☆135 · Updated 6 months ago
- ☆32 · Updated 7 months ago
- Testing LLM reasoning abilities with family relationship quizzes. ☆62 · Updated 9 months ago
- Automatically quantize GGUF models ☆214 · Updated 3 weeks ago
- Guaranteed Structured Output from any Language Model via Hierarchical State Machines ☆145 · Updated last month
- Easy-to-use, high-performance Knowledge Distillation for LLMs ☆95 · Updated 6 months ago
- Super simple Python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run! ☆28 · Updated 3 months ago
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆83 · Updated 2 weeks ago
- ☆106 · Updated 2 months ago
- 1.58-bit LLaMa model ☆83 · Updated last year
- A pipeline-parallel training script for LLMs. ☆162 · Updated 6 months ago
- B-Llama3o: a Llama 3 with vision and audio understanding, as well as text, audio, and animation data output. ☆26 · Updated last year
- ☆106 · Updated 4 months ago
- From-scratch implementation of OpenAI's GPT-OSS model in Python. No Torch, No GPUs. ☆102 · Updated last week
- ☆315 · Updated this week
- Pivotal Token Search ☆131 · Updated 4 months ago
- REAP: Router-weighted Expert Activation Pruning for SMoE compression ☆95 · Updated last week
- ☆228 · Updated last month
- Tcurtsni: Reverse Instruction Chat, ever wonder what your LLM wants to ask you? ☆23 · Updated last year