conanhujinming / text_dedup
High-Performance Text Deduplication Toolkit
☆40 · Updated last week
Alternatives and similar repositories for text_dedup
Users interested in text_dedup are comparing it to the libraries listed below.
- Lightweight Llama 3 8B Inference Engine in CUDA C · ☆48 · Updated 5 months ago
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… · ☆73 · Updated last week
- Lightweight C inference for Qwen3 GGUF, with the smallest (0.6B) at full precision (FP32) · ☆16 · Updated 2 weeks ago
- Editor with LLM generation tree exploration · ☆73 · Updated 6 months ago
- instinct.cpp provides ready-to-use alternatives to the OpenAI Assistant API and built-in utilities for developing AI Agent applications (RAG,… · ☆53 · Updated last year
- Enhancing LLMs with LoRA · ☆100 · Updated 3 weeks ago
- InferX is an Inference Function-as-a-Service platform · ☆129 · Updated last week
- ☆42 · Updated 2 weeks ago
- Running Microsoft's BitNet via Electron, React & Astro · ☆43 · Updated 2 months ago
- A minimalistic C++ Jinja templating engine for LLM chat templates · ☆170 · Updated 3 weeks ago
- Load and run Llama from safetensors files in C · ☆11 · Updated 10 months ago
- Sparse inferencing for transformer-based LLMs · ☆197 · Updated 2 weeks ago
- A Field-Theoretic Approach to Unbounded Memory in Large Language Models · ☆20 · Updated 4 months ago
- Guaranteed Structured Output from any Language Model via Hierarchical State Machines · ☆145 · Updated 2 months ago
- A chat UI for Llama.cpp · ☆15 · Updated last week
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. · ☆102 · Updated last month
- ☆209 · Updated last month
- ☆13 · Updated 4 months ago
- Chat WebUI is an easy-to-use user interface for interacting with AI, and it comes with multiple useful built-in tools such as web search… · ☆42 · Updated last week
- Simple Node proxy for llama-server that enables MCP use · ☆13 · Updated 3 months ago
- Input your VRAM and RAM and the toolchain will produce a GGUF model tuned to your system within seconds — flexible model sizing and lowes… · ☆33 · Updated this week
- ☆29 · Updated 4 months ago
- ggml implementation of embedding models, including SentenceTransformer and BGE · ☆59 · Updated last year
- A real-time shared memory layer for multi-agent LLM systems. · ☆47 · Updated 2 months ago
- ☆64 · Updated 8 months ago
- 1.58-bit LLaMa model · ☆82 · Updated last year
- General-purpose GPU compute framework built on Vulkan to support 1000s of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends).… · ☆52 · Updated 6 months ago
- AirLLM 70B inference with a single 4GB GPU · ☆14 · Updated 2 months ago
- ☆98 · Updated 2 months ago
- vLLM port of the Chatterbox TTS model · ☆283 · Updated this week