antirez / LLM-FTC-sampling
First token cutoff sampling inference example
☆28 Updated 10 months ago
Related projects
Alternatives and complementary repositories for LLM-FTC-sampling
- Using modal.com to process FineWeb-edu data ☆19 Updated 2 months ago
- Benchmarks comparing PyTorch and MLX on Apple Silicon GPUs ☆57 Updated 4 months ago
- Public reports detailing responses to sets of prompts by Large Language Models. ☆26 Updated last year
- MLX-Embeddings is the best package for running Vision and Language Embedding models locally on your Mac using MLX. ☆77 Updated last month
- An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace. ☆21 Updated 5 months ago
- llm plugin for Cerebras fast inference API ☆18 Updated 3 weeks ago
- Website for Applied-LLMs work ☆20 Updated last month
- QLLM: A powerful CLI for seamless interaction with multiple Large Language Models. Simplify AI workflows, streamline development, and unl… ☆24 Updated last week
- Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread ☆19 Updated 7 months ago
- Because it's there. ☆14 Updated 2 months ago
- A super simple web interface to perform blind tests on LLM outputs. ☆27 Updated 8 months ago
- 🛠 Self-hosted, fast, and consistent remote configuration for apps. ☆12 Updated 2 years ago
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆29 Updated 6 months ago
- Python client for accessing the turbopuffer API. ☆27 Updated this week
- A Learning Journey: Micrograd in Mojo 🔥 ☆57 Updated last month
- DocGenius AI - Generative AI Chatbot for your Documents - powered by Cloudera ☆11 Updated last month
- Trace LLM calls (and others) and visualize them in WandB, as interactive SVG or using a streaming local webapp ☆14 Updated 10 months ago
- Run LLMs on Replicate with vLLM ☆15 Updated last month
- Chat Markup Language conversation library ☆54 Updated 10 months ago
- LLM plugin for clustering embeddings ☆62 Updated 8 months ago
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, with PyTorch/CUDA ☆35 Updated 8 months ago
- Visualize expert firing frequencies across sentences in the Mixtral MoE model ☆17 Updated 11 months ago
- A clone of OpenAI's Tokenizer page for HuggingFace Models ☆44 Updated last year
- LLama implementations benchmarking framework ☆12 Updated last year