antirez / LLM-FTC-sampling
First token cutoff sampling inference example
☆29 Updated last year
Alternatives and similar repositories for LLM-FTC-sampling:
Users who are interested in LLM-FTC-sampling are comparing it to the libraries listed below:
- Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread ☆18 Updated 9 months ago
- LLama implementations benchmarking framework ☆12 Updated last year
- Python client for accessing the turbopuffer API. ☆31 Updated this week
- The official evaluation suite and dynamic data release for MixEval. ☆10 Updated 3 months ago
- Benchmarks comparing PyTorch and MLX on Apple Silicon GPUs ☆68 Updated 6 months ago
- Low-level Guidance Parser ☆75 Updated this week
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆23 Updated 2 months ago
- Using modal.com to process FineWeb-edu data ☆19 Updated last month
- Trace LLM calls (and others) and visualize them in WandB, as interactive SVG or using a streaming local webapp ☆14 Updated last year
- Run Llama 2 using MLX on macOS ☆32 Updated last year
- ☆12 Updated this week
- QLLM: A powerful CLI for seamless interaction with multiple Large Language Models. Simplify AI workflows, streamline development, and unl… ☆31 Updated last week
- Real-time visualisation ☆16 Updated 6 months ago
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, with PyTorch/CUDA ☆35 Updated 10 months ago
- Because it's there. ☆14 Updated 3 months ago
- ☆23 Updated 2 months ago
- Access fireworks.ai models via API ☆11 Updated 9 months ago
- Efficient BM25 with DuckDB 🦆 ☆36 Updated 3 weeks ago
- TRITONCACHE implementation of a Redis cache ☆13 Updated this week
- a pipeline for using api calls to agnostically convert unstructured data into structured training data ☆29 Updated 3 months ago
- Alpha-Zero Connect Four NN trained via self play ☆13 Updated 3 months ago
- Website for Applied-LLMs work ☆20 Updated 2 weeks ago
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆96 Updated this week
- Showcase how mxbai-embed-large-v1 can be used to produce binary embedding. Binary embeddings enabled 32x storage savings and 40x faster r… ☆15 Updated 9 months ago
- Proof of concept for a generative AI application framework powered by WebAssembly and Extism ☆14 Updated last year
- The Prime Intellect CLI provides a powerful command-line interface for managing GPU resources across various providers ☆13 Updated this week
- Training code for Sparse Autoencoders on Embedding models ☆35 Updated last month
- ☆18 Updated 3 months ago