chonkie-ai / autotiktokenizer

🧰 The AutoTokenizer that TikToken always needed -- Load any tokenizer with TikToken now! ✨

☆39

Alternatives and similar repositories for autotiktokenizer:

Users who are interested in autotiktokenizer are comparing it to the libraries listed below.

louisbrulenaudet / ragoon
High-level library for batched embeddings generation, blazingly fast web-based RAG and quantized indexes processing ⚡
☆64 Updated 4 months ago
michaelfeil / embed
A stable, fast and easy-to-use inference library with a focus on a sync-to-async API
☆45 Updated 5 months ago
ali-bahrainian / RAG_best_practices
☆83 Updated last month
s-smits / modernbert-finetune
Fine-tune ModernBERT on a large Dataset with Custom Tokenizer Training
☆61 Updated 3 weeks ago
MinishLab / tokenlearn
Pre-train Static Word Embeddings
☆47 Updated last month
davanstrien / haiku-dpo
Using open source LLMs to build synthetic datasets for direct preference optimization
☆58 Updated last year
Knowledgator / GLiClass
Generalist and Lightweight Model for Text Classification
☆87 Updated this week
deshwalmahesh / PHUDGE
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…
☆49 Updated 7 months ago
Pleias / Various-Finetuning
Set of scripts to finetune LLMs
☆36 Updated 11 months ago
premAI-io / benchmarks
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
☆136 Updated 7 months ago
IlyasMoutawwakil / py-txi
A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.
☆34 Updated 2 months ago
mixedbread-ai / batched
The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o…
☆123 Updated 2 months ago
S1M0N38 / dspy-arxiv
Explore the use of DSPy for extracting features from PDFs 🔎
☆38 Updated last year
PrithivirajDamodaran / SPLADERunner
Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by…
☆29 Updated 6 months ago
cognitivecomputations / spectrum
☆113 Updated 5 months ago
LAGoM-NLP / transtokenizer
☆41 Updated last month
Locutusque / TPU-Alignment
Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free
☆230 Updated 4 months ago
huggingface / huggingface-inference-toolkit
Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models.
☆63 Updated last month
NielsRogge / awesome-huggingface
Repository containing awesome resources regarding Hugging Face tooling.
☆46 Updated last year
lancedb / ragged
☆18 Updated 4 months ago
MoritzLaurer / synthetic-data-blog
This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data
☆67 Updated last year
PrithivirajDamodaran / Route0x
Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da
☆94 Updated 2 months ago
MoritzLaurer / prompt_templates
A library for working with prompt templates locally or on the Hugging Face Hub.
☆42 Updated 2 weeks ago
flowaicom / flow-judge
Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte…
☆62 Updated 4 months ago
cognitivecomputations / kraken
☆65 Updated 9 months ago
rayliuca / T-Ragx
Enhancing Translation with RAG-Powered Large Language Models
☆76 Updated last month
davidberenstein1957 / dataset-viber
Dataset Viber is your chill repo for data collection, annotation and vibe checks.
☆46 Updated 5 months ago
huggingface / data-is-better-together
Let's build better datasets, together!
☆256 Updated 2 months ago