KorAP / Tokenizer-Evaluation
Benchmark scripts for comparing different tokenizers and sentence segmenters of German
★11 · Updated 2 years ago
Alternatives and similar repositories for Tokenizer-Evaluation
Users who are interested in Tokenizer-Evaluation are comparing it to the libraries listed below.
- Python library to use Pleias-RAG models ★51 · Updated last month
- NLP with Rust for Python 🦀🐍 ★62 · Updated 2 weeks ago
- Pre-train Static Word Embeddings ★70 · Updated this week
- Crowd-sourced lists of urls to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ … ★42 · Updated last month
- Modular Rust transformer/LLM library using Candle ★35 · Updated last year
- Library for fast text representation and classification. ★28 · Updated last year
- Semantically Search Emojis From the Command Line! ★13 · Updated last year
- This is a new backend implementation of the ANNIS linguistic search and visualization system. ★17 · Updated last week
- Next-generation Punkt sentence boundary detection with zero dependencies ★17 · Updated last month
- ★67 · Updated last year
- image-to-text model for PDF.js ★36 · Updated 2 months ago
- spaCy entry points for Curated Transformers ★31 · Updated this week
- Using embeddings compressed by Product Quantization, in Javascript ★31 · Updated last year
- Optimus is a flexible and scalable framework built to train language models efficiently across diverse hardware configurations, including… ★54 · Updated last month
- Libraries, Archives and Museums (LAM) ★84 · Updated 2 years ago
- Work with static vector models ★28 · Updated last month
- A repository of instructions in French to fine-tune LLMs ★17 · Updated last year
- Python Finite-State Toolkit ★54 · Updated 2 weeks ago
- 🤗 HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial) ★17 · Updated last year
- 💫 SpaCy wrapper for ConceptNet 💫 ★93 · Updated last year
- Generate a SQLite database from Wikipedia & Wikidata dumps. ★35 · Updated last year
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ★79 · Updated last year
- Tree-based indexes for neural-search ★32 · Updated last year
- A Python module for retrieving script types of writing systems including alphabets, abjads, abugidas, syllabaries, logographs, featurals … ★13 · Updated 10 months ago
- GGML implementation of BERT model with Python bindings and quantization. ★54 · Updated last year
- Fast Text Classification with Compressors dictionary ★149 · Updated last year
- ★32 · Updated 2 years ago
- ★18 · Updated 3 weeks ago
- 🔥 Use Hugging Face text and token classification pipelines directly in spaCy ★63 · Updated last year
- Extract knowledge from raw text ★13 · Updated 3 years ago