wjbmattingly / keyword-spacy
Keyword spaCy is a spaCy pipeline component for extracting keywords from text using cosine similarity.
☆13 · Updated last year
Alternatives and similar repositories for keyword-spacy
Users who are interested in keyword-spacy compare it to the libraries listed below.
- A BERT-based application for reusable text classification at scale ☆38 · Updated 2 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated last year
- Bagpipes spaCy is a collection of custom spaCy pipeline components designed to enhance text processing capabilities. ☆21 · Updated last year
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆80 · Updated 2 years ago
- Layout Analysis Dataset with Segmonto (LADaS) ☆21 · Updated 4 months ago
- 🗺️ Data Cleaning and Textual Data Visualization 🗺️ ☆190 · Updated 5 months ago
- Tool to apply Legal Matter Specification Standard (LMSS) to documents ☆12 · Updated last year
- Small python package to measure OCR quality and other related metrics. ☆25 · Updated last year
- Repository hosting the common code for the entity-fishing clients ☆10 · Updated 5 months ago
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆68 · Updated last year
- A spaCy wrapper for GliNER ☆124 · Updated 9 months ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆168 · Updated 3 years ago
- Efficient few-shot learning with cross-encoders. ☆59 · Updated last year
- Streamlit Named Entity Recognition (NER) annotation custom component ☆39 · Updated 3 years ago
- PyLate efficient inference engine ☆66 · Updated 2 months ago
- An easy way to chunk spaCy docs. ☆22 · Updated last year
- Collection de romans français du dix-huitième siècle (1751-1800) / Collection of Eighteenth-Century French Novels (1751-1800) ☆23 · Updated last year
- Crowd-sourced lists of urls to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ … ☆63 · Updated 2 weeks ago
- Logical structure analysis for visually structured documents ☆92 · Updated 3 years ago
- A dataset for pretraining language models targeted for legal tasks. ☆139 · Updated 3 years ago
- HDBSCAN Tuning for BERTopic Models ☆49 · Updated 2 years ago
- Repository for deepdoctection tutorial notebooks ☆46 · Updated 5 months ago
- GLiNER model in a FastAPI microservice. ☆45 · Updated 11 months ago
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆110 · Updated last year
- METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL) ☆55 · Updated 2 years ago
- Synthetic Text Dataset Generation for LLM projects ☆43 · Updated last week
- Dataset Viber is your chill repo for data collection, annotation and vibe checks. ☆46 · Updated last year