An example of how to use spaCy for extremely large files without running into memory issues
☆36Sep 17, 2022Updated 3 years ago
Alternatives and similar repositories for spacy-extreme
Users that are interested in spacy-extreme are comparing it to the libraries listed below.
- Use spaCy for NLP and output to the FoLiA XML format.☆12Feb 27, 2024Updated 2 years ago
- ☆10Jun 8, 2024Updated last year
- Download and load spaCy models on-the-fly☆15Feb 9, 2023Updated 3 years ago
- Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages (ACL 2022)☆19May 17, 2022Updated 3 years ago
- The official repository for Toxic Commons and Celadon. Toxicity Classification for public domain data.☆21Nov 10, 2024Updated last year
- Numeric fused-head identification and resolution☆33Oct 16, 2019Updated 6 years ago
- ☆12May 31, 2024Updated last year
- Story understanding and plot analysis pilot.☆11Dec 27, 2022Updated 3 years ago
- Minimal code to train ELMo models in recent versions of TensorFlow☆14Apr 30, 2023Updated 2 years ago
- Transformer based Trigram Blocking implementation in Tensorflow☆11Feb 26, 2020Updated 6 years ago
- Massively Multilingual Transfer for NER☆86Oct 7, 2021Updated 4 years ago
- This crawler grabs the text content of posts in threads, for Baidu::Tieba.☆12Apr 5, 2012Updated 14 years ago
- Specification of a stand-off element for the TEI guidelines☆12Apr 29, 2021Updated 4 years ago
- Text pre-processing library for deep learning (Keras, tensorflow).☆115Aug 6, 2018Updated 7 years ago
- Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation☆63Oct 29, 2018Updated 7 years ago
- API for WOLF, a free French WordNet☆14May 4, 2018Updated 7 years ago
- Benchmarks for evaluating MT models☆11Jun 26, 2024Updated last year
- Set-Equivariant Deep Learning Models☆22Dec 23, 2021Updated 4 years ago
- Generic framework for information extraction tasks, including recognition of named entities, temporal expressions, spatial expressions an…☆13Jun 5, 2023Updated 2 years ago
- Data and code for the experiments in the Outlier Detection task proposed by Camacho-Collados et al.☆13Aug 28, 2018Updated 7 years ago
- Inforex is a web system for text corpora construction.☆12Updated this week
- spaCy pipeline component for adding text readability meta data to Doc objects.☆56Apr 1, 2019Updated 7 years ago
- ☆15May 26, 2021Updated 4 years ago
- A German ELMo deep contextualized word representation, trained on a special German Wikipedia text corpus.☆28Dec 15, 2019Updated 6 years ago
- Wrap-up around RinteRface templates☆11Apr 10, 2019Updated 7 years ago
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation"☆22Feb 14, 2024Updated 2 years ago
- A python library to generate highly realistic typos (fuzz-testing)☆13Mar 16, 2025Updated last year
- Neural Fuzzy Repair (NFR) is a data augmentation pipeline, which integrates fuzzy matches (i.e. similar translations) into neural machine…☆12Aug 14, 2024Updated last year
- ☆20Mar 30, 2022Updated 4 years ago
- A collection of over 1.5 million tweets translated to French, with their sentiment.☆35May 18, 2017Updated 8 years ago
- eXternally configurable REference and Non Named Entity Recognizer☆17Jun 17, 2024Updated last year
- Small python package to measure OCR quality and other related metrics.☆27Feb 19, 2024Updated 2 years ago
- Code for Fast Information-theoretic Bayesian Optimisation☆16Jun 7, 2018Updated 7 years ago
- SpacyV3 Text Categorizer Tutorial☆17Nov 15, 2020Updated 5 years ago
- ☆45Sep 26, 2021Updated 4 years ago
- Examples of Text Mining in WEKA☆63Jun 22, 2013Updated 12 years ago
- Automatically exported from code.google.com/p/oxygen-tei☆17Feb 18, 2026Updated 2 months ago
- COMBO is a jointly trained tagger, lemmatizer and dependency parser.☆36Mar 24, 2023Updated 3 years ago
- python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. Wi…☆18May 2, 2025Updated 11 months ago