MaxHalford / orc
Parsing structured information from OCR outputs
☆18 · Updated 11 months ago
Related projects
Alternatives and complementary repositories for orc
- Parallel and distributed training with spaCy and Ray ☆54 · Updated last year
- A Python library aimed at dissecting and augmenting NER training data. ☆56 · Updated last year
- Python package for deduplication/entity resolution using active learning ☆78 · Updated 2 months ago
- An End-to-End Evaluation Framework for Entity Resolution Systems ☆26 · Updated 11 months ago
- An easy way to chunk spaCy docs. ☆16 · Updated 3 months ago
- HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial) ☆17 · Updated 8 months ago
- spaCy match and replace, maintaining conjugation ☆34 · Updated last year
- Generate reports for spaCy models. ☆28 · Updated 2 years ago
- Process PDFs, Word documents and more with spaCy ☆75 · Updated this week
- Source code and data for Like a Good Nearest Neighbor ☆28 · Updated 9 months ago
- Python API for https://vespa.ai, the open big data serving engine ☆105 · Updated this week
- ☆67 · Updated 2 years ago
- Logging utilities for spaCy ☆12 · Updated last year
- ☆42 · Updated last year
- A spaCy wrapper for GliNER ☆91 · Updated 4 months ago
- Template-based generation of DAG cards from Metaflow classes, inspired by Google cards for machine learning models. ☆30 · Updated 2 years ago
- 🧬 A VS Code extension for annotating data with Prodigy ☆30 · Updated 2 years ago
- Train huggingface models on top of Prodigy annotations ☆21 · Updated 9 months ago
- Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and te… ☆42 · Updated 10 months ago
- KEN: Relational Data Embeddings ☆27 · Updated 10 months ago
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax. ☆57 · Updated 6 months ago
- A proposed standard `NOCK` for a Parquet format that supports efficient distributed serialization of multiple kinds of graph technologies ☆17 · Updated 2 years ago
- Bag of, not words, but tricks! ☆68 · Updated last year
- Pyinfer is a model agnostic tool for ML developers and researchers to benchmark the inference statistics for machine learning models or f… ☆24 · Updated 3 years ago
- A Streamlit component for annotating text by text selecting. ☆40 · Updated 5 months ago
- Use Hugging Face text and token classification pipelines directly in spaCy ☆62 · Updated 8 months ago
- ☆13 · Updated last year
- Generalist and Lightweight Model for Text Classification ☆51 · Updated last week
- RaKUn 2.0 - A fast keyword detection algorithm ☆65 · Updated 3 months ago
- Python SDK for Galileo's NLP and CV Studio. ☆18 · Updated this week