erdogant / undouble
undouble is a Python package for detecting (near-)identical images.
☆54 · Updated last month
Alternatives and similar repositories for undouble
Users who are interested in undouble are comparing it to the libraries listed below.
- Input text or image, get back matching image fashion results, using Jina, DocArray, and CLIP ☆50 · Updated 3 years ago
- Traversing links to find the deep source of information ☆69 · Updated 2 years ago
- OCR, Archive, Index and Search: Implementation-agnostic OCR framework. ☆223 · Updated last year
- A library for detecting problematic data segments in structured and unstructured data with few lines of code. ☆64 · Updated last year
- Python package to generate image embeddings with CLIP without PyTorch/TensorFlow ☆154 · Updated 3 years ago
- 🖍️ Highlight text in documents ☆109 · Updated 5 months ago
- Python package for deduplication/entity resolution using active learning ☆81 · Updated last year
- MultiOCR, an interface that connects multiple open-source OCR engines and various cloud OCR services. ☆31 · Updated 2 years ago
- 🤝 Trade any tensors over the network ☆30 · Updated 2 years ago
- 🔤 Measure edit distance based on keyboard layout ☆61 · Updated last week
- 🚂 Fine-tune OpenAI models for text classification, question answering, and more ☆16 · Updated 2 years ago
- The largest multilingual image-text classification dataset. It contains fashion products. ☆75 · Updated 2 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated last year
- A zero-shot captcha solver. ☆16 · Updated last year
- 🤗🖼️ HuggingPics: Fine-tune Vision Transformers for anything using images found on the web. ☆307 · Updated last year
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆57 · Updated 2 years ago
- Concept Modeling: Topic Modeling on Images and Text ☆214 · Updated 11 months ago
- RuCLIP tiny (Russian Contrastive Language–Image Pretraining) is a neural network trained to work with different pairs (images, texts). ☆35 · Updated 3 years ago
- Accelerated NLP pipelines for fast inference on CPU and GPU. Built with Transformers, Optimum and ONNX Runtime. ☆126 · Updated 3 years ago
- Fast Near-Duplicate Image Search and Delete using pHash, t-SNE and KDTree. ☆163 · Updated 2 years ago
- Semantically distinct key-phrase extraction using Hilbert hashes. ☆50 · Updated 3 years ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆80 · Updated 2 years ago
- A Streamlit component for annotating text by text selecting. ☆40 · Updated last year
- Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and processing large text collections with ML and for ML ☆63 · Updated 8 months ago
- A neural network based file sorter. Trains an autoencoder to sort images or audio based on the similarity of their encodings, or uses the… ☆28 · Updated 2 years ago
- Repository containing our datasets for HTR (handwritten text recognition) task. ☆25 · Updated 3 years ago
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆148 · Updated 9 months ago
- ☆14 · Updated 3 years ago
- Gazeta: Dataset for automatic summarization of Russian news ☆35 · Updated 4 years ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆176 · Updated 4 months ago