Pleias / OCRoscope
Small Python package to measure OCR quality and other related metrics.
★20 · Updated 8 months ago
Related projects
Alternatives and complementary repositories for OCRoscope
- ★68 · Updated 8 months ago
- 🤗 HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial) ★17 · Updated 7 months ago
- Tool to apply Legal Matter Specification Standard (LMSS) to documents ★11 · Updated 2 months ago
- An easy way to chunk spaCy docs. ★15 · Updated 2 months ago
- Code for SaGe subword tokenizer (EACL 2023) ★22 · Updated last month
- Using open source LLMs to build synthetic datasets for direct preference optimization ★40 · Updated 8 months ago
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ★59 · Updated last week
- QLoRA for Masked Language Modeling ★20 · Updated last year
- A BERT-based application for reusable text classification at scale ★37 · Updated last year
- Using short models to classify long texts ★20 · Updated last year
- Library for fast text representation and classification. ★28 · Updated 10 months ago
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax. ★57 · Updated 6 months ago
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling. ★52 · Updated 3 months ago
- ★19 · Updated last year
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets. ★46 · Updated last month
- Dataset Viber is your chill repo for data collection, annotation and vibe checks. ★42 · Updated 2 months ago
- Source code and data for Like a Good Nearest Neighbor ★28 · Updated 9 months ago
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ★23 · Updated 2 months ago
- Code, datasets, and checkpoints for the paper "CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval an… ★25 · Updated last month
- ★48 · Updated 2 months ago
- ★20 · Updated 9 months ago
- C++ inference wrappers for running blazing fast embedding services on your favourite serverless like AWS Lambda. By Prithivi Da, PRs welc… ★19 · Updated 8 months ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ★17 · Updated 3 weeks ago
- Python API for https://vespa.ai, the open big data serving engine ★101 · Updated this week
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ★22 · Updated 7 months ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ★68 · Updated 3 weeks ago
- Knowledge Graph Generator app ★31 · Updated 6 months ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ★72 · Updated last year
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. ★49 · Updated this week
- 🌸 Train floret vectors ★18 · Updated last year