Pleias / pleias_ScholasticAI
☆45 Updated 2 weeks ago
Alternatives and similar repositories for pleias_ScholasticAI:
Users who are interested in pleias_ScholasticAI are comparing it to the repositories listed below.
- Small Python package to measure OCR quality and other related metrics. ☆21 Updated last year
- ☆67 Updated 11 months ago
- Dehyphenation of broken text (mainly German), i.e., extracted from a PDF ☆38 Updated 2 years ago
- A BERT-based application for reusable text classification at scale ☆37 Updated last year
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆64 Updated 3 months ago
- This repository is part of an NLP course for humanities and cultural studies. This course uses historical newspapers as a source and appl… ☆14 Updated 2 weeks ago
- An easy way to chunk spaCy docs. ☆19 Updated 6 months ago
- Layout Analysis Dataset with Segmonto (LADaS) ☆19 Updated 2 weeks ago
- Code, datasets, and checkpoints for the paper "CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval an… ☆27 Updated 5 months ago
- Using embeddings compressed by Product Quantization, in JavaScript ☆31 Updated last year
- Conduct in-depth research with AI-driven insights: DeepDive is a command-line tool that leverages web searches and AI models to generate… ☆36 Updated 5 months ago
- Tools for interactive visual exploration of semantic embeddings. ☆30 Updated 5 months ago
- Simple customizable evaluation for text retrieval performance of Sentence Transformers embedders on PDFs ☆19 Updated last month
- Create fast graph language models from converted PDF documents for knowledge extraction and Q&A. ☆40 Updated 3 weeks ago
- Pre-train Static Word Embeddings ☆47 Updated 3 weeks ago
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment. ☆40 Updated 4 months ago
- Open source text annotation software created by the French supreme court 'Cour de cassation' ☆20 Updated this week
- Jupyter Notebooks and an R Notebook for encoding Pokémon embeddings and creating data visualizations. ☆19 Updated 7 months ago
- Libraries, Archives and Museums (LAM) ☆82 Updated 2 years ago
- Very minimal (and stateless) agent framework ☆41 Updated last month
- GraphER: A Structure-aware Text-to-Graph Model for Entity and Relation Extraction ☆66 Updated 6 months ago
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch ☆82 Updated last month
- A library for working with prompt templates locally or on the Hugging Face Hub. ☆40 Updated last week
- ☆35 Updated 2 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆47 Updated 8 months ago
- PDF parser powered by grobid ☆25 Updated 6 months ago
- BPE modification that implements removing of the intermediate tokens during tokenizer training. ☆25 Updated 2 months ago