Pleias / marginalia
☆67 Updated 6 months ago
Related projects:
- A BERT-based application for reusable text classification at scale ☆37 Updated last year
- Small python package to measure OCR quality and other related metrics. ☆19 Updated 7 months ago
- Libraries, Archives and Museums (LAM) ☆81 Updated last year
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆33 Updated 6 months ago
- 🗺️ Data Cleaning and Textual Data Visualization 🗺️ ☆131 Updated 3 months ago
- End-to-end zero-shot entity and relation extraction ☆50 Updated last month
- Dataset Viber is your chill repo for data collection, annotation and vibe checks. ☆39 Updated 2 weeks ago
- Layout Analysis Dataset with Segmonto (LADaS) ☆17 Updated 2 months ago
- 🤗 HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial) ☆17 Updated 6 months ago
- NLP with Rust for Python 🦀🐍 ☆57 Updated 3 months ago
- Code, datasets, and checkpoints for the paper "CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval an… ☆16 Updated this week
- A spaCy wrapper for GliNER ☆77 Updated 2 months ago
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx like syntax. ☆57 Updated 4 months ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆71 Updated last year
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆61 Updated 6 months ago
- An easy way to chunk spaCy docs. ☆11 Updated last month
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆67 Updated 2 months ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆66 Updated last year
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine ☆31 Updated 2 years ago
- Code for SaGe subword tokenizer (EACL 2023) ☆21 Updated this week
- A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling. ☆51 Updated last month
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection ☆99 Updated 8 months ago
- QLoRA for Masked Language Modeling ☆20 Updated last year
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆22 Updated 6 months ago
- Tool to apply Legal Matter Specification Standard (LMSS) to documents ☆11 Updated last month
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆58 Updated 2 weeks ago
- Knowledge Graph Generator app ☆30 Updated 5 months ago