huggingface/finepdfs

Codebase for FinePDFs

☆187

Alternatives and similar repositories for finepdfs

Users who are interested in finepdfs are comparing it to the libraries listed below.

kwami-labs / kwami
👻 kwami.io | A 3D Interactive AI Companion Library for creating engaging AI companions with visual (blob), audio, and AI speech capabili…
☆45 · Jun 19, 2026 · Updated last month
felix-schmitt / MathNet
MathNet: A Data-Centric Approach, Dataset and Benchmark Model to Advance Mathematical Expression Recognition
☆10 · Mar 19, 2025 · Updated last year
notch-ai / autosteer
Desktop app for multi-workspace Claude Code management
☆67 · Nov 26, 2025 · Updated 8 months ago
rastaweb / domoscope
☆28 · Nov 10, 2025 · Updated 8 months ago
Marker-Inc-Korea / CoT-llama2
Fine-tuning llama2 using the chain-of-thought approach
☆10 · Nov 18, 2023 · Updated 2 years ago
cisnlp / GlotWeb
[WWW 2026] 🕸 GlotWeb: Web Indexing for Minority Languages
☆17 · Apr 14, 2026 · Updated 3 months ago
taidopurason / tokenizer-extension
☆15 · Dec 4, 2025 · Updated 7 months ago
allenai / bolmo-core
Code for Bolmo: Byteifying the Next Generation of Language Models
☆136 · Jul 6, 2026 · Updated 3 weeks ago
docling-project / docling-eval
Evaluation framework for document processing models and services.
☆77 · Jul 16, 2026 · Updated last week
jbarrow / commonforms
CommonForms: open models to auto-detect PDF form fields
☆1,230 · Jun 17, 2026 · Updated last month
swiss-ai / parity-aware-bpe
Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization [ACL 2026]
☆20 · Apr 18, 2026 · Updated 3 months ago
philschmid / multilingual-serverless-qa-aws-lambda
☆10 · Dec 17, 2020 · Updated 5 years ago
plainionist / brain-overflow
Simple snippet database
☆13 · Nov 19, 2024 · Updated last year
stefanpejcic / EmailFilter
Self-hosted, privacy-focused email validation 📨🔐
☆54 · Updated this week
bipul1010 / agents_tutorial
☆19 · Aug 7, 2024 · Updated last year
Pleias / Pleias-Rag
☆17 · Feb 25, 2025 · Updated last year
stefan-it / modern-bert-ner
My NER Experiments with ModernBERT and Ettin
☆29 · Jul 17, 2025 · Updated last year
SkyworkAI / Skywork-DeepResearch
☆27 · Aug 13, 2025 · Updated 11 months ago
huggingface / fineweb-2
☆256 · Oct 27, 2025 · Updated 9 months ago
RUCAIBox / MPOP
☆13 · Jun 16, 2021 · Updated 5 years ago
LuisaMaerz / KnowMAN
KnowMAN: Weakly Supervised Multinomial Adversarial Networks
☆12 · Nov 9, 2021 · Updated 4 years ago
ilinguistics / common_crawl_corpus
Scripts for building a geo-located web corpus using Common Crawl data
☆11 · Jan 18, 2026 · Updated 6 months ago
juletx / self-translate
Do Multilingual Language Models Think Better in English?
☆42 · Aug 3, 2023 · Updated 2 years ago
cisnlp / MEXA
[ACL 2025] 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
☆11 · Apr 6, 2025 · Updated last year
CodeCreator / WebOrganizer
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation
☆83 · May 2, 2025 · Updated last year
amazon-science / factual-confidence-of-llms
Code for paper "Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators"
☆17 · Dec 4, 2024 · Updated last year
luyug / MORES
☆10 · Apr 16, 2021 · Updated 5 years ago
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆3,239 · Jul 22, 2026 · Updated last week
kmad / dabench-rlm-eval
Benchmark harness for evaluating DSPy RLMs on data analysis tasks (InfiAgent-DABench)
☆23 · Mar 22, 2026 · Updated 4 months ago
NVIDIA-NeMo / Curator
Scalable data pre processing and curation toolkit for LLMs
☆1,687 · Updated this week
thu-coai / Glyph
Official Repository for "Glyph: Scaling Context Windows via Visual-Text Compression"
☆595 · Nov 4, 2025 · Updated 8 months ago
weaviate / retrieve-dspy
A collection of Compound Retrieval Systems implemented with DSPy and Weaviate.
☆99 · Jun 1, 2026 · Updated last month
google-deepmind / gemma_penzai
A JAX Research Toolkit for Visualizing, Manipulating, and Understanding Gemma Models with Multi-modal Support based on Penzai.
☆95 · Jan 13, 2026 · Updated 6 months ago
Katakate / k7
Your own self-hosted infra for lightweight VM sandboxes to safely execute untrusted code. CLI, API, Python SDK. ⭐ Star it if you like it!…
☆783 · Dec 14, 2025 · Updated 7 months ago
Princeton-AI2-Lab / DeepOCR
A reproduction of the Deepseek-OCR model including training
☆208 · Nov 21, 2025 · Updated 8 months ago
FoundationAgents / ReCode
Next paradigm for LLM Agent. Unify plan and action through recursive code generation for adaptive, human-like decision-making.
☆559 · Apr 21, 2026 · Updated 3 months ago
jxnl / instructor-classify
☆37 · May 5, 2025 · Updated last year
llm-jp / llm-jp-eval-mm
A lightweight framework for evaluating visual-language models.
☆43 · Apr 20, 2026 · Updated 3 months ago
TIGER-AI-Lab / PixelWorld
The official code of "PixelWorld: Towards Perceiving Everything as Pixels" [TMLR25]
☆15 · Sep 12, 2025 · Updated 10 months ago