CambioML / uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering

LLM-based text extraction from unstructured data such as PDFs, Word documents, and HTML pages. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster R&D!

☆217

Alternatives and similar repositories for uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering

Users who are interested in uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering are comparing it to the libraries listed below.

pat-jj / s3
s3 - Efficient Yet Effective Search Agent Training via RL for RAG
☆470 Updated this week
HKUST-KnowComp / AutoSchemaKG
This repository contains the implementation of AutoSchemaKG, a novel framework for automatic knowledge graph construction that combines s…
☆423 Updated this week
CambioML / any-parser
Accurate, private and configurable document retrieval LLM
☆126 Updated 3 weeks ago
SAILResearch / awesome-foundation-model-leaderboards
A curated list of awesome leaderboard-oriented resources for foundation models
☆278 Updated 2 weeks ago
gersteinlab / ML-Bench
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code (https://arxiv.org/abs/2311.098…
☆301 Updated 2 weeks ago
zou-group / avatar
(NeurIPS 2024) AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning
☆216 Updated last month
rllm-team / rllm
PyTorch library for relational table learning with LLMs.
☆432 Updated 2 weeks ago
facebookresearch / DocAgent
DocAgent is a system designed to generate high-quality, context-aware code documentation for Python codebases using a multi-agent approac…
☆283 Updated 2 months ago
GreenBitAI / green-bit-llm
A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs.
☆185 Updated last month
OpenDCAI / RARE
Official implementation of RARE: Retrieval-Augmented Reasoning Modeling
☆183 Updated last month
pat-jj / TextbookKG
TxBKG - Knowledge Graph Generation for Any PDFs
☆184 Updated 7 months ago
vortezwohl / Autono
A ReAct-Based Highly Robust Autonomous Agent Framework.
☆209 Updated 2 months ago
kse-ElEvEn / MAKGED
MAKGED is the first multi-agent framework for collaborative error detection in knowledge graphs.
☆29 Updated 5 months ago
snap-stanford / stark
(NeurIPS D&B 2024) STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases
☆313 Updated 6 months ago
Wilson-ZheLin / Streamline-Analyst
An AI agent powered by LLMs that streamlines the entire process of data analysis. 🚀
☆434 Updated 11 months ago
HKUST-KnowComp / Awesome-LLM-Scientific-Discovery
From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery
☆202 Updated 2 weeks ago
Alpha-Innovator / NovelSeek
When Agent Becomes the Scientist – Building Closed-Loop System from Hypothesis to Verification
☆364 Updated this week
HKUDS / GraphAgent
"GraphAgent: Agentic Graph Language Assistant"
☆308 Updated 5 months ago
babel-llm / babel-llm
Babel - Open Multilingual Large Language Models Serving Over 90% of Global Speakers
☆207 Updated 4 months ago
GraphRAG-Bench / GraphRAG-Benchmark
GraphRAG-Bench, the official repo of comprehensive benchmark and dataset for evaluating GraphRAG models.
☆134 Updated this week
HaoAreYuDong / Large-Language-Models-for-Tabular-Data
☆46 Updated 8 months ago
tencent-ailab / Leopard
The repository for the paper titled "Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks"
☆157 Updated 6 months ago
TracyWang95 / legal-prompts-for-gpt
Open-source legal prompts
☆362 Updated 2 years ago
project-ryoma / ryoma
Common AI agent framework solving your data problems
☆382 Updated last month
anchen1011 / chatgpt-finetune-ui
Simple Python WebUI for fine-tuning ChatGPT (gpt-3.5-turbo)
☆180 Updated last year
OPPO-PersonalAI / TaskCraft
A library for generating difficulty-scalable, multi-tool, and verifiable agentic tasks with execution trajectories.
☆120 Updated 2 weeks ago
CoIR-team / coir
(ACL 2025 Main) A Comprehensive Benchmark for Code Information Retrieval.
☆119 Updated 3 weeks ago
EverM0re / EraRAG-Official
[arXiv'25] EraRAG: Efficient and Incremental Retrieval-Augmented Generation for Growing Corpora
☆131 Updated last week
Emerging-AI / ENOVA
A deployment, monitoring and autoscaling service towards serverless LLM serving.
☆153 Updated last week
codefuse-ai / CodeFuse-CGM
☆303 Updated 2 weeks ago