ucbepic / TWIX
TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the shared underlying visual template across documents.
☆173Updated this week
Alternatives and similar repositories for TWIX:
Users who are interested in TWIX are comparing it to the libraries listed below.
- Deep Research for your internal data☆313Updated last week
- ☆85Updated 3 months ago
- A user interface for DSPy☆144Updated 6 months ago
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻🍳☆279Updated 2 weeks ago
- Together Open Deep Research☆260Updated 3 weeks ago
- ☆100Updated last month
- Claude Deep Research config for Claude Code.☆170Updated last month
- Simple AI coder that can do most of my work for me, including working on himself.☆235Updated last month
- Helping you select an AI agent framework☆227Updated this week
- llm-consortium orchestrates multiple LLMs, iteratively refines & achieves consensus.☆248Updated last week
- Kura is a simple reproduction of the CLIO paper which uses language models to label user behaviour before clustering them based on embedd…☆104Updated 3 weeks ago
- OCR Benchmark☆470Updated 3 weeks ago
- 📰 Building News Agents to Summarize News with MCP, Q, and tmux☆64Updated this week
- Structured information extraction from documents☆315Updated 7 months ago
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM.☆123Updated 2 weeks ago
- LLMap solves context extraction for large codebases☆88Updated 2 months ago
- An experiment in meeting transcription and diarization with just an LLM. Maybe I went a little overboard though☆544Updated last month
- Prompt design in Python☆57Updated 5 months ago
- ☆36Updated 3 months ago
- Letting Claude Code develop his own MCP tools :)☆98Updated 2 months ago
- ☆93Updated 5 months ago
- A flexible, adaptive classification system for dynamic text classification☆162Updated this week
- 🤖 Headless IDE for AI agents☆186Updated 2 weeks ago
- ☆87Updated 2 months ago
- ☆121Updated 2 months ago
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR.☆127Updated last week
- ContextGem: Effortless LLM extraction from documents☆115Updated this week
- Solving data for LLMs - Create quality synthetic datasets!☆146Updated 3 months ago
- ☆103Updated 4 months ago
- A list of useful Open Source tools and scrapers to gather data for LLMs☆230Updated 2 months ago