DS4SD / SemTabNet
Repository for ACL paper: "Statements: Universal Information Extraction from Tables with Large Language Models for ESG KPIs"
☆13Updated 9 months ago
Alternatives and similar repositories for SemTabNet:
Users who are interested in SemTabNet are comparing it to the libraries listed below:
- Create fast graph language models from converted PDF documents for knowledge extraction and Q&A.☆48Updated 2 months ago
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡☆67Updated 5 months ago
- Examples using the Deep Search functionalities☆69Updated 2 months ago
- Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper.☆20Updated 2 weeks ago
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre…☆18Updated 5 months ago
- Build document-native LLM applications☆52Updated 6 months ago
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets.☆64Updated 6 months ago
- Simple package to extract text with coordinates from programmatic PDFs☆93Updated last week
- OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation☆71Updated last week
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch☆87Updated 3 months ago
- Transform Unstructured Data into Synthetic Datasets☆26Updated 7 months ago
- A python library to define and validate data types in Docling.☆96Updated this week
- A general human-AI interaction platform.☆13Updated 2 months ago
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment.☆46Updated 5 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models☆21Updated 4 months ago
- Python package to parse PDFs with different parsers☆35Updated 3 months ago
- A novel multi-modality (Vision) RAG architecture☆24Updated 6 months ago
- Docling LangChain integration☆20Updated 2 months ago
- A Python library to chunk/group your texts based on semantic similarity.☆94Updated 8 months ago
- LLM as Interpreter for Natural Language Programming, Pseudo-code Programming and Flow Programming of AI Agents☆38Updated 8 months ago
- Source code of the paper: RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering [F…☆62Updated 10 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization☆59Updated last year
- Code, datasets, and checkpoints for the paper "CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval an…☆28Updated 6 months ago
- Universal text classifier for generative models☆22Updated 8 months ago
- 🔎 A deep-dive into HyDE for Advanced LLM RAG + 💡 Introducing AutoHyDE, a semi-supervised framework to improve the effectiveness, covera…☆32Updated last year
- An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO