DS4SD / SemTabNet
Repository for the ACL paper: "Statements: Universal Information Extraction from Tables with Large Language Models for ESG KPIs"
☆14 Updated last year
Alternatives and similar repositories for SemTabNet
Users who are interested in SemTabNet are comparing it to the libraries listed below:
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆67 Updated 10 months ago
- Create fast graph language models from converted PDF documents for knowledge extraction and Q&A. ☆56 Updated 7 months ago
- Easy-to-use, high-performance knowledge distillation for LLMs ☆92 Updated 4 months ago
- [TACL] Code, datasets, and checkpoints for the paper "CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retri… ☆31 Updated 11 months ago
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch ☆103 Updated 8 months ago
- Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper. ☆27 Updated 5 months ago
- Universal text classifier for generative models ☆24 Updated last year
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets. ☆73 Updated last month
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 Updated last year
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024) ☆160 Updated last year
- One Line To Build Zero-Data Classifiers in Minutes ☆58 Updated 11 months ago
- Examples using the Deep Search functionalities ☆85 Updated 7 months ago
- Enhancing Translation with RAG-Powered Large Language Models ☆82 Updated 3 weeks ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆34 Updated last year
- This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES" ☆110 Updated 2 months ago
- ☆51 Updated last year
- Python library to use Pleias-RAG models ☆61 Updated 4 months ago
- Build document-native LLM applications ☆54 Updated last year
- ☆49 Updated 7 months ago
- ☆118 Updated last year
- A pipeline parallel training script for LLMs. ☆158 Updated 4 months ago
- LLM-Training-API: Including Embeddings & ReRankers, mergekit, LaserRMT ☆27 Updated last year
- My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model" ☆73 Updated this week
- ☆89 Updated 7 months ago
- Code for the EMNLP'24 paper "Learning to Extract Structured Entities Using Language Models" ☆43 Updated 5 months ago
- entropix style sampling + GUI ☆27 Updated 10 months ago
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆88 Updated 3 months ago
- Dataset Viber is your chill repo for data collection, annotation and vibe checks. ☆46 Updated last year
- Let's build better datasets, together! ☆263 Updated 8 months ago
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. ☆235 Updated last month