swiss-ai / mmore
Massive Multimodal Open RAG & Extraction
A scalable multimodal pipeline for processing, indexing, and querying multimodal documents.
Ever needed to take 8000 PDFs, 2000 videos, and 500 spreadsheets and feed them to an LLM as a knowledge base? Well, MMORE is here to help you!
☆36 Updated last week
Alternatives and similar repositories for mmore:
Users that are interested in mmore are comparing it to the libraries listed below
- Evaluate your LLM's response with Prometheus and GPT4 🎯 ☆908 Updated last month
- Recipes for shrinking, optimizing, customizing cutting edge vision models. ☆1,407 Updated 3 weeks ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,414 Updated this week
- List of papers on hallucination detection in LLMs. ☆839 Updated last week
- ☆633 Updated 4 months ago
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ☆1,136 Updated 3 months ago
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol. ☆1,721 Updated this week
- Bringing BERT into modernity via both architecture changes and scaling ☆1,322 Updated 3 weeks ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆771 Updated 2 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,384 Updated 2 weeks ago
- A reading list on LLM based Synthetic Data Generation 🔥 ☆1,238 Updated 2 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆420 Updated last year
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,629 Updated this week
- Automatically annotate papers using LLMs ☆316 Updated 4 months ago
- ☆1,116 Updated 10 months ago
- Synthetic data curation for post-training and structured data extraction ☆1,230 Updated this week
- Code for explaining and evaluating late chunking (chunked pooling) ☆369 Updated 3 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,112 Updated this week
- A lightweight library for generating synthetic instruction tuning datasets for your data without GPT. ☆766 Updated last month
- Stanford NLP Python library for Representation Finetuning (ReFT) ☆1,455 Updated 2 months ago
- Toolkit for attaching, training, saving and loading of new heads for transformer models ☆273 Updated last month
- An Open Source Toolkit For LLM Distillation ☆574 Updated 3 months ago
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆110 Updated last week
- ☆512 Updated 5 months ago
- Large Concept Models: Language modeling in a sentence representation space ☆2,095 Updated 2 months ago
- SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. ☆731 Updated 2 weeks ago
- Automatically evaluate your LLMs in Google Colab ☆614 Updated 11 months ago
- Official implementation of QATCH: Benchmarking SQL-centric tasks with Table Representation Learning Models on Your Data ☆28 Updated 2 weeks ago
- awesome synthetic (text) datasets ☆272 Updated 5 months ago
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning ☆354 Updated 7 months ago