DataScienceUIBK / ArabicaQA
ArabicaQA: Comprehensive Dataset for Arabic Question Answering accepted at SIGIR 2024
☆17Updated last year
Alternatives and similar repositories for ArabicaQA
Users who are interested in ArabicaQA are comparing it to the repositories listed below.
- This is the official repository for Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks.☆26Updated 10 months ago
- ☆125Updated last year
- Generalist and Lightweight Model for Text Classification☆163Updated 4 months ago
- Aranizer: A Custom Tokenizer based on SentencePiece and BPE tailored for Arabic Language Modeling☆20Updated last year
- Docutron Toolkit: detection and segmentation analysis for legal data extraction over documents.☆26Updated last year
- Questions in Arabic focusing on Saudi culture, tested on a number of large language models (LLMs)☆17Updated 8 months ago
- ☆124Updated 7 months ago
- Simple UI for debugging correlations of text embeddings☆295Updated 4 months ago
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment.☆65Updated last year
- Chunk your text more accurately using gpt4o-mini☆44Updated last year
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task…☆179Updated last year
- Data extraction with LLM on CPU☆112Updated last year
- Simple package to extract text with coordinates from programmatic PDFs☆202Updated 3 weeks ago
- Using open source LLMs to build synthetic datasets for direct preference optimization☆66Updated last year
- Solving data for LLMs - Create quality synthetic datasets!☆151Updated 8 months ago
- DocLLM: A layout-aware generative language model for multimodal document understanding☆129Updated last year
- This repo is the central repo for all the RAG Evaluation reference material and partner workshop☆76Updated 5 months ago
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp…☆190Updated last year
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳☆336Updated 4 months ago
- Workflows are an event-driven, async-first, step-based way to control the execution flow of AI applications like agents.☆224Updated this week
- Setu is a comprehensive pipeline designed to clean, filter, and deduplicate diverse data sources including Web, PDF, and Speech data. Bui…☆15Updated last year
- Testing and evaluation framework for voice agents☆151Updated 4 months ago
- A reimplementation of langgraph's customer support example in Rasa's CALM paradigm and a quantitative evaluation of the two approaches☆81Updated 6 months ago
- Instruction dataset for Arabic with 10,000 instruction and output pairs. CIDAR can be used to fine-tune LLMs to follow instructions.☆41Updated 6 months ago
- Data extraction with LLM on CPU☆68Updated last year
- Build a Streamlit Chatbot using Langchain, ColBERT, Ragatouille, and ChromaDB☆122Updated last year
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte…☆78Updated 11 months ago
- ☆82Updated 2 weeks ago
- This PyTorch implementation of the LayoutLM paper by Microsoft demonstrates the SequenceClassification task using HuggingFaceTransformers to cl…☆34Updated 3 years ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.☆377Updated 2 months ago