DataScienceUIBK / ArabicaQA
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering (accepted at SIGIR 2024)
☆15 · Updated 9 months ago
Alternatives and similar repositories for ArabicaQA
Users who are interested in ArabicaQA are comparing it to the libraries listed below
- This is the official repository for Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks. ☆25 · Updated 5 months ago
- An effort to benchmark Arabic legal reasoning in foundation models. ☆11 · Updated 8 months ago
- Solving data for LLMs: create quality synthetic datasets! ☆147 · Updated 3 months ago
- ☆124 · Updated last year
- Instruction dataset for Arabic with 10,000 instruction and output pairs. CIDAR can be used to fine-tune LLMs to follow instructions. ☆39 · Updated last month
- A simple semi-supervised approach for creating Hugging Face dataset loading scripts and uploading them to the Hub. ☆11 · Updated 10 months ago
- Efficient vector database for hundreds of millions of embeddings. ☆206 · Updated 11 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆132 · Updated 4 months ago
- Using open-source LLMs to build synthetic datasets for direct preference optimization ☆61 · Updated last year
- Python interface for evaluation on ChatGPT models ☆19 · Updated last year
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp… ☆175 · Updated 8 months ago
- ☆90 · Updated 5 months ago
- Experiments with inference on Llama ☆104 · Updated 11 months ago
- Generalist and Lightweight Model for Text Classification ☆128 · Updated 2 weeks ago
- Simple retrieval from LLMs at various context lengths to measure accuracy ☆99 · Updated last year
- Setu is a comprehensive pipeline designed to clean, filter, and deduplicate diverse data sources including Web, PDF, and Speech data. Bui… ☆16 · Updated 11 months ago
- Fine-tune ModernBERT on a large dataset with custom tokenizer training ☆66 · Updated 3 months ago
- A Python package for generating sequences (greedy and beam search) from PyTorch models (not necessarily HF Transformers models). ☆17 · Updated 3 weeks ago
- Repo for the Belebele dataset, a massively multilingual reading comprehension dataset. ☆328 · Updated 4 months ago
- Aranizer: A Custom Tokenizer based on SentencePiece and BPE tailored for Arabic Language Modeling ☆20 · Updated 9 months ago
- High-level library for batched embeddings generation, blazingly fast web-based RAG and quantized indexes processing ⚡ ☆66 · Updated 6 months ago
- This repository contains the code for dataset curation and fine-tuning of the instruct variant of the Bilingual OpenHathi model. The resultin… ☆23 · Updated last year
- MAFAND-MT ☆55 · Updated 10 months ago
- Benchmark various LLM structured output frameworks: Instructor, Mirascope, LangChain, LlamaIndex, Fructose, Marvin, Outlines, etc. on task… ☆166 · Updated 7 months ago
- Minimal PyTorch implementation of BM25 (with sparse tensors) ☆101 · Updated last year
- A truly flash implementation of the DeBERTa disentangled attention mechanism. ☆47 · Updated last week
- Code repository for "Introducing Airavata: Hindi Instruction-tuned LLM" ☆59 · Updated 6 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆256 · Updated 10 months ago
- ☆70 · Updated 4 months ago
- Enhancing Translation with RAG-Powered Large Language Models ☆81 · Updated last month