Aranizer: A Custom Tokenizer based on SentencePiece and BPE tailored for Arabic Language Modeling
☆22Aug 4, 2024Updated last year
Alternatives and similar repositories for aranizer
Users that are interested in aranizer are comparing it to the libraries listed below.
- Arabic News Stance Corpus☆11Feb 5, 2021Updated 5 years ago
- Local File Inclusion (LFI) in FHEM 6.0 allows an attacker to include a file, which can lead to sensitive information disclosure.☆12Jan 20, 2021Updated 5 years ago
- ☆55Jul 21, 2024Updated last year
- Intuitive graphical representation of source code☆14Mar 15, 2023Updated 3 years ago
- Scripts to finetune the official implementation of OpenAI's Whisper model☆25Jul 6, 2025Updated 9 months ago
- A simple semi-supervised approach for creating huggingface data script loaders and uploading them to the hub.☆11Jun 23, 2024Updated last year
- End-to-End Arabic ASR using DeepSpeech engine☆14Nov 2, 2021Updated 4 years ago
- UBC ARBERT and MARBERT Deep Bidirectional Transformers for Arabic☆116Sep 2, 2021Updated 4 years ago
- Instruction dataset for Arabic with 10,000 instruction and output pairs. CIDAR can be used to fine-tune LLMs to follow instructions.☆46Apr 3, 2025Updated last year
- ☆39Feb 1, 2025Updated last year
- Comprehensive list of resources for automated processing of Tunisian dialect text.☆19Mar 15, 2024Updated 2 years ago
- A zero-config OpenAI client with support for 20+ providers, API key rotation, rate limits, optional LangChain integration and more.☆19Dec 11, 2025Updated 4 months ago
- LOW-RESOURCE NEURAL MACHINE TRANSLATION: A BENCHMARK FOR FIVE AFRICAN LANGUAGES☆16Jul 27, 2020Updated 5 years ago
- Seamlessly integrate IoT data with AI agents, enabling the effortless parsing, processing, and utilization of IoT data streams.☆11Jan 27, 2025Updated last year
- ☆16Jun 28, 2025Updated 9 months ago
- ArSarcasm-v2 is an extension to the original ArSarcasm dataset. It was used for the shared task on sarcasm detection and sentiment analys…☆12Jan 26, 2022Updated 4 years ago
- Files needed to build Linux images for the Fydetab Duo☆16Feb 25, 2026Updated last month
- Python interface for evaluating ChatGPT models☆19Feb 13, 2024Updated 2 years ago
- There are many studies done to detect anomalies based on logs. Current approaches are mainly divided into three categories: supervised le…☆11Jan 10, 2022Updated 4 years ago
- ☆12Jun 6, 2020Updated 5 years ago
- Code for the ACL conference paper titled "An Online Semantic-enhanced Dirichlet Model for Short Text Stream Clustering"☆17Apr 22, 2021Updated 4 years ago
- One of the problems faced concerning Arabic fake news detection is the scarcity of Arabic datasets. We believe it is important to availab…☆10Jun 13, 2022Updated 3 years ago
- Arabic edition of ALBERT pretrained language models☆16Apr 25, 2021Updated 4 years ago
- The official submission from Speech Squad team for the MTC-AIC 2 competition of 2024 where an ASR model is developed tailored for the Egy…☆18Mar 9, 2026Updated last month
- The system enables sophisticated coordination of multiple drones through natural language commands, visual inputs, and real-time environm…☆16Dec 15, 2025Updated 4 months ago
- The dataset for the paper "Machamp: A Generalized Entity Matching Benchmark" published in CIKM 2021☆21Oct 18, 2021Updated 4 years ago
- Multi-threading, Concurrency, Asynchrony, and various Execution Methods implemented in a Rust backend for bleeding edge performance.☆20Nov 11, 2024Updated last year
- ☆128Mar 3, 2024Updated 2 years ago
- Arabic-language questions focused on Saudi culture, tested on a number of large language models (LLMs)☆18Jan 22, 2025Updated last year
- Using reinforcement learning to minimize fuel consumption when landing a rover on Mars☆12Mar 21, 2022Updated 4 years ago
- Implementation of DVH prediction from the (contoured) anatomical scans ...☆11Jun 20, 2016Updated 9 years ago
- Personal coach to help you obtain desired AI decisions!☆20Oct 3, 2023Updated 2 years ago
- Arabic speech recognition, classification and text-to-speech.☆427Sep 30, 2023Updated 2 years ago
- Enable RNNLM lattice rescoring with Pytorch [kaldi]☆12Jun 5, 2020Updated 5 years ago
- The official implementation of CATT Arabic diacritization models.☆69Jul 18, 2025Updated 8 months ago
- CAMeL Dataset☆15Apr 15, 2025Updated last year
- Official code for PLoP☆18Mar 6, 2026Updated last month
- ☆40Dec 25, 2022Updated 3 years ago
- Simulating a 2D Hovering SpaceX Grasshopper with a Thrust Vector Control engine.☆12Dec 28, 2015Updated 10 years ago