🕸 GlotCC Dataset and Pipeline -- NeurIPS 2024
☆20 · Apr 6, 2025 · Updated 11 months ago
Alternatives and similar repositories for GlotCC
Users who are interested in GlotCC are comparing it to the libraries listed below.
- 🕸 GlotWeb: Web Indexing for Minority Languages (WWW 2026) ☆17 · Feb 27, 2026 · Updated last week
- ☆10 · Oct 2, 2024 · Updated last year
- GlotEval: a unified evaluation toolkit designed to benchmark multilingual Large Language Models (LLMs) in a language-specific way ☆18 · Nov 4, 2025 · Updated 4 months ago
- Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs ☆13 · Feb 13, 2024 · Updated 2 years ago
- Repository for "Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages" ☆15 · Oct 4, 2024 · Updated last year
- Klexikon: A German Dataset for Joint Summarization and Simplification ☆17 · Oct 5, 2022 · Updated 3 years ago
- A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource… ☆26 · Feb 16, 2026 · Updated 2 weeks ago
- ☆44 · Feb 11, 2026 · Updated 3 weeks ago
- ☆41 · May 27, 2025 · Updated 9 months ago
- Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and visi… ☆33 · Feb 7, 2025 · Updated last year
- Code for "BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised Named Entity Recognition" ☆32 · Jun 20, 2023 · Updated 2 years ago
- The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis" ☆34 · Jun 13, 2025 · Updated 8 months ago
- TyDiP Multilingual Politeness dataset and code ☆12 · Oct 15, 2023 · Updated 2 years ago
- Finite-state script normalization and processing utilities ☆46 · Feb 25, 2026 · Updated last week
- cbReader - A simple web-based comic book reader (CBZ/CBR) ☆10 · May 21, 2018 · Updated 7 years ago
- Code and data for the WSDM '19 paper "Crosslingual Document Embedding as Reduced-Rank Ridge Regression (Cr5)" ☆30 · Aug 17, 2019 · Updated 6 years ago
- Official Repository for "Hypencoder: Hypernetworks for Information Retrieval" ☆34 · Sep 20, 2025 · Updated 5 months ago
- MiniGPT-Pancreas: Multimodal Large language Model for Pancreas Cancer Classification and Detection ☆11 · Sep 19, 2025 · Updated 5 months ago
- Tool for sentiment analysis annotation ☆13 · Mar 26, 2025 · Updated 11 months ago
- ☆72 · Updated this week
- Cannabis strain information ☆10 · Feb 20, 2016 · Updated 10 years ago
- ☆14 · Dec 5, 2025 · Updated 3 months ago
- Code for EMNLP 2021 main conference paper "Dynamic Knowledge Distillation for Pre-trained Language Models" ☆41 · Aug 9, 2022 · Updated 3 years ago
- Creating super-parallel corpora of more than 1500+ unique languages for NLP research ☆34 · Dec 8, 2022 · Updated 3 years ago
- Linguistic Reconstruction with LingPy ☆15 · Aug 5, 2024 · Updated last year
- A semi print-in-place hand for human-like manipulation, designed to be built by anyone. ☆17 · Jan 5, 2026 · Updated 2 months ago
- A RAG that can scale 🧑🏻‍💻 ☆11 · May 28, 2024 · Updated last year
- Code repository supporting the paper "Auto-Generating Weak Labels for Real & Synthetic Data to Improve Label-Scarce Medical Image Segment… ☆11 · Apr 29, 2024 · Updated last year
- A hubot script to perform Service Now API record lookups. ☆10 · Apr 17, 2023 · Updated 2 years ago
- A repository for resources relating to NLP in the Balochi language ☆19 · Jun 3, 2023 · Updated 2 years ago
- Experimental framework taking inspiration from biological systems, combining compression-based architectures, group theory, and symmetry … ☆14 · Nov 13, 2025 · Updated 3 months ago
- ☆13 · Oct 16, 2023 · Updated 2 years ago
- COMET for African languages ☆10 · Jan 24, 2025 · Updated last year
- A benchmark dataset designed to support the development and evaluation of large language models (LLMs) for conversational mental health a… ☆17 · Feb 24, 2025 · Updated last year
- gl hf ☆14 · Oct 18, 2021 · Updated 4 years ago
- ☆12 · Feb 27, 2026 · Updated last week
- ☆16 · Jul 20, 2025 · Updated 7 months ago
- [IROS 2025] EgoLoc: Zero-Shot Temporal Interaction Localization for Egocentric Videos ☆33 · Jan 13, 2026 · Updated last month