MINERS ⛏️: The semantic retrieval benchmark for evaluating multilingual language models. (EMNLP 2024 Findings)
☆14Oct 3, 2024Updated last year
Alternatives and similar repositories for miners
Users who are interested in miners are comparing it to the libraries listed below
- ☆10Dec 17, 2020Updated 5 years ago
- ☆11Jun 23, 2022Updated 3 years ago
- PathPiece tokenizer☆13Nov 10, 2024Updated last year
- Enhanced version of Wikiextrator: A Wikipedia dumps extractor☆28Sep 17, 2025Updated 5 months ago
- A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining☆18Nov 26, 2023Updated 2 years ago
- [Unofficial] Kakaotrans: Kakao translate API for python☆16Mar 29, 2020Updated 5 years ago
- PyTorch implementation of NAACL 2021 paper "Multi-view Subword Regularization"☆26Jun 2, 2021Updated 4 years ago
- Multilingual Open Text☆25May 8, 2025Updated 9 months ago
- [HCLT 2022] Korean sentence text similarity dataset using Naver shopping reviews☆25Oct 20, 2022Updated 3 years ago
- Meta Representation Transformation for Low-resource Cross-lingual Learning☆41May 5, 2021Updated 4 years ago
- Korean datasets available on Hugging Face☆36Oct 10, 2024Updated last year
- Notes compiled from a Python course on work automation☆13Oct 10, 2017Updated 8 years ago
- ☆10Oct 2, 2024Updated last year
- Dataset Catalogue Homepage for Indonesian Languages☆10Feb 19, 2024Updated 2 years ago
- Linear Attention for Efficient Bidirectional Sequence Modeling☆15May 13, 2025Updated 9 months ago
- An extension of the Transformers library that adds a T5ForSequenceClassification class.☆40Apr 17, 2023Updated 2 years ago
- Long-context pretrained encoder-decoder models☆96Oct 28, 2022Updated 3 years ago
- Exposure-slot: Exposure-centric representations learning with Slot-in-Slot Attention for Region-aware Exposure Correction, Computer Visi…☆21Sep 2, 2025Updated 6 months ago
- PyTorch implementation of "Mutual Information Neural Estimation"☆11Dec 13, 2019Updated 6 years ago
- 0-Shot Tokenizer Transplant☆14May 16, 2025Updated 9 months ago
- colorizing images☆10Sep 16, 2022Updated 3 years ago
- ☆13Nov 28, 2025Updated 3 months ago
- ↔️ Utilizing RBERT model structure for KLUE Relation Extraction task☆15Nov 15, 2022Updated 3 years ago
- decontamination☆26Dec 3, 2025Updated 3 months ago
- python project template for personal projects! 🙋♀️☆11Nov 28, 2020Updated 5 years ago
- Digital humanities centered on graph technologies☆10Feb 12, 2026Updated 2 weeks ago
- 🎭 Official code and dataset for our CCGPK@COLING 2022 paper - "PersonaChatGen: Generating Personalized Dialogue using GPT-3"☆13Mar 26, 2024Updated last year
- ☆10Sep 13, 2022Updated 3 years ago
- DoWhy study group Gitbook☆10Feb 12, 2023Updated 3 years ago
- This repository provides the source code used to automatically generate the book summarization datasets described in the paper titled "Ec…☆10Apr 14, 2025Updated 10 months ago
- BERT score for text generation☆12Jan 15, 2025Updated last year
- Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.☆10Aug 13, 2023Updated 2 years ago
- Testing DeepSpeed integration in 🤗 Accelerate☆11Jun 28, 2022Updated 3 years ago
- Automatically Update LLM Papers Daily using Github Actions. Ref: https://github.com/Vincentqyw/cv-arxiv-daily☆10Updated this week
- Poetry Corpora Annotated on Aesthetic Emotions☆12Aug 2, 2022Updated 3 years ago
- [COLING 2024] SentiCSE: A Sentiment-aware Contrastive Sentence Embedding Framework with Sentiment-guided Textual Similarity☆13May 8, 2024Updated last year
- ☆10Dec 6, 2019Updated 6 years ago
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks☆12Nov 9, 2021Updated 4 years ago
- mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models☆11Jan 19, 2024Updated 2 years ago