mesolitica / llm-embedding
Finetune Malaysian LLM for Malaysian context embedding task.
☆ 20 · Updated 11 months ago
Alternatives and similar repositories for llm-embedding:
Users interested in llm-embedding are comparing it to the repositories listed below.
- ☆ 19 · Updated 4 months ago
- Rough codebase for exploring initialization strategies for new word embeddings in pretrained LMs ☆ 16 · Updated 3 years ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆ 30 · Updated 2 weeks ago
- Code for our paper Resources and Evaluations for Multi-Distribution Dense Information Retrieval ☆ 14 · Updated last year
- Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs ☆ 34 · Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆ 47 · Updated last year
- Efficient Memory-Augmented Transformers ☆ 34 · Updated 2 years ago
- ☆ 13 · Updated 2 years ago
- Code and pre-trained models for "ReasonBert: Pre-trained to Reason with Distant Supervision", EMNLP 2021 ☆ 29 · Updated 2 years ago
- ☆ 16 · Updated last year
- The codebase for our ACL 2023 paper: Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learni… ☆ 29 · Updated last year
- EMNLP 2021: Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections ☆ 50 · Updated 3 years ago
- FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction ☆ 24 · Updated 2 years ago
- ☆ 15 · Updated last year
- A Structured Span Selector (NAACL 2022). A structured span selector with a WCFG for span selection tasks (coreference resolution, semanti… ☆ 21 · Updated 2 years ago
- Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 q… ☆ 87 · Updated last year
- ☆ 12 · Updated last year
- Official repo for the NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions" ☆ 63 · Updated last year
- ☆ 25 · Updated 2 years ago
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval ☆ 29 · Updated 2 years ago
- The official implementation of "Distilling Relation Embeddings from Pre-trained Language Models, EMNLP 2021 main conference", a high-qual… ☆ 46 · Updated 4 months ago
- Code for the arXiv paper "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆ 58 · Updated 2 months ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆ 22 · Updated 2 years ago
- ☆ 97 · Updated 2 years ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆ 19 · Updated last month
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆ 31 · Updated last year
- Transformers at any scale ☆ 41 · Updated last year
- Generate BERT vocabularies and pretraining examples from Wikipedias ☆ 18 · Updated 4 years ago
- ☆ 14 · Updated 5 months ago
- Task Compass: Scaling Multi-task Pre-training with Task Prefix (EMNLP 2022 Findings; stay tuned, more will be updated) ☆ 22 · Updated 2 years ago