mesolitica / llm-embedding
Fine-tune Malaysian LLMs for Malaysian-context embedding tasks.
☆20 · Updated 8 months ago
Alternatives and similar repositories for llm-embedding:
Users interested in llm-embedding are comparing it to the libraries listed below:
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 · Updated last year
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆28 · Updated this week
- Code and pre-trained models for "ReasonBert: Pre-trained to Reason with Distant Supervision", EMNLP'2021 ☆29 · Updated last year
- Code for "Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking" (https://arxiv.org/abs/2… ☆12 · Updated last year
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆52 · Updated last month
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆45 · Updated last year
- Code for EMNLP 2023 paper: DALE: Generative Data Augmentation for Low-Resource Legal NLP ☆10 · Updated last year
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval ☆28 · Updated 2 years ago
- Code and dataset for the EMNLP paper "Instruct and Extract: Instruction Tuning for On-Demand Information Extraction" ☆50 · Updated last year
- Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting ir… ☆33 · Updated 2 months ago
- [ACL 2023] Few-shot Reranking for Multi-hop QA via Language Model Prompting ☆27 · Updated last year
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆69 · Updated last month
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod… ☆14 · Updated last year
- Perturbation CheckLists for Evaluating NLG Evaluation Metrics, EMNLP 2021 ☆9 · Updated 2 years ago
- QuoteSum is a textual QA dataset containing Semi-Extractive Multi-source Question Answering (SEMQA) examples written by humans, based on … ☆12 · Updated 9 months ago
- Resources for "Conversational Entity Linking: Problem Definition and Datasets" ☆19 · Updated last year
- Official codebase for NeurIPS 2022 paper End-to-end Learning to Index and Search in Large Output Spaces ☆12 · Updated last year
- Code for our paper "Resources and Evaluations for Multi-Distribution Dense Information Retrieval" ☆14 · Updated last year
- Code repo for "Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers" (ACL 2023) ☆22 · Updated last year
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 · Updated last year