mesolitica / llm-embedding
Fine-tune a Malaysian LLM for Malaysian-context embedding tasks.
☆19 · Updated 4 months ago
Related projects:
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval ☆27 · Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆42 · Updated 10 months ago
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆51 · Updated this week
- Code and pre-trained models for "ReasonBert: Pre-trained to Reason with Distant Supervision", EMNLP'2021 ☆29 · Updated last year
- Rough codebase for exploring initialization strategies for new word embeddings in pretrained LMs ☆14 · Updated 2 years ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 · Updated last year
- ☆15 · Updated last month
- Code for "Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking" (https://arxiv.org/abs/2… ☆12 · Updated last year
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆67 · Updated 2 months ago
- ☆43 · Updated last month
- [ACL 2023] Few-shot Reranking for Multi-hop QA via Language Model Prompting ☆25 · Updated last year
- ☆13 · Updated 3 months ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆32 · Updated last year
- ☆17 · Updated 6 months ago
- Large-scale query-focused multi-document Summarization dataset ☆11 · Updated 2 years ago
- ☆24 · Updated 3 months ago
- ☆12 · Updated 7 months ago
- PyTorch implementation for MRL ☆17 · Updated 7 months ago
- Code for Paper "Target-oriented Fine-tuning for Zero-Resource Named Entity Recognition" ☆21 · Updated last year
- Code for Stage-wise Fine-tuning for Graph-to-Text Generation ☆26 · Updated last year
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod… ☆12 · Updated last year
- ☆52 · Updated 7 months ago
- StAtutory Reasoning Assessment ☆11 · Updated last year
- ☆10 · Updated 3 months ago
- Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval ☆33 · Updated 3 months ago
- ☆9 · Updated last month
- QuoteSum is a textual QA dataset containing Semi-Extractive Multi-source Question Answering (SEMQA) examples written by humans, based on … ☆12 · Updated 5 months ago
- ☆23 · Updated last year
- Code for Relevance-guided Supervision for OpenQA with ColBERT (TACL'21) ☆40 · Updated 3 years ago
- Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting ir… ☆29 · Updated last month