malaysia-ai / pretrain-text-dataset

Prepare pretrain dataset for Malaysian context.

☆12

Alternatives and similar repositories for pretrain-text-dataset:

Users who are interested in pretrain-text-dataset are comparing it to the repositories listed below.

BlueCrescent / DocLLM
Implementation of the DocLLM paper for Llama models.
☆12 Updated last month
philschmid / optimum-static-quantization
☆28 Updated last year
SALT-NLP / Bound-Cap-LLM
Source code for the paper "Bounding the Capabilities of Large Language Models in Open Text Generation with Prompt Constraints"
☆27 Updated last year
philschmid / deep-learning-habana-huggingface
☆30 Updated 2 years ago
comet-ml / blog-serving-hugging-face-models
☆20 Updated 3 years ago
nbroad1881 / strideformer
Using short models to classify long texts
☆21 Updated last year
PrithivirajDamodaran / Alt-ZSC
Alternate Implementation for Zero Shot Text Classification: Instead of reframing NLI/XNLI, this reframes the text backbone of CLIP models…
☆37 Updated 2 years ago
neuralwork / instruct-finetune-mistral
Fine-tune Mistral 7B to generate fashion style suggestions
☆33 Updated last year
evidentlyai / community-examples
Examples of using Evidently to evaluate, test and monitor ML models.
☆18 Updated last month
cceyda / lit-NER
TorchServe+Streamlit for easily serving your HuggingFace NER models
☆31 Updated 2 years ago
GeorgeLuImmortal / DocLLM_reimplementation
☆21 Updated 10 months ago
mtszkw / fast-torch
Comparing PyTorch, JIT and ONNX for inference with Transformers
☆17 Updated 3 years ago
rajshah4 / huggingface-demos
☆17 Updated 2 years ago
stxnext / visual-similarity-search
Visual similarity search engine demo using PyTorch Metric Learning and Qdrant
☆12 Updated 2 years ago
joseprsm / rexify
🦖 Streamlined Recommender Systems with TensorFlow and KubeFlow
☆18 Updated last year
explodinggradients / Funtuner
Supervised instruction finetuning for LLM with HF trainer and Deepspeed
☆34 Updated last year
geronimi73 / 3090_shorts
minimal LLM scripts for 24GB VRAM GPUs. training, inference, whatever
☆35 Updated this week
mixedbread-ai / binary-embeddings
Showcases how mxbai-embed-large-v1 can be used to produce binary embeddings. Binary embeddings enable 32x storage savings and 40x faster r…
☆15 Updated 9 months ago
qdrant / quaterion-models
The collection of building blocks for fine-tunable metric learning models
☆32 Updated last week
peggy1502 / Data-Science-Articles
A collection of my data science articles published in Towards Data Science and Towards AI.
☆16 Updated last year
mesolitica / llm-embedding
Finetune Malaysian LLM for Malaysian context embedding task.
☆20 Updated 8 months ago
cromatikap / btw
NLP command-line assistant powered by OpenAI
☆21 Updated 11 months ago
alinourian / Fine-tuning-Mistral-7b-QA
Fine-tuning Mistral-7b with PEFT (Parameter Efficient Fine-Tuning) and LoRA (Low-Rank Adaptation) on Puffin Dataset (multi-turn conversation…
☆12 Updated last year
tushar117 / XAlign
Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages
☆9 Updated 2 years ago
wandb / layoutlm_sroie_demo
Finetune LayoutLM on SROIE dataset using W&B tools
☆18 Updated 3 years ago
lordtt13 / transformers-experiments
All my experiments with the various transformers and various transformer frameworks available
☆14 Updated 3 years ago
MilaNLProc / language-invariant-properties
☆22 Updated 2 years ago
bhavsarpratik / semantic-search
[WIP] Behold, semantic-search, built over sentence-transformers to make it easy for search engineers to evaluate, optimise and deploy mod…
☆15 Updated last year
asahi417 / lm-vocab-trimmer
Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting ir…
☆33 Updated 2 months ago
rycolab / probing-via-prompting
☆11 Updated 2 years ago