facebookresearch / jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
☆2,673 Updated 3 months ago
Related projects
Alternatives and complementary repositories for jepa
- 4M: Massively Multimodal Masked Modeling ☆1,607 Updated last month
- 【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection ☆3,003 Updated last month
- DeepSeek-VL: Towards Real-World Vision-Language Understanding ☆2,077 Updated 6 months ago
- ☆4,035 Updated 5 months ago
- Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more. ☆2,339 Updated 2 months ago
- A native PyTorch Library for large model training ☆2,623 Updated this week
- The official PyTorch implementation of Google's Gemma models ☆5,290 Updated 3 months ago
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆1,840 Updated 3 months ago
- PyTorch native finetuning library ☆4,336 Updated this week
- Official PyTorch Implementation of "Scalable Diffusion Models with Transformers" ☆6,369 Updated 5 months ago
- Schedule-Free Optimization in PyTorch ☆1,898 Updated 2 weeks ago
- Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture. First outlined in the CVPR paper, "Self-supervised… ☆2,819 Updated 6 months ago
- nanoGPT style version of Llama 3.1 ☆1,246 Updated 3 months ago
- Mixture-of-Experts for Large Vision-Language Models ☆1,989 Updated 6 months ago
- ☆2,898 Updated last month
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,763 Updated 3 weeks ago
- VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and… ☆1,999 Updated 2 weeks ago
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆5,669 Updated last month
- Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch ☆1,692 Updated last week
- Tools for merging pretrained large language models. ☆4,816 Updated 2 weeks ago
- streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL ☆1,390 Updated this week
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection ☆1,435 Updated 3 weeks ago
- Reaching LLaMA2 Performance with 0.1M Dollars ☆960 Updated 3 months ago
- Training LLMs with QLoRA + FSDP ☆1,418 Updated last week
- Open weights LLM from Google DeepMind. ☆2,477 Updated this week
- Next-Token Prediction is All You Need ☆1,824 Updated 3 weeks ago
- EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything ☆2,160 Updated 5 months ago
- Official repository of Evolutionary Optimization of Model Merging Recipes ☆1,230 Updated 7 months ago
- Consistency Distilled Diff VAE ☆2,137 Updated last year
- Mora: More like Sora for Generalist Video Generation ☆1,517 Updated last month