facebookresearch / jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
☆2,614Updated last month
Related projects:
- A Native-PyTorch Library for LLM Fine-tuning☆3,942Updated this week
- ☆4,006Updated 3 months ago
- Modeling, training, eval, and inference code for OLMo☆4,399Updated this week
- Video-LLaVA: Learning United Visual Representation by Alignment Before Projection☆2,846Updated last month
- Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"☆6,008Updated 3 months ago
- Mixture-of-Experts for Large Vision-Language Models☆1,911Updated 4 months ago
- VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and…☆1,786Updated last week
- 4M: Massively Multimodal Masked Modeling☆1,543Updated 2 months ago
- Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"☆3,175Updated 4 months ago
- The official PyTorch implementation of Google's Gemma models☆5,239Updated last month
- DeepSeek-VL: Towards Real-World Vision-Language Understanding☆2,007Updated 4 months ago
- Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.☆5,506Updated this week
- Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture. First outlined in the CVPR paper, "Self-supervised…☆2,789Updated 4 months ago
- Tools for merging pretrained large language models.☆4,501Updated this week
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.☆1,756Updated last month
- ☆2,395Updated this week
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones☆1,235Updated 5 months ago
- Training LLMs with QLoRA + FSDP☆1,382Updated last week
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection☆1,354Updated last week
- A native PyTorch Library for large model training☆1,544Updated this week
- Mora: More like Sora for Generalist Video Generation☆1,474Updated 2 months ago
- ☆7,075Updated last month
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design.☆1,683Updated last week
- Schedule-Free Optimization in PyTorch☆1,800Updated last month
- Reaching LLaMA2 Performance with 0.1M Dollars☆955Updated last month
- An Open-source Toolkit for LLM Development☆2,684Updated 3 months ago
- a state-of-the-art-level open visual language model | multimodal pre-trained model☆5,871Updated 3 months ago
- Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model☆3,200Updated 7 months ago
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.☆9,031Updated 2 months ago
- streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, Phi-3.5 Vision☆1,249Updated this week