---
pipeline_tag: text-to-image
license: agpl-3.0
---

# Open-LiteVAE

[[github]](https://github.com/RGenDiff/open-litevae)

This repository contains a LiteVAE model trained with the open-litevae codebase, based on the paper "[LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models](https://openreview.net/forum?id=mTAbl8kUzq)" (2024).

**Note:** This model is intended for demonstration purposes; we do not recommend using it in production.

**License:** AGPL-3.0

---

## Configuration Details

| Parameter | Value |
|-----------|-------|
| Downscale Factor | 8x |
| Latent Z dim | 12 |
| Encoder Size (params) | B (6.2M) |
| Decoder Size (params) | M (54M) |
| Discriminator | UNetGAN-L |
| Training Set | ImageNet-1k |
| Training Resolution | 128x128 --> 256x256 |
| Training Steps | 100k --> 50k |

## Metric Comparison

| Model | Z dim | rFID | LPIPS | PSNR | SSIM |
|-------|-------|------|-------|------|------|
| SD1-VAE | 4 | 0.75 | 0.138 | 25.70 | 0.72 |
| SD3-VAE | 16 | 0.22 | 0.069 | 29.59 | 0.86 |
| olvf8c12 (this repo) | 12 | 0.24 | 0.084 | 28.74 | 0.84 |

## Usage

```python
# install open-litevae first: https://github.com/RGenDiff/open-litevae

from PIL import Image
import torch
import torchvision.transforms as transforms
from torchvision.utils import save_image
from omegaconf import OmegaConf
from safetensors.torch import load_file
from olvae.utils import instantiate_from_config

# use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def load_model_from_config(config_path, ckpt_path, device=device):
    config = OmegaConf.load(config_path)
    sd = load_file(ckpt_path)
    model = instantiate_from_config(config.model)
    # strict=False tolerates keys that exist on only one side (checkpoint or model)
    model.load_state_dict(sd, strict=False)
    model = model.to(device).eval()
    return model

# load the model
olitevae = load_model_from_config(config_path="configs/olitevaeB_im_f8c12.yaml",
                                  ckpt_path="olitevaeB_im_f8c12.safetensors")

# map pixel values from [0, 1] to [-1, 1]
img_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

image_path = "input.png"  # placeholder: replace with the path to your image

with torch.no_grad():
    # encode
    image = img_transforms(Image.open(image_path).convert("RGB")).to(device)
    latent = olitevae.encode(image.unsqueeze(0)).sample()
    print(latent.shape)

    # decode
    y = olitevae.decode(latent)

# map the output from [-1, 1] back to [0, 1] before saving
save_image(y[0]*0.5 + 0.5, "decoded_image.png")
```

## Please Cite the Original Paper

```
@inproceedings{
sadat2024litevae,
title={Lite{VAE}: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models},
author={Seyedmorteza Sadat and Jakob Buhmann and Derek Bradley and Otmar Hilliges and Romann M. Weber},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=mTAbl8kUzq}
}
```
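
## Expected Latent Shape

The 8x downscale factor and latent Z dim of 12 listed under Configuration Details imply the latent shape that `print(latent.shape)` should report for a given input. The sketch below works through that arithmetic in plain Python. The helper names `latent_shape` and `compression_ratio` are hypothetical and not part of open-litevae, and the channels-first layout and the requirement that image sides be divisible by 8 are assumptions, not documented behavior of the model.

```python
def latent_shape(height, width, downscale=8, z_dim=12):
    """Return the assumed (channels, height, width) latent shape for one image."""
    # Assumption: image sides must be multiples of the downscale factor.
    if height % downscale or width % downscale:
        raise ValueError("image sides must be divisible by the downscale factor")
    return (z_dim, height // downscale, width // downscale)


def compression_ratio(height, width, downscale=8, z_dim=12, image_channels=3):
    """Ratio of input element count to latent element count."""
    c, h, w = latent_shape(height, width, downscale, z_dim)
    return (image_channels * height * width) / (c * h * w)


print(latent_shape(256, 256))       # (12, 32, 32)
print(compression_ratio(256, 256))  # 16.0
```

Under these assumptions, a 256x256 RGB image maps to a 12x32x32 latent, which holds 16 times fewer values than the input.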