---
pipeline_tag: text-to-image
license: agpl-3.0
---

# Open-LiteVAE

[[github]](https://github.com/RGenDiff/open-litevae)

This repository contains a LiteVAE model trained with the open-litevae codebase, based on the paper "[LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models](https://openreview.net/forum?id=mTAbl8kUzq)" (NeurIPS 2024).

**Note:** This model is intended for demonstration purposes; we do not recommend using it in production.

**License:** AGPL-3.0

---

## Configuration Details

| Parameter             | Value             |
|-----------------------|-------------------|
| Downscale Factor      | 8x                |
| Latent Z Dim          | 12                |
| Encoder Size (params) | B (6.2M)          |
| Decoder Size (params) | M (54M)           |
| Discriminator         | UNetGAN-L         |
| Training Set          | ImageNet-1k       |
| Training Resolution   | 128x128 → 256x256 |
| Training Steps        | 100k → 50k        |

Training proceeds in two stages: 100k steps at 128x128, followed by 50k steps at 256x256.
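
As a quick worked example of what these numbers imply for tensor shapes (a minimal sketch, independent of the model itself): an 8x downscale factor with 12 latent channels maps a 256x256 RGB image to a 32x32x12 latent.

```python
# Shape arithmetic only; no model required.
import torch

downscale, z_dim = 8, 12               # from the configuration table above
x = torch.randn(1, 3, 256, 256)        # (batch, channels, height, width)
latent_shape = (x.shape[0], z_dim,
                x.shape[2] // downscale, x.shape[3] // downscale)
print(latent_shape)                    # (1, 12, 32, 32)
```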

## Metric Comparison

| Model                | Z Dim | rFID ↓ | LPIPS ↓ | PSNR ↑ | SSIM ↑ |
|----------------------|-------|--------|---------|--------|--------|
| SD1-VAE              | 4     | 0.75   | 0.138   | 25.70  | 0.72   |
| SD3-VAE              | 16    | 0.22   | 0.069   | 29.59  | 0.86   |
| olvf8c12 (this repo) | 12    | 0.24   | 0.084   | 28.74  | 0.84   |
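
Of these metrics, PSNR is the simplest to reproduce yourself; below is a minimal sketch for images scaled to [0, 1]. rFID, LPIPS, and SSIM require their reference implementations (e.g., torchmetrics) and are not shown here.

```python
import torch

def psnr(original: torch.Tensor, reconstruction: torch.Tensor) -> float:
    """Peak signal-to-noise ratio in dB, assuming inputs in [0, 1]."""
    mse = torch.mean((original - reconstruction) ** 2)
    return (10 * torch.log10(1.0 / mse)).item()
```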

## Usage

```python
# Requires the open-litevae codebase: https://github.com/RGenDiff/open-litevae

from PIL import Image
import torch
import torchvision.transforms as transforms
from torchvision.utils import save_image
from omegaconf import OmegaConf
from safetensors.torch import load_file

from olvae.utils import instantiate_from_config

device = torch.device("cuda")

def load_model_from_config(config_path, ckpt_path, device=device):
    """Instantiate the model from its YAML config and load the safetensors weights."""
    config = OmegaConf.load(config_path)
    sd = load_file(ckpt_path)
    model = instantiate_from_config(config.model)
    model.load_state_dict(sd, strict=False)
    model = model.to(device).eval()
    return model

# load the model
olitevae = load_model_from_config(config_path="configs/olitevaeB_im_f8c12.yaml",
                                  ckpt_path="olitevaeB_im_f8c12.safetensors")

# map PIL images to tensors in [-1, 1], the range the model expects
img_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# encode
image = img_transforms(Image.open("<your image>").convert("RGB")).to(device)
with torch.no_grad():
    latent = olitevae.encode(image.unsqueeze(0)).sample()
print(latent.shape)  # (1, 12, H/8, W/8)

# decode
with torch.no_grad():
    y = olitevae.decode(latent)
save_image(y[0] * 0.5 + 0.5, "decoded_image.png")  # map [-1, 1] back to [0, 1]
```
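
Note that `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` maps pixel values from [0, 1] to [-1, 1] before encoding, which is why the decoded output is mapped back with `y[0] * 0.5 + 0.5` before saving.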

## Please Cite the Original Paper

```bibtex
@inproceedings{sadat2024litevae,
  title={Lite{VAE}: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models},
  author={Seyedmorteza Sadat and Jakob Buhmann and Derek Bradley and Otmar Hilliges and Romann M. Weber},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=mTAbl8kUzq}
}
```