RGenDiff/olitevaeB_im_f8c12

hjups22 committed on Feb 9

Commit 82bb0e7 · verified · 1 Parent(s): c8bbf45

Updated README

Files changed (1)

README.md +91 -3

@@ -1,3 +1,91 @@
----
-license: agpl-3.0
----

+# Open-LiteVAE
+[[github]](https://github.com/RGenDiff/open-litevae)
+This repository contains a LiteVAE model trained with the open-litevae codebase, based on the paper "[LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models](https://openreview.net/forum?id=mTAbl8kUzq)" [2024].
+**Note:** This model is intended for demonstration purposes; we do not recommend using it in production.
+**License:** AGPL-3.0
+---
+## Configuration Details
+| Parameter | Value |
+|------------------|----|
+| Downscale Factor | 8x |
+| Latent Z dim    | 12 |
+| Encoder Size (params) | B (6.2M) |
+| Decoder Size (params) | M (54M) |
+| Discriminator   | UNetGAN-L |
+| Training Set | ImageNet-1k |
+| Training Resolution | 128x128 --> 256x256|
+| Training Steps | 100k --> 50k |
+## Metric Comparison
+| Model | Z dim | rFID | LPIPS | PSNR | SSIM |
+|-------|-------|------|-------|------|------|
+|  SD1-VAE     |    4   |   0.75   | 0.138      |  25.70    |   0.72   |
+|  SD3-VAE     |    16   |   0.22   | 0.069      |  29.59    |   0.86   |
+|  olvf8c12 (this repo)     |    12   |  0.24   | 0.084      |  28.74    |   0.84   |
+## Usage
+```python
+# install open-litevae https://github.com/RGenDiff/open-litevae
+#
+from PIL import Image
+import torch
+import torchvision.transforms as transforms
+from torchvision.utils import save_image
+from omegaconf import OmegaConf
+from safetensors.torch import load_file
+from olvae.utils import instantiate_from_config
+def load_model_from_config(config_path, ckpt_path, device=torch.device("cuda")):
+	config = OmegaConf.load(config_path)
+	sd = load_file(ckpt_path)  # load_file returns the state dict; load_model expects a model as its first argument
+	model = instantiate_from_config(config.model)
+	model.load_state_dict(sd, strict=False)
+	model = model.to(device).eval()
+	return model
+# load the model
+olitevae = load_model_from_config(config_path="configs/olitevaeB_im_f8c12.yaml",
+									ckpt_path="olitevaeB_im_f8c12.safetensors")
+img_transforms = transforms.Compose([
+    transforms.ToTensor(),
+    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
+])
+# encode
+image = img_transforms(Image.open("<your image>").convert("RGB")).to("cuda")  # placeholder path; "cuda" matches the loader's default device
+latent = olitevae.encode(image.unsqueeze(0)).sample()
+print(latent.shape)
+# decode
+y = olitevae.decode(latent)
+save_image(y[0]*0.5 + 0.5, "decoded_image.png")
+```
+## Please Cite the Original Paper
+```
+@inproceedings{
+sadat2024litevae,
+title={Lite{VAE}: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models},
+author={Seyedmorteza Sadat and Jakob Buhmann and Derek Bradley and Otmar Hilliges and Romann M. Weber},
+booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
+year={2024},
+url={https://openreview.net/forum?id=mTAbl8kUzq}
+}
+```
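
The configuration table in the README above lists an 8x downscale factor and a latent dimension of 12. As a rough sketch of what that implies for the `latent.shape` printed in the usage snippet, the helpers below compute the per-image latent shape and the resulting compression ratio. The function names `latent_shape` and `compression_ratio` are hypothetical and not part of open-litevae, and the sketch assumes the input height and width divide evenly by the downscale factor and that the latent has `z_dim` channels at `1/downscale` spatial resolution.

```python
def latent_shape(height, width, downscale=8, z_dim=12):
    """Return the assumed (channels, height, width) of the latent for one image.

    Hypothetical helper: defaults mirror the README's configuration table
    (8x downscale, 12 latent channels); it does not call the model.
    """
    if height % downscale or width % downscale:
        raise ValueError("height and width are assumed to be multiples of the downscale factor")
    return (z_dim, height // downscale, width // downscale)


def compression_ratio(height, width, downscale=8, z_dim=12, image_channels=3):
    """Ratio of image values to latent values (element count, not bytes)."""
    c, h, w = latent_shape(height, width, downscale, z_dim)
    return (image_channels * height * width) / (c * h * w)


# A 256x256 RGB image, the final training resolution listed in the table.
shape = latent_shape(256, 256)
ratio = compression_ratio(256, 256)
print(shape)  # (12, 32, 32)
print(ratio)  # 16.0
```

Under these assumptions, a batch of one 256x256 image would encode to a tensor of shape `[1, 12, 32, 32]`. Passing `z_dim=4` or `z_dim=16` gives the corresponding figures for 8x VAEs with the other latent sizes listed in the metric comparison.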