PerceptCLIP-IQA is a model designed to predict image quality assessment (IQA) scores. This is the official model from the paper:
📄 "Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks". We apply LoRA adaptation to the CLIP visual encoder and add an MLP head for IQA score prediction. The model achieves state-of-the-art results, as described in the paper.

Training Details

  • Dataset: KonIQ-10k
  • Architecture: CLIP Vision Encoder (ViT-L/14) with LoRA adaptation
  • Loss Function: Pearson correlation induced loss (a sketch follows the list below)
  • Optimizer: AdamW
  • Learning Rate: 5e-05
  • Batch Size: 32
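
The Pearson-correlation-induced loss optimizes the linear (Pearson) correlation between predicted and ground-truth quality scores within a batch rather than their absolute difference. A minimal PyTorch sketch of such a loss (not the authors' exact implementation):

import torch

def pearson_induced_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # pred, target: 1-D tensors of per-image scores for one batch
    pred = pred - pred.mean()
    target = target - target.mean()
    plcc = (pred * target).sum() / (pred.norm() * target.norm() + 1e-8)
    return 1.0 - plcc  # 0 when predictions are perfectly linearly correlated with targets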

Installation & Requirements

You can set up the environment using environment.yml or manually install dependencies:

  • python=3.9.15
  • cudatoolkit=11.7
  • torchvision=0.14.0
  • transformers=4.45.2
  • peft=0.14.0
  • numpy=1.26.4

Usage

To use the model for inference:

from torchvision import transforms
import torch
from PIL import Image
from huggingface_hub import hf_hub_download
import importlib.util
import random
import numpy as np

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the model class definition dynamically
class_path = hf_hub_download(repo_id="PerceptCLIP/PerceptCLIP_IQA", filename="modeling.py")
spec = importlib.util.spec_from_file_location("modeling", class_path)
modeling = importlib.util.module_from_spec(spec)
spec.loader.exec_module(modeling)

# Initialize the model
ModelClass = modeling.clip_lora_model
model = ModelClass().to(device)

# Load the pretrained weights
model_path = hf_hub_download(repo_id="PerceptCLIP/PerceptCLIP_IQA", filename="perceptCLIP_IQA.pth")
model.load_state_dict(torch.load(model_path, map_location=device))
model.eval()

# Load an image
image = Image.open("image_path.jpg").convert("RGB")

# Preprocessing: random crops with CLIP normalization
def IQA_preprocess():
    # Fixed seed so the random crops are reproducible
    random.seed(3407)
    transform = transforms.Compose([
        transforms.Resize((512, 384)),
        transforms.RandomCrop(size=(224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                             std=(0.26862954, 0.26130258, 0.27577711)),
    ])
    return transform

batch = torch.stack([IQA_preprocess()(image) for _ in range(15)]).to(device)  # Shape: (15, 3, 224, 224)

with torch.no_grad():
    scores = model(batch).cpu().numpy()

# Average the predictions over the 15 random crops
iqa_score = np.mean(scores)

# Map the raw predicted score to the [0, 1] range
min_pred = -6.52
max_pred = 3.11

normalized_score = (iqa_score - min_pred) / (max_pred - min_pred)
print(f"Predicted quality score: {normalized_score:.4f}")

Citation

If you use this model in your research, please cite:

@article{zalcher2025don,
  title={Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks},
  author={Zalcher, Amit and Wasserman, Navve and Beliy, Roman and Heinimann, Oliver and Irani, Michal},
  journal={arXiv preprint arXiv:2503.13260},
  year={2025}
}