---
language:
- en
base_model:
- openai/clip-vit-large-patch14
tags:
- emotion_prediction
- VEA
- computer_vision
- perceptual_tasks
- CLIP
- EmoSet
---

**PerceptCLIP-Emotions** is a model designed to predict the **emotions** that an image evokes in viewers. This is the official model from the paper:

📄 **["Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks"](https://arxiv.org/abs/2503.13260)**

We apply **LoRA adaptation** to the **CLIP visual encoder** and add an **MLP head** for emotion classification. Our model achieves **state-of-the-art results** on emotion prediction.

## Training Details

- *Dataset*: [EmoSet](https://vcc.tech/EmoSet)
- *Architecture*: CLIP Vision Encoder (ViT-L/14) with *LoRA adaptation* and an MLP classification head
- *Loss Function*: Cross-Entropy Loss
- *Optimizer*: AdamW
- *Learning Rate*: 0.0001
- *Batch Size*: 32

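For reference, here is a minimal training sketch consistent with the list above. The base model, loss, optimizer, learning rate, and batch size come from this card; the LoRA rank/alpha/dropout, target modules, and MLP head layout are illustrative assumptions and may differ from the released checkpoint (the exact architecture lives in the `modeling.py` file shipped with this repo).

```python
# Hedged sketch of the fine-tuning setup described above.
# Assumptions (not specified in this card): LoRA rank/alpha/dropout,
# target attention projections, and the MLP head layout.
import torch
import torch.nn as nn
from transformers import CLIPVisionModel
from peft import LoraConfig, get_peft_model

NUM_CLASSES = 8  # EmoSet emotion categories

class CLIPLoRAEmotionClassifier(nn.Module):
    def __init__(self, lora_rank=8, lora_alpha=16):
        super().__init__()
        backbone = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
        hidden = backbone.config.hidden_size  # 1024 for ViT-L/14
        lora_cfg = LoraConfig(
            r=lora_rank,            # assumed rank
            lora_alpha=lora_alpha,  # assumed scaling
            lora_dropout=0.1,       # assumed dropout
            target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
        )
        self.backbone = get_peft_model(backbone, lora_cfg)  # freezes non-LoRA weights
        self.head = nn.Sequential(  # assumed MLP head layout
            nn.Linear(hidden, hidden // 2),
            nn.GELU(),
            nn.Linear(hidden // 2, NUM_CLASSES),
        )

    def forward(self, pixel_values):
        pooled = self.backbone(pixel_values=pixel_values).pooler_output
        return self.head(pooled)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CLIPLoRAEmotionClassifier().to(device)
criterion = nn.CrossEntropyLoss()
trainable = [p for p in model.parameters() if p.requires_grad]  # LoRA + head only
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

def train_step(images, labels):
    """One optimization step on a batch of 32 preprocessed images."""
    optimizer.zero_grad()
    loss = criterion(model(images.to(device)), labels.to(device))
    loss.backward()
    optimizer.step()
    return loss.item()
```
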
## Installation & Requirements

You can set up the environment using `environment.yml` or manually install the dependencies:

- python=3.9.15
- cudatoolkit=11.7
- torchvision=0.14.0
- transformers=4.45.2
- peft=0.14.0

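As an optional sanity check, you can confirm that the installed packages roughly match the versions listed above:

```python
# Print the versions of the key dependencies listed above
import sys
import torch
import torchvision
import transformers
import peft

print("python      :", sys.version.split()[0])
print("torch       :", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision :", torchvision.__version__)
print("transformers:", transformers.__version__)
print("peft        :", peft.__version__)
```
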
## Usage

To use the model for inference:

```python
from torchvision import transforms
import torch
from PIL import Image
from huggingface_hub import hf_hub_download
import importlib.util

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the model class definition dynamically
class_path = hf_hub_download(repo_id="PerceptCLIP/PerceptCLIP_Emotions", filename="modeling.py")
spec = importlib.util.spec_from_file_location("modeling", class_path)
modeling = importlib.util.module_from_spec(spec)
spec.loader.exec_module(modeling)

# Initialize the model
ModelClass = modeling.clip_lora_model
model = ModelClass().to(device)

# Load the pretrained weights
model_path = hf_hub_download(repo_id="PerceptCLIP/PerceptCLIP_Emotions", filename="perceptCLIP_Emotions.pth")
model.load_state_dict(torch.load(model_path, map_location=device))
model.eval()

# Emotion label mapping
idx2label = {
    0: "amusement",
    1: "awe",
    2: "contentment",
    3: "excitement",
    4: "anger",
    5: "disgust",
    6: "fear",
    7: "sadness",
}

# Preprocessing (224x224 center crop with CLIP normalization)
def emo_preprocess():
    transform = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(size=(224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                             std=(0.26862954, 0.26130258, 0.27577711)),
    ])
    return transform

# Load an image
image = Image.open("image_path.jpg").convert("RGB")
image = emo_preprocess()(image).unsqueeze(0).to(device)

# Run inference
with torch.no_grad():
    outputs = model(image)
    _, predicted = outputs.max(1)

# Get the predicted emotion label
predicted_emotion = idx2label[predicted.item()]
print(f"Predicted Emotion: {predicted_emotion}")
```

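Beyond the top-1 label, you can turn the raw outputs into per-emotion confidence scores. A small sketch, assuming `model`, `image`, and `idx2label` are defined as above and that the model returns unnormalized logits (consistent with the cross-entropy training):

```python
# Rank all 8 emotions by predicted probability (softmax over the logits)
with torch.no_grad():
    probs = torch.softmax(model(image), dim=1).squeeze(0)

for idx in probs.argsort(descending=True).tolist():
    print(f"{idx2label[idx]}: {probs[idx].item():.3f}")
```
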
## Citation

If you use this model in your research, please cite:

```bibtex
@article{zalcher2025don,
  title={Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks},
  author={Zalcher, Amit and Wasserman, Navve and Beliy, Roman and Heinimann, Oliver and Irani, Michal},
  journal={arXiv preprint arXiv:2503.13260},
  year={2025}
}
```