---
language:
- en
base_model:
- openai/clip-vit-large-patch14
tags:
- emotion_prediction
- VEA
- computer_vision
- perceptual_tasks
- CLIP
- EmoSet
---
# Don’t Judge Before You CLIP: Visual Emotion Analysis Model

This model is part of our paper:
*"Don’t Judge Before You CLIP: A Unified Approach for Perceptual Tasks"*
It was trained on the *EmoSet dataset* to predict the emotion class of an image.

## Model Overview

Visual perceptual tasks, such as visual emotion analysis, aim to estimate how humans perceive and interpret images. Unlike objective tasks (e.g., object recognition), these tasks rely on subjective human judgment, making labeled data scarce.

Our approach leverages *CLIP* as a prior for perceptual tasks, inspired by cognitive research showing that CLIP correlates well with human judgment. This suggests that CLIP implicitly captures human biases, emotions, and preferences. We fine-tune CLIP minimally using LoRA and incorporate an MLP head to adapt it to each specific task.

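The released checkpoint stores the full model object (see Usage below), so its exact head architecture is not spelled out here. As a rough illustration of the described setup, the sketch below combines the CLIP ViT-L/14 vision encoder with LoRA adapters (via `peft`) and an MLP head; the head width, LoRA target modules, and the rank/alpha values (read from the checkpoint filename) are assumptions, not a specification of the released model.

```python
# Minimal sketch of the described architecture -- not the released implementation.
import torch.nn as nn
from transformers import CLIPVisionModel
from peft import LoraConfig, get_peft_model

class EmotionClassifier(nn.Module):
    def __init__(self, num_classes=8, lora_r=16, lora_alpha=8):
        super().__init__()
        backbone = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
        lora_cfg = LoraConfig(
            r=lora_r,
            lora_alpha=lora_alpha,
            target_modules=["q_proj", "v_proj"],  # assumption: attention projections
        )
        hidden = backbone.config.hidden_size  # 1024 for ViT-L/14
        self.backbone = get_peft_model(backbone, lora_cfg)
        # MLP head; the hidden width (512) is an illustrative choice.
        self.head = nn.Sequential(
            nn.Linear(hidden, 512),
            nn.GELU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, pixel_values):
        # The pooled [CLS] representation of the vision encoder feeds the MLP head.
        pooled = self.backbone(pixel_values=pixel_values).pooler_output
        return self.head(pooled)
```
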
## Training Details

- *Dataset*: [EmoSet](https://vcc.tech/EmoSet)
- *Architecture*: CLIP Vision Encoder (ViT-L/14) with *LoRA adaptation*
- *Loss Function*: Cross Entropy Loss
- *Optimizer*: AdamW
- *Learning Rate*: 0.0001
- *Batch Size*: 32

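For illustration, a minimal fine-tuning step consistent with these settings could look as follows. The `EmotionClassifier` sketch from the overview above and the `train_dataset` object are assumptions; this is not the authors' released training code.

```python
# Illustrative training loop: AdamW, lr=1e-4, cross-entropy loss, batch size 32.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = EmotionClassifier().to(device)  # sketch from the overview above (assumption)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)  # EmoSet train split (assumed)

model.train()
for images, labels in train_loader:
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```
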
## Performance

The model was trained on the *EmoSet dataset* using the standard train/val/test splits and achieves *state-of-the-art* performance compared to previous methods.

## Usage

To use the model for inference:

```python
import torch
from torchvision import transforms
from PIL import Image

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the fine-tuned model (saved as a full model object)
model = torch.load("EmoSet_clip_Lora_16.0R_8.0alphaLora_32_batch_0.0001_headmlp.pth").to(device).eval()

# Load an image
image = Image.open("image_path.jpg").convert("RGB")

# Preprocess: resize and center-crop to 224x224.
# Note: the model normalizes the image inside the forward pass using
# mean = (0.48145466, 0.4578275, 0.40821073) and
# std = (0.26862954, 0.26130258, 0.27577711).
def Emo_preprocess():
    return transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(size=(224, 224)),
        transforms.ToTensor(),
    ])

image = Emo_preprocess()(image).unsqueeze(0).to(device)

# Predict: the head outputs one logit per emotion class, so take the argmax.
with torch.no_grad():
    emo_label = model(image).argmax(dim=-1).item()
print(f"Predicted Emotion: {emo_label}")
```
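
If the checkpoint follows the standard EmoSet label set, the predicted index can be mapped back to an emotion name. The list below covers EmoSet's eight categories; the alphabetical index order is an assumption, so verify it against the label encoding used at training time.

```python
# EmoSet's eight emotion categories. The index order is an assumption
# (alphabetical); verify it against the training label encoding before use.
EMOSET_LABELS = [
    "amusement", "anger", "awe", "contentment",
    "disgust", "excitement", "fear", "sadness",
]
print(f"Predicted Emotion: {EMOSET_LABELS[emo_label]}")
```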