Update README.md
README.md

The model is trained using the following datasets:

2. **Open Human Facial Emotion Recognition Dataset**:
   - Contains nearly 10,000 images with emotion labels gathered from in-the-wild scenes to enhance the model's capability in facial emotion recognition.

3. **SFEW**:
   - The Static Facial Expressions in the Wild (SFEW) dataset is a facial expression recognition dataset created by selecting static frames from the AFEW database, with keyframes computed via facial point clustering.

4. **Neutral add**:
   - Contains 50K images without obvious emotional fluctuations, used as a supplementary neutral category.

## Training method

The training combines three fine-tuning methods: layer_norm tuning, prefix tuning, and prompt tuning. In practice, the mixture of the three matches or even exceeds the performance of full fine-tuning on generalized visual emotion recognition while introducing only a small number of trainable parameters. In addition, thanks to the layer norm adjustment, it converges faster than prefix tuning or prompt tuning alone, achieving higher performance than EmotionCLIP-V1.
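
The diff does not include the training code, so the following is only a minimal PyTorch sketch of the idea: freeze a transformer backbone except for its LayerNorm parameters (layer_norm tuning) and add learnable prompt tokens and per-layer key/value prefixes (prompt and prefix tuning). The `PEFTMixWrapper` class, its parameter names, and the toy `nn.TransformerEncoder` backbone are hypothetical illustrations, not this repository's actual implementation.

```python
import torch
import torch.nn as nn

class PEFTMixWrapper(nn.Module):
    """Hypothetical sketch mixing three parameter-efficient tuning methods:
    layer_norm tuning, prompt tuning, and prefix tuning."""

    def __init__(self, backbone: nn.Module, embed_dim: int, num_layers: int,
                 n_prompt: int = 8, n_prefix: int = 8):
        super().__init__()
        self.backbone = backbone
        # layer_norm tuning: freeze everything, then re-enable only the
        # affine (weight/bias) parameters of every LayerNorm module.
        for p in self.backbone.parameters():
            p.requires_grad = False
        for m in self.backbone.modules():
            if isinstance(m, nn.LayerNorm):
                for p in m.parameters():
                    p.requires_grad = True
        # Prompt tuning: learnable tokens prepended to the input sequence.
        self.prompt = nn.Parameter(torch.randn(n_prompt, embed_dim) * 0.02)
        # Prefix tuning: learnable per-layer key/value prefixes. Wiring them
        # into each attention layer is backbone-specific and omitted here.
        self.prefix_kv = nn.Parameter(
            torch.randn(num_layers, 2, n_prefix, embed_dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim) token embeddings.
        prompt = self.prompt.unsqueeze(0).expand(x.size(0), -1, -1)
        x = torch.cat([prompt, x], dim=1)
        # A full implementation would also inject self.prefix_kv[i] as extra
        # key/value pairs inside attention layer i of the backbone.
        return self.backbone(x)

# Toy usage with a stand-in backbone (not the project's actual model).
enc_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
backbone = nn.TransformerEncoder(enc_layer, num_layers=6)
model = PEFTMixWrapper(backbone, embed_dim=512, num_layers=6)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total}")
```

Because only the LayerNorm affines, prompt tokens, and prefixes remain trainable, the printed ratio stays small, consistent with the claim above that only a small number of parameters is introduced.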