|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- oeg/CelebA_Sent2Vect_Sp |
|
language: |
|
- es |
|
tags: |
|
- CelebA |
|
- Spanish |
|
- celebFaces Attributes |
|
--- |
|
# Sent2vec trained with data from the descriptive text corpus of the CelebA dataset |
|
|
|
## Overview |
|
|
|
- **Language**: Spanish |
|
- **Data**: [CelebA_Sent2Vect_Sp](https://huggingface.co/datasets/oeg/CelebA_Sent2Vect_Sp) |
|
- **Architecture**: Sent2vec |
|
|
|
## Description |
|
|
|
Sent2vec can be used directly on English text. Because this work targets Spanish text, the model first had to be trained |
|
on the generated corpus ([available in this repository](https://huggingface.co/datasets/oeg/CelebA_Sent2Vect_Sp)) using the following process: |
|
- Initial preprocessing of the Spanish corpus. A new file was created that keeps each description from the original |
|
corpus and removes the other components, such as the name of the image each description refers to and symbols. |
|
A total of 192,209 sentences are available for training. |
|
- A second preprocessing step that removes accents. _Stopwords_ and connectors were retained as part of |
|
the sentence structure during training. |
|
- Configuration of the libraries, such as _Sent2vec_ and _FastText_, and of the training parameters. The parameters were set empirically: |
|
a feature vector dimension of 4,800, 5,000 epochs, 200 threads, an n-gram size of 2, and a learning rate of 0.05. |
|
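The preprocessing and parameter settings above can be sketched as follows. This is a minimal illustration, not the authors' script: `strip_accents` and `build_train_command` are hypothetical helper names, the file paths are placeholders, and mapping "2 n-grams" to the `-wordNgrams` flag of the Sent2vec command-line tool is an assumption. Stripping all combining marks also turns `ñ` into `n`; the description above does not say whether that was intended.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining diacritical marks, e.g. 'está' -> 'esta'."""
    # NFD splits each accented character into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hyperparameters as listed in the description above.
# Assumption: "2 n-grams" corresponds to the -wordNgrams option.
TRAIN_PARAMS = {
    "dim": 4800,
    "epoch": 5000,
    "thread": 200,
    "wordNgrams": 2,
    "lr": 0.05,
}


def build_train_command(corpus_path: str, output_prefix: str,
                        binary: str = "./fasttext") -> list[str]:
    """Assemble an argument list for the Sent2vec training CLI (nothing is executed)."""
    cmd = [binary, "sent2vec", "-input", corpus_path, "-output", output_prefix]
    for name, value in TRAIN_PARAMS.items():
        cmd += [f"-{name}", str(value)]
    return cmd


# Placeholder file names, for illustration only.
command = build_train_command("corpus_es_clean.txt", "sent2vec_celeba_es")
```

The resulting list could be passed to `subprocess.run` once the Sent2vec binary has been compiled; with 5,000 epochs and a 4,800-dimensional output, training is likely to be expensive.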
|
|
## How to use |
|
|
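A minimal usage sketch, assuming the Python bindings of the Sent2vec library (`sent2vec.Sent2vecModel`) are installed and the trained model is available locally as a binary file. The path `model.bin` is a placeholder, not a file name confirmed by this repository, and `embed_spanish_sentences` is a hypothetical helper. Accents are stripped before embedding to mirror the preprocessing applied to the training corpus.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining diacritical marks so inputs match the training corpus."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def embed_spanish_sentences(model_path: str, sentences: list[str]):
    """Load a Sent2vec model and return one embedding per input sentence."""
    # Imported lazily so the helper above works without the bindings installed.
    import sent2vec

    model = sent2vec.Sent2vecModel()
    model.load_model(model_path)
    return model.embed_sentences([strip_accents(s) for s in sentences])


# Example call (placeholder path, assumes the model file has been downloaded):
# vectors = embed_spanish_sentences("model.bin", ["El hombre tiene barba y está sonriendo"])
```

Given the training configuration described above, each returned vector would be expected to have 4,800 dimensions.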
|
## Licensing information |
|
This model is available under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). |
|
|
|
## Citation information |
|
|
|
**Citing**: If you use the Sent2vec+CelebA model in your work, please cite **[????](???)**: |
|
|
|
<!--```bib |
|
@article{inffus_TINTO, |
|
title = {A novel deep learning approach using blurring image techniques for Bluetooth-based indoor localisation}, |
|
journal = {Information Fusion}, |
|
author = {Reewos Talla-Chumpitaz and Manuel Castillo-Cara and Luis Orozco-Barbosa and Raúl García-Castro}, |
|
volume = {91}, |
|
pages = {173-186}, |
|
year = {2023}, |
|
issn = {1566-2535}, |
|
doi = {https://doi.org/10.1016/j.inffus.2022.10.011} |
|
} |
|
```--> |
|
|
|
## Authors |
|
- [Eduardo Yauri Lozano](https://github.com/eduar03yauri) |
|
- [Manuel Castillo-Cara](https://github.com/manwestc) |
|
- [Raúl García-Castro](https://github.com/rgcmme) |
|
|
|
[*Universidad Nacional de Ingeniería*](https://www.uni.edu.pe/), [*Ontology Engineering Group*](https://oeg.fi.upm.es/), [*Universidad Politécnica de Madrid*](https://www.upm.es/internacional). |
|
|
|
## Contributors |
|
See the full list of contributors [here](https://github.com/eduar03yauri/DCGAN-text2face-forSpanishs). |
|
|
|
<kbd><img src="https://www.uni.edu.pe/images/logos/logo_uni_2016.png" alt="Universidad Nacional de Ingeniería" width="100"></kbd> |
|
<kbd><img src="https://raw.githubusercontent.com/oeg-upm/TINTO/main/assets/logo-oeg.png" alt="Ontology Engineering Group" width="100"></kbd> |
|
<kbd><img src="https://raw.githubusercontent.com/oeg-upm/TINTO/main/assets/logo-upm.png" alt="Universidad Politécnica de Madrid" width="100"></kbd> |