---
license: apache-2.0
datasets:
- oeg/CelebA_Sent2Vect_Sp
language:
- es
tags:
- CelebA
- Spanish
- celebFaces Attributes
---
# Sent2vec trained with data from the descriptive text corpus of the CelebA dataset

## Overview

- **Language**: Spanish
- **Data**: [CelebA_Sent2Vect_Sp](https://huggingface.co/datasets/oeg/CelebA_Sent2Vect_Sp)
- **Architecture**: Sent2vec
  
## Description

Sent2vec can be used directly with English text. However, because this work deals with Spanish text, the model first had to be trained on the generated corpus ([in this repository](https://huggingface.co/datasets/oeg/CelebA_Sent2Vect_Sp)) using the following process:
- Initial preprocessing of the Spanish corpus. For this purpose, a new file was created that stores each entry of the original
  corpus while removing other components, such as the names of the images the entries describe and symbols.
  This leaves a total of 192,209 sentences for training.
- A second preprocessing pass that removes accents. _Stopwords_ and connectors were retained as part of
  the sentence structure during training.
- Configuration of the libraries (_Sent2vec_ and _FastText_) and the training parameters. The parameters were set empirically:
  a feature vector dimension of 4,800, 5,000 epochs, 200 threads, an n-gram size of 2, and a learning rate of 0.05.
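
The process above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' script: the CelebA file-name pattern (`000001.jpg`), the definition of "symbols" (any character that is neither a word character nor whitespace), and the choice to strip only acute accents and diaeresis while keeping _ñ_ are all assumptions. The `sent2vec_args` helper is hypothetical; it only shows how the reported hyperparameters would map onto the `-dim`, `-epoch`, `-thread`, `-wordNgrams`, and `-lr` flags of the `fasttext sent2vec` command from the epfml/sent2vec repository.

```python
import re
import unicodedata

# Hypothetical pattern: CelebA images are named like "000001.jpg";
# how image names appear inside the original corpus entries is an assumption.
IMAGE_NAME = re.compile(r"\b\d{6}\.jpg\b", re.IGNORECASE)
# Assumption: a "symbol" is any character that is not a word character or whitespace.
SYMBOLS = re.compile(r"[^\w\s]")
# Combining acute accent and diaeresis; the tilde of "ñ" is deliberately kept.
ACCENT_MARKS = {"\u0301", "\u0308"}


def clean_entry(entry: str) -> str:
    """First pass: drop image names and symbols, then collapse whitespace."""
    text = unicodedata.normalize("NFC", entry)  # keep accented letters precomposed
    text = IMAGE_NAME.sub(" ", text)
    text = SYMBOLS.sub(" ", text)
    return " ".join(text.split())


def strip_accents(text: str) -> str:
    """Second pass: remove accents; stopwords and connectors are left untouched."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in ACCENT_MARKS)
    return unicodedata.normalize("NFC", kept)


def preprocess(entries):
    """Return one cleaned sentence per non-empty corpus entry."""
    sentences = []
    for entry in entries:
        sentence = strip_accents(clean_entry(entry))
        if sentence:
            sentences.append(sentence)
    return sentences


def sent2vec_args(input_path: str, output_prefix: str) -> list:
    """Hypothetical helper: map the reported hyperparameters onto CLI flags."""
    return [
        "./fasttext", "sent2vec",
        "-input", input_path,
        "-output", output_prefix,
        "-dim", "4800",       # feature vector dimension
        "-epoch", "5000",     # training epochs
        "-thread", "200",     # worker threads
        "-wordNgrams", "2",   # n-gram size
        "-lr", "0.05",        # learning rate
    ]
```

The argument list can be passed to `subprocess.run` once the `fasttext` binary from the sent2vec repository has been built; writing the output of `preprocess` one sentence per line gives a plain-text training file.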

## How to use
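
A minimal usage sketch, assuming the model is loaded with the Python bindings of the epfml/sent2vec repository (`Sent2vecModel`, `load_model`, `embed_sentences`) and that input sentences should receive the same accent removal applied at training time. The model file name is hypothetical, and cosine similarity is just one common way to compare the resulting embeddings, not something this model card prescribes.

```python
import math
import unicodedata


def strip_accents(text: str) -> str:
    """Remove acute accents and diaeresis to match the training corpus (ñ is kept)."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in {"\u0301", "\u0308"})
    return unicodedata.normalize("NFC", kept)


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed(model_path: str, sentences):
    """Embed Spanish sentences with a trained Sent2vec model file."""
    import sent2vec  # epfml/sent2vec Python bindings, imported only when needed

    model = sent2vec.Sent2vecModel()
    model.load_model(model_path)
    rows = model.embed_sentences([strip_accents(s) for s in sentences])
    return [[float(x) for x in row] for row in rows]
```

For example, `a, b = embed("model.bin", ["La mujer sonríe", "Él tiene barba"])` followed by `cosine(a, b)` would score how close the two descriptions are; `model.bin` stands in for whatever file the repository actually provides.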

## Licensing information
This model is available under the [Apache License 2.0.](https://www.apache.org/licenses/LICENSE-2.0)

## Citation information

**Citing**: If you use the Sent2vec+CelebA model in your work, please cite **[????](???)**:

<!--```bib
@article{inffus_TINTO,
    title = {A novel deep learning approach using blurring image techniques for Bluetooth-based indoor localisation},
    journal = {Information Fusion},
    author = {Reewos Talla-Chumpitaz and Manuel Castillo-Cara and Luis Orozco-Barbosa and Raúl García-Castro},
    volume = {91},
    pages = {173-186},
    year = {2023},
    issn = {1566-2535},
    doi = {https://doi.org/10.1016/j.inffus.2022.10.011}
}
```-->

## Authors
- [Eduardo Yauri Lozano](https://github.com/eduar03yauri)
- [Manuel Castillo-Cara](https://github.com/manwestc)
- [Raúl García-Castro](https://github.com/rgcmme)

[*Universidad Nacional de Ingeniería*](https://www.uni.edu.pe/), [*Ontology Engineering Group*](https://oeg.fi.upm.es/), [*Universidad Politécnica de Madrid.*](https://www.upm.es/internacional)

## Contributors
See the full list of contributors [here](https://github.com/eduar03yauri/DCGAN-text2face-forSpanishs).

<kbd><img src="https://www.uni.edu.pe/images/logos/logo_uni_2016.png" alt="Universidad Nacional de Ingeniería" width="100"></kbd>
<kbd><img src="https://raw.githubusercontent.com/oeg-upm/TINTO/main/assets/logo-oeg.png" alt="Ontology Engineering Group" width="100"></kbd> 
<kbd><img src="https://raw.githubusercontent.com/oeg-upm/TINTO/main/assets/logo-upm.png" alt="Universidad Politécnica de Madrid" width="100"></kbd>