---
license: apache-2.0
datasets:
- oeg/CelebA_Sent2Vect_Sp
language:
- es
tags:
- CelebA
- Spanish
- celebFaces Attributes
---
# Sent2vec trained with data from the descriptive text corpus of the CelebA dataset
## Overview
- **Language**: Spanish
- **Data**: [CelebA_Sent2Vect_Sp](https://huggingface.co/datasets/oeg/CelebA_Sent2Vect_Sp)
- **Architecture**: Sent2vec
## Description
Sent2vec can be used directly for English texts. Because this work uses Spanish text, the model first had to be trained
on the generated corpus ([available in this repository](https://huggingface.co/datasets/oeg/CelebA_Sent2Vect_Sp)) with the following process:
- Initial preprocessing of the Spanish corpus. A new file was created that keeps each entry of the original
  corpus and removes the other components, such as the name of the image each entry describes and symbols.
  A total of 192,209 sentences are available for training.
- Apply a second preprocessing step that removes accents. _Stopwords_ and connectors were retained as part of
  the sentence structure during training.
- Configure the libraries, e.g., _Sent2vec_ and _FastText_, and the parameters. The parameters were set empirically:
  a feature vector dimension of 4,800, 5,000 epochs, 200 threads, 2 n-grams, and a learning rate of 0.05.
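
The preprocessing and parameter choices listed above can be sketched in Python as follows. This is a minimal illustration, not the authors' script: the raw entry format (an image file name followed by the description), the definition of "symbols", the helper names, and the `./fasttext` binary path are all assumptions. The flags follow the Sent2vec command-line interface, and mapping "2 n-grams" to `-wordNgrams 2` is an interpretation of the parameter list.

```python
import re
import unicodedata
from typing import List

# Assumption: a raw entry may contain an image file name such as "000001.jpg".
IMAGE_NAME = re.compile(r"\S+\.(?:jpg|jpeg|png)\b", re.IGNORECASE)
# Assumption: anything that is not a word character or whitespace is a "symbol".
SYMBOLS = re.compile(r"[^\w\s]")


def strip_accents(text: str) -> str:
    """Remove combining marks after NFD decomposition (this also maps 'ñ' to 'n')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def clean_entry(entry: str) -> str:
    """Drop image names and symbols and remove accents; stopwords are kept."""
    text = IMAGE_NAME.sub(" ", entry)
    text = strip_accents(text)
    text = SYMBOLS.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def training_command(corpus_path: str, output_prefix: str) -> List[str]:
    """Build (but do not run) a Sent2vec training command with the listed parameters."""
    return [
        "./fasttext", "sent2vec",  # hypothetical path to the compiled Sent2vec binary
        "-input", corpus_path,
        "-output", output_prefix,
        "-dim", "4800",
        "-epoch", "5000",
        "-thread", "200",
        "-wordNgrams", "2",
        "-lr", "0.05",
    ]
```

Applying `clean_entry` to every corpus line and writing the results one sentence per line gives a file that can be passed as `corpus_path`; the returned list can then be handed to `subprocess.run`.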
## How to use
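A minimal usage sketch, assuming the Sent2vec Python bindings are installed and the trained model has been downloaded locally. The file name `sent2vec_celeba_sp.bin` and the helper names are hypothetical; accents are removed from the input because the model was trained on accent-free text.

```python
import unicodedata

MODEL_PATH = "sent2vec_celeba_sp.bin"  # hypothetical local file name


def strip_accents(text: str) -> str:
    """Remove combining marks so the input matches the accent-free training corpus."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def embed_descriptions(sentences, model_path=MODEL_PATH):
    """Return one embedding row per Spanish sentence."""
    import sent2vec  # imported lazily; requires the Sent2vec Python bindings

    model = sent2vec.Sent2vecModel()
    model.load_model(model_path)
    return model.embed_sentences([strip_accents(s) for s in sentences])
```

For example, `embed_descriptions(["La mujer tiene el pelo castaño y está sonriendo"])` would return the sentence embedding for that description once the model file is in place.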
## Licensing information
This model is available under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
## Citation information
**Citing**: If you use the Sent2vec+CelebA model in your work, please cite **[????](???)**:
<!--```bib
@article{inffus_TINTO,
title = {A novel deep learning approach using blurring image techniques for Bluetooth-based indoor localisation},
journal = {Information Fusion},
author = {Reewos Talla-Chumpitaz and Manuel Castillo-Cara and Luis Orozco-Barbosa and Raúl García-Castro},
volume = {91},
pages = {173-186},
year = {2023},
issn = {1566-2535},
doi = {https://doi.org/10.1016/j.inffus.2022.10.011}
}
```-->
## Authors
- [Eduardo Yauri Lozano](https://github.com/eduar03yauri)
- [Manuel Castillo-Cara](https://github.com/manwestc)
- [Raúl García-Castro](https://github.com/rgcmme)
[*Universidad Nacional de Ingeniería*](https://www.uni.edu.pe/), [*Ontology Engineering Group*](https://oeg.fi.upm.es/), [*Universidad Politécnica de Madrid.*](https://www.upm.es/internacional)
## Contributors
See the full list of contributors [here](https://github.com/eduar03yauri/DCGAN-text2face-forSpanishs).
<kbd><img src="https://www.uni.edu.pe/images/logos/logo_uni_2016.png" alt="Universidad Nacional de Ingeniería" width="100"></kbd>
<kbd><img src="https://raw.githubusercontent.com/oeg-upm/TINTO/main/assets/logo-oeg.png" alt="Ontology Engineering Group" width="100"></kbd>
<kbd><img src="https://raw.githubusercontent.com/oeg-upm/TINTO/main/assets/logo-upm.png" alt="Universidad Politécnica de Madrid" width="100"></kbd>