manwestc committed on
Commit 2bf6841 · 1 Parent(s): 4b98056

Update README.md

Files changed (1)
  1. README.md +34 -3
README.md CHANGED
@@ -10,26 +10,57 @@ tags:
  - celebFaces Attributes
  ---
  # Sent2vec trained with data from the descriptive text corpus of the CelebA dataset
  ## Overview
  - **Language**: Spanish
  - **Data**: [CelebA_Sent2vec_Sp](https://huggingface.co/datasets/oeg/CelebA_Sent2Vect_Sp).
  - **Architecture**: Sent2vec

  ## Description

  ## How to use

  ## Licensing information
- This dataset is available under the [Apache License 2.0.](https://www.apache.org/licenses/LICENSE-2.0)
  ## Citation information
- If you used the model Roberta_CelebA_Sp in your work, please cite [this repository](https://huggingface.co/oeg/Sent2vec_CelebA_Sp/):
  ## Authors
  - [Eduardo Yauri Lozano](https://github.com/eduar03yauri)
  - [Manuel Castillo-Cara](https://github.com/manwestc)

  [*Universidad Nacional de Ingeniería*](https://www.uni.edu.pe/), [*Ontology Engineering Group*](https://oeg.fi.upm.es/), [*Universidad Politécnica de Madrid.*](https://www.upm.es/internacional)

  ## Contributors
  See the full list of contributors [here](https://github.com/eduar03yauri/DCGAN-text2face-forSpanishs).

- ![logo uni](https://www.uni.edu.pe/images/logos/logo_uni_2016.png)

  - celebFaces Attributes
  ---
  # Sent2vec trained with data from the descriptive text corpus of the CelebA dataset
+
  ## Overview
+
  - **Language**: Spanish
  - **Data**: [CelebA_Sent2vec_Sp](https://huggingface.co/datasets/oeg/CelebA_Sent2Vect_Sp).
  - **Architecture**: Sent2vec

  ## Description

+ Sent2vec can be used directly for English text. Because this work targets Spanish text, the model first had to be trained
+ on the generated corpus ([in this repository](https://huggingface.co/datasets/oeg/CelebA_Sent2Vect_Sp)) using the following process:
+ - Initial preprocessing of the Spanish corpus. A new file was created that stores each entry of the original
+ corpus with the other components, such as the name of the image it describes and symbols, removed.
+ A total of 192,209 sentences are available for training.
+ - A second preprocessing step that removes accents. _Stopwords_ and connectors were retained as part of
+ the sentence structure during training.
+ - Configuration of the libraries, e.g., _Sent2vec_ and _FastText_, and of the parameters. The parameters were set empirically:
+ a feature vector dimension of 4,800, 5,000 epochs, 200 threads, 2 n-grams, and a learning rate of 0.05.
+
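The process described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' actual scripts: the raw corpus layout (an image file name followed by a sentence), the regular expressions, and the helper names `clean_entry`, `strip_accents`, and `sent2vec_train_command` are all hypothetical. The command uses the option names of the Sent2vec command-line tool (`-dim`, `-epoch`, `-thread`, `-wordNgrams`, `-lr`); mapping "2 n-grams" to `-wordNgrams 2` is likewise an assumption.

```python
import re
import unicodedata
from typing import List

# Hypothetical patterns: the raw corpus layout is not specified in the README.
IMAGE_NAME = re.compile(r"\b\S+\.(?:jpg|jpeg|png)\b", re.IGNORECASE)
SYMBOLS = re.compile(r"[^\w\s]")


def strip_accents(text: str) -> str:
    """Drop combining marks after NFD decomposition ('está' -> 'esta').

    Note: this also maps 'ñ' to 'n'; whether the authors did so is unknown.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def clean_entry(raw: str) -> str:
    """Remove image file names and symbols, then accents.

    Stopwords and connectors are deliberately left in place.
    """
    text = IMAGE_NAME.sub(" ", raw)
    text = SYMBOLS.sub(" ", text)
    text = strip_accents(text)
    return " ".join(text.split())  # collapse repeated whitespace


def sent2vec_train_command(corpus_path: str, model_path: str) -> List[str]:
    """Build a Sent2vec CLI call with the parameters listed in the README."""
    return [
        "./fasttext", "sent2vec",  # assumed location of the compiled binary
        "-input", corpus_path,
        "-output", model_path,
        "-dim", "4800",
        "-epoch", "5000",
        "-thread", "200",
        "-wordNgrams", "2",  # assumed meaning of "2 n-grams"
        "-lr", "0.05",
    ]


if __name__ == "__main__":
    sample = "000001.jpg|La mujer tiene el pelo castaño y está sonriendo."
    print(clean_entry(sample))
    print(" ".join(sent2vec_train_command("corpus_clean.txt", "sent2vec_celeba_sp")))
```

The command is returned as a list rather than executed, so it can be inspected or passed to `subprocess.run` once the binary and the cleaned corpus file actually exist.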
  ## How to use

  ## Licensing information
+ This model is available under the [Apache License 2.0.](https://www.apache.org/licenses/LICENSE-2.0)
+
  ## Citation information
+
+ **Citing**: If you used the Sent2vec+CelebA model in your work, please cite the **[????](???)**:
+
+ ```bib
+ @article{inffus_TINTO,
+ title = {A novel deep learning approach using blurring image techniques for Bluetooth-based indoor localisation},
+ journal = {Information Fusion},
+ author = {Reewos Talla-Chumpitaz and Manuel Castillo-Cara and Luis Orozco-Barbosa and Raúl García-Castro},
+ volume = {91},
+ pages = {173-186},
+ year = {2023},
+ issn = {1566-2535},
+ doi = {https://doi.org/10.1016/j.inffus.2022.10.011}
+ }
+ ```
+
  ## Authors
  - [Eduardo Yauri Lozano](https://github.com/eduar03yauri)
  - [Manuel Castillo-Cara](https://github.com/manwestc)
+ - [Raúl García-Castro](https://github.com/rgcmme)

  [*Universidad Nacional de Ingeniería*](https://www.uni.edu.pe/), [*Ontology Engineering Group*](https://oeg.fi.upm.es/), [*Universidad Politécnica de Madrid.*](https://www.upm.es/internacional)

  ## Contributors
  See the full list of contributors [here](https://github.com/eduar03yauri/DCGAN-text2face-forSpanishs).

+ <kbd><img src="https://www.uni.edu.pe/images/logos/logo_uni_2016.png" alt="Universidad Nacional de Ingeniería" width="100"></kbd>
+ <kbd><img src="https://raw.githubusercontent.com/oeg-upm/TINTO/main/assets/logo-oeg.png" alt="Ontology Engineering Group" width="100"></kbd>
+ <kbd><img src="https://raw.githubusercontent.com/oeg-upm/TINTO/main/assets/logo-upm.png" alt="Universidad Politécnica de Madrid" width="100"></kbd>