	---
	license: apache-2.0
	datasets:
	- oeg/CelebA_Sent2Vect_Sp
	language:
	- es
	tags:
	- CelebA
	- Spanish
	- celebFaces Attributes
	---
	# Sent2vec trained with data from the descriptive text corpus of the CelebA dataset

	## Overview

	- Language: Spanish
	- Data: [CelebA_Sent2Vect_Sp](https://huggingface.co/datasets/oeg/CelebA_Sent2Vect_Sp)
	- Architecture: Sent2vec

	## Description

	Sent2vec can be used directly for English texts. Because this work deals with Spanish text, however, the model first had to be trained
	on the generated corpus ([in this repository](https://huggingface.co/datasets/oeg/CelebA_Sent2Vect_Sp)) through the following process:
	- Preprocess the Spanish corpus. A new file was created that stores each entry of the original corpus
	and drops the other components, such as symbols and the name of the image each entry describes.
	A total of 192,209 sentences are available for training.
	- Apply a second preprocessing step that removes accents. _Stopwords_ and connectors were retained as part of
	the sentence structure during training.
	- Configure the libraries, e.g., _Sent2vec_ and _FastText_, and the parameters. The parameters were set empirically:
	feature vector dimension 4,800, 5,000 epochs, 200 threads, 2 n-grams, and a learning rate of 0.05.
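
The accent-removal and configuration steps above can be sketched as follows. This is a minimal illustration, not the authors' script: the helper names `strip_accents` and `build_training_command`, the file paths, and the mapping of the listed parameters onto the `fasttext sent2vec` command-line flags are assumptions. The sketch also maps `ñ` to `n`, which the description does not specify.

```python
import unicodedata


def strip_accents(sentence: str) -> str:
    """Remove accent marks while keeping every word, stopwords included."""
    # Decompose characters such as "á" into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", sentence)
    # Drop the combining marks (category "Mn").
    # Assumption: this also turns "ñ" into "n".
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def build_training_command(corpus_path: str, output_prefix: str) -> list:
    """Build (but do not run) a Sent2vec training command.

    Assumption: the values listed in the model card map onto these flags
    of the Sent2vec command-line tool.
    """
    return [
        "./fasttext", "sent2vec",
        "-input", corpus_path,
        "-output", output_prefix,
        "-dim", "4800",       # feature vector dimension
        "-epoch", "5000",     # training epochs
        "-thread", "200",     # worker threads
        "-wordNgrams", "2",   # n-gram size
        "-lr", "0.05",        # learning rate
    ]
```

The returned list could be passed to `subprocess.run` once the Sent2vec binary has been compiled; the corpus and output paths are placeholders to be chosen by the user.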

	## How to use
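
A minimal usage sketch, assuming the Python bindings of the Sent2vec library (`sent2vec`) are installed and the trained model has been downloaded as a binary file. The helper names `load_model` and `embed` are hypothetical, and the model file name must be taken from the repository files. Input sentences should be accent-free, as in the training corpus.

```python
def load_model(model_path):
    """Load a trained Sent2vec binary model from disk."""
    # Imported lazily so that embed() can be used or tested without the library.
    import sent2vec

    model = sent2vec.Sent2vecModel()
    model.load_model(model_path)
    return model


def embed(model, sentences):
    """Return one embedding per input sentence."""
    # embed_sentences expects a list of strings, one sentence per item.
    return model.embed_sentences(list(sentences))
```

For example, calling `load_model` with the path to the downloaded `.bin` file and then `embed(model, ["la mujer tiene el pelo castano y esta sonriendo"])` would be expected to return one 4,800-dimensional vector for that sentence.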

	## Licensing information
	This model is available under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

	## Citation information

	If you use the Sent2vec+CelebA model in your work, please cite [????](???):

	<!--```bib
	@article{inffus_TINTO,
	title = {A novel deep learning approach using blurring image techniques for Bluetooth-based indoor localisation},
	journal = {Information Fusion},
	author = {Reewos Talla-Chumpitaz and Manuel Castillo-Cara and Luis Orozco-Barbosa and Raúl García-Castro},
	volume = {91},
	pages = {173-186},
	year = {2023},
	issn = {1566-2535},
	doi = {https://doi.org/10.1016/j.inffus.2022.10.011}
	}
	```-->

	## Authors
	- [Eduardo Yauri Lozano](https://github.com/eduar03yauri)
	- [Manuel Castillo-Cara](https://github.com/manwestc)
	- [Raúl García-Castro](https://github.com/rgcmme)

	[Universidad Nacional de Ingeniería](https://www.uni.edu.pe/), [Ontology Engineering Group](https://oeg.fi.upm.es/), [Universidad Politécnica de Madrid](https://www.upm.es/internacional).

	## Contributors
	See the full list of contributors [here](https://github.com/eduar03yauri/DCGAN-text2face-forSpanishs).

	<kbd><img src="https://www.uni.edu.pe/images/logos/logo_uni_2016.png" alt="Universidad Nacional de Ingeniería" width="100"></kbd>
	<kbd><img src="https://raw.githubusercontent.com/oeg-upm/TINTO/main/assets/logo-oeg.png" alt="Ontology Engineering Group" width="100"></kbd>
	<kbd><img src="https://raw.githubusercontent.com/oeg-upm/TINTO/main/assets/logo-upm.png" alt="Universidad Politécnica de Madrid" width="100"></kbd>