Staff View: Proposal for an algorithm that combines density-based subspace clustering and constraint-based clustering to detect groups that include attributes of interest in high-dimensional datasets

Graduation project (Master's in Computing), Instituto Tecnológico de Costa Rica, Escuela de Ingeniería en Computación, 2017.

Main Author:	Vallejos-Peña, Alonso
Other Authors:	Calvo-Valverde, Luis Alexánder
Format:	Thesis
Language:	Spanish
Published:	Instituto Tecnológico de Costa Rica 2018
Subjects:	Data mining; Algorithms; Data; Density; Computing; Research Subject Categories::TECHNOLOGY::Information technology::Computer science
Online Access:	https://hdl.handle.net/2238/9374

id	RepoTEC9374
recordtype	dspace
spelling	RepoTEC9374 2023-05-04T15:17:17Z Proposal for an algorithm that combines density-based subspace clustering and constraint-based clustering to detect groups that include attributes of interest in high-dimensional datasets Vallejos-Peña, Alonso Calvo-Valverde, Luis Alexánder Data mining Algorithms Data Density Computing Research Subject Categories::TECHNOLOGY::Information technology::Computer science Graduation project (Master's in Computing), Instituto Tecnológico de Costa Rica, Escuela de Ingeniería en Computación, 2017. Cluster analysis is one of the most common data mining tasks, used frequently in finance, biology, medicine, and market analysis [12]. High-dimensional data poses a challenge to traditional clustering algorithms: similarity measures lose their meaning, which degrades the quality of the groups. Subspace clustering algorithms have been proposed as an alternative, aiming to find all groups in all subspaces of the dataset [45]. Because groups are detected in lower-dimensional spaces, each group can belong to a different subspace of the original dataset [31]. As a consequence, attributes that the user considers of interest can be excluded from some or all groups, which reduces the value of the result for data analysts. Improving the results and detecting more significant groups is currently considered one of the biggest areas of opportunity in the cluster analysis of high-dimensional data; in particular, taking the relevance of attributes into account in the subspace pruning logic and in group detection is an open research area [30].
This project proposes a new algorithm, SUBCLU-R, that combines SUBCLU [1] with constraint-based clustering algorithms [6]. It lets users mark variables as attributes of interest based on prior domain knowledge, in order to direct group detection towards subspaces that include those attributes and thereby generate more meaningful groups. An experiment was run to compare the results of SUBCLU and SUBCLU-R. First, the average cohesion, separation, and silhouette index were obtained for both algorithms by executing multiple tests on our dataset. Then, a statistical hypothesis test was used to determine whether the observed differences between the averages were significant. Finally, the results were analyzed, with a focus on comparing the performance of the proposed algorithm against the original SUBCLU. The results indicate that it is possible to steer groupings towards those that include attributes of interest, thanks to the use of constrained clustering for subspace pruning. With this proposal, N - d of the detected subspaces include the attribute of interest, where N is the total number of detected subspaces and d is the number of attributes in the dataset. Comparing the results of both algorithms showed that SUBCLU-R detects a significantly higher percentage of groupings with the attribute of interest, while no statistically significant differences were found in the internal metrics of the groupings. 2018-02-09T14:15:38Z 2018-02-09T14:15:38Z 2017 info:eu-repo/semantics/masterThesis https://hdl.handle.net/2238/9374 spa application/pdf Instituto Tecnológico de Costa Rica
institution	Tecnológico de Costa Rica
collection	Repositorio TEC
language	Spanish
topic	Data mining Algorithms Data Density Computing Research Subject Categories::TECHNOLOGY::Information technology::Computer science
description	Graduation project (Master's in Computing), Instituto Tecnológico de Costa Rica, Escuela de Ingeniería en Computación, 2017.
author2	Calvo-Valverde, Luis Alexánder
format	Thesis
author	Vallejos-Peña, Alonso
author_sort	Vallejos-Peña, Alonso
title	Proposal for an algorithm that combines density-based subspace clustering and constraint-based clustering to detect groups that include attributes of interest in high-dimensional datasets
publisher	Instituto Tecnológico de Costa Rica
publishDate	2018
url	https://hdl.handle.net/2238/9374
_version_	1796139300163682304
score	12.040382
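
The constraint-based subspace pruning described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `constrained_subspace_search`, the `has_cluster` callback (a stand-in for running a density-based clustering such as DBSCAN restricted to one subspace), and the rule of keeping every one-dimensional subspace while requiring higher-dimensional candidates to contain an attribute of interest are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List

Subspace = FrozenSet[str]


def constrained_subspace_search(
    attributes: Iterable[str],
    interest: Iterable[str],
    has_cluster: Callable[[Subspace], bool],
) -> List[Subspace]:
    """Bottom-up subspace search that prunes candidates lacking an attribute of interest.

    Hypothetical sketch: `has_cluster` stands in for a density-based clustering
    run (for example DBSCAN) restricted to the given subspace.
    """
    interest = frozenset(interest)
    # Level 1: every one-dimensional subspace that contains at least one cluster.
    level = {frozenset([a]) for a in attributes if has_cluster(frozenset([a]))}
    found = set(level)

    while level:
        candidates = set()
        for left, right in combinations(level, 2):
            joined = left | right
            # Only join subspaces that differ in exactly one attribute.
            if len(joined) != len(left) + 1:
                continue
            # Constraint pruning (assumed rule): skip candidates without an
            # attribute of interest.
            if not joined & interest:
                continue
            # Monotonicity check, limited to the projections this search
            # actually examined (1-d subspaces and constrained subspaces).
            projections = (joined - {a} for a in joined)
            examined = [p for p in projections if len(p) == 1 or p & interest]
            if all(p in level for p in examined):
                candidates.add(joined)
        # Keep only the candidates in which the oracle finds a cluster.
        level = {s for s in candidates if has_cluster(s)}
        found |= level

    # Sort by dimensionality, then alphabetically, for a stable result.
    return sorted(found, key=lambda s: (len(s), sorted(s)))


if __name__ == "__main__":
    # Toy oracle: clusters exist only in subspaces made of attributes A, B, C.
    result = constrained_subspace_search(
        ["A", "B", "C", "D"], {"A"}, lambda s: s <= {"A", "B", "C"}
    )
    for subspace in result:
        print(sorted(subspace))
```

In this toy run, the subspace {B, C} is never evaluated because it lacks the attribute of interest A, while {A, B}, {A, C}, and {A, B, C} are kept, so every detected subspace with more than one dimension contains A.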
