SMOTE-BD: An Exact and Scalable Oversampling Method for Imbalanced Classification in Big Data

cic.isFulltext: true
cic.isPeerReviewed: true
cic.lugarDesarrollo: Instituto de Investigación en Informática
cic.version: info:eu-repo/semantics/publishedVersion
dc.date.accessioned: 2018-10-04T19:04:10Z
dc.date.available: 2018-10-04T19:04:10Z
dc.identifier.uri: https://digital.cic.gba.gob.ar/handle/11746/8512
dc.title: SMOTE-BD: An Exact and Scalable Oversampling Method for Imbalanced Classification in Big Data
dc.type: Conference paper
dcterms.abstract: The volume of data in today’s applications has changed the way Machine Learning problems are addressed. Indeed, the Big Data scenario imposes scalability constraints that can only be met through intelligent model design and the use of distributed technologies. In this context, solutions based on the Spark platform have established themselves as a de facto standard. In this contribution, we focus on a very important framework within Big Data Analytics, namely classification with imbalanced datasets. The main characteristic of this problem is that one of the classes is underrepresented, and therefore it is usually harder to find a model that identifies it correctly. For this reason, it is common to apply preprocessing techniques such as oversampling to balance the distribution of examples across classes. In this work we present SMOTE-BD, a fully scalable preprocessing approach for imbalanced classification in Big Data. It is based on one of the most widespread preprocessing solutions for imbalanced classification, namely the SMOTE algorithm, which creates new synthetic instances according to the neighborhood of each example of the minority class. Our novel development is designed to be independent of the number of partitions or processes created to achieve a higher degree of efficiency. Experiments conducted on different standard and Big Data datasets show the quality of the proposed design and implementation.
dcterms.creator.author: Basgall, María José
dcterms.creator.author: Hasperué, Waldo
dcterms.creator.author: Naiouf, Marcelo
dcterms.creator.author: Fernández, Alberto
dcterms.creator.author: Herrera, Francisco
dcterms.extent: p. 23-28
dcterms.identifier.other: handle:10915/69676
dcterms.isPartOf.issue: VI Jornadas de Cloud Computing & Big Data (JCC&BD) (La Plata, 2018)
dcterms.isPartOf.item: Actas JCC&BD 2018
dcterms.issued: 2018
dcterms.language: English
dcterms.license: Attribution-NonCommercial-ShareAlike 4.0 International (BY-NC-SA 4.0)
dcterms.subject: big data, imbalanced classification, preprocessing, SMOTE, Spark
dcterms.subject.materia: Computer and Information Sciences
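
The abstract describes SMOTE as creating synthetic instances from the neighborhood of each minority-class example. As a rough illustration of that idea only, the following is a minimal single-machine sketch of classic SMOTE interpolation in plain Python. It is not the distributed SMOTE-BD implementation from the paper; the function name `smote`, its parameters, and the toy data are hypothetical and chosen for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority examples
    and their k nearest minority neighbors (illustrative, sequential SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other points
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a minority example at random.
        i = rng.randrange(len(minority))
        x = minority[i]
        # Exact (brute-force) k nearest neighbors among the other minority points.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        # Interpolate a random fraction of the way toward one neighbor.
        neighbor = rng.choice(neighbors)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic


# Toy minority class with two features (assumed data, not from the paper).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6, k=2)
```

Each synthetic point lies on the segment between a minority example and one of its nearest minority neighbors, so it stays inside the region already occupied by the minority class. The brute-force neighbor search here is quadratic in the number of minority examples, which is the step a Big Data variant would need to distribute.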

Files

Original bundle

Name: 04Basgall.pdf-PDFA.pdf
Size: 903.82 KB
Format: Adobe Portable Document Format
Description: Full document