Web scraping by end users

cic.institucionOrigen: Laboratorio de Investigación y Formación en Informática Avanzada (LIFIA)
cic.isFulltext: Yes
cic.isPeerReviewed: Yes
cic.lugarDesarrollo: Laboratorio de Investigación y Formación en Informática Avanzada (LIFIA)
cic.parentType: Article
cic.version: Accepted
dc.date.accessioned: 2025-12-01T16:37:09Z
dc.date.available: 2025-12-01T16:37:09Z
dc.identifier.uri: https://digital.cic.gba.gob.ar/handle/11746/12582
dc.title: Web scraping by end users (en)
dc.type: Article
dcterms.abstract: Scraping is a topic studied from various perspectives, encompassing automatic and AI-based approaches, and a wide range of programming libraries that expedite development. As the volume of available web content increases, it becomes increasingly challenging to anticipate end-user requirements regarding what, how, and when to extract data from the web. This challenge is compounded when integrating data from multiple websites, particularly when websites’ search engines dynamically retrieve unavailable data via permanent links. Complex scraping processes such as these are difficult to develop using general-purpose programming languages and are challenging to automate with AI-based approaches. Controllability is a crucial aspect of scraping, that is, how end users can make decisions during the scraper specification process and understand the information sources and how the data are ultimately extracted, compiled, and formatted for output. In response, our study presents an innovative end-user approach for specifying scrapers that focuses on seamlessly integrating data from multiple sources. Through this approach and its supporting toolset, we aim to provide users with greater control and transparency over the extraction, integration, and formatting of data, thereby addressing the key concerns in web scraping. The approach and toolset were evaluated and yielded promising results. (en)
dcterms.creator.author: Tacuri, Alex
dcterms.creator.author: Firmenich, Sergio
dcterms.creator.author: Fernández, Alejandro
dcterms.creator.author: Riva, María Florencia
dcterms.creator.author: Urbieta, Matías
dcterms.creator.author: Rossi, Gustavo Héctor
dcterms.identifier.other: ISSN: 2169-3536
dcterms.identifier.other: DOI: 10.1109/access.2025.3636662
dcterms.isPartOf.issue: 2025
dcterms.isPartOf.series: IEEE Access
dcterms.issued: 2025-11-25
dcterms.language: English
dcterms.license: Attribution 4.0 International (BY 4.0)
dcterms.subject: Web mining (en)
dcterms.subject: End-user computing (en)
dcterms.subject: Human computer interaction (en)
dcterms.subject: User centered design (en)
dcterms.subject: Web scraping (en)
dcterms.subject: Data integration (en)
dcterms.subject: Scraper specification (en)
dcterms.subject: Web data extraction (en)
dcterms.subject.materia: Computer and Information Sciences

Files

Original bundle

Name: Web_scraping_by_end_users.pdf-PDFA.pdf
Size: 13.45 MB
Format: Adobe Portable Document Format
Description: Full document

License bundle

Name: license.txt
Size: 3.46 KB
Format: Item-specific license agreed upon to submission