Extract, transform and load architecture for metadata collection

De Giusti, Marisa Raquel; Lira, Ariel Jorge; Oviedo, Néstor

Conference paper

Open Access

Alternative title

Arquitectura ETL para la recolección de metadatos

Publication date

May 17, 2011

Place of development

Universidad Nacional de La Plata

Event

Simposio Internacional de Bibliotecas Digitales (Brazil, 2011)

Event name

VI Simposio

Language

English

Subject

Computer and Information Sciences

HDL 11746/2223

Downloads

Paper (4 pp.) (381.89 KB)

Presentation (28 slides) (233.34 KB)

Abstract

Digital repositories acting as resource aggregators typically face challenges that fall roughly into three categories: extraction, improvement and storage. The first category comprises issues related to handling different resource collection protocols (OAI-PMH, web crawling, web services, etc.) and the representations of the collected resources (XML, HTML, database tuples, unstructured documents, etc.). The second category comprises improvements to the information based on controlled vocabularies, specific date formats, correction of malformed data, etc. Finally, the third category deals with the destination of downloaded resources: unification into a common database, sorting by certain criteria, etc. This paper proposes an ETL architecture for designing a software application that provides a comprehensive solution to the challenges a digital repository faces as a resource aggregator. It describes the design and implementation aspects considered during the development of this tool, focusing especially on architecture highlights.
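The three categories in the abstract map onto the extract, transform and load stages of an ETL pipeline. The following is only a minimal sketch of that mapping, not the architecture proposed in the paper: every name in it (`Record`, `extract_from_xml`, `extract_from_rows`, `run_pipeline`, the sample data, the date formats and the language vocabulary) is a hypothetical choice made for illustration, and the XML is a simplified stand-in rather than a real OAI-PMH response.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical common representation shared by all sources."""
    identifier: str
    title: str
    date: str
    language: str


# --- Extract: one function per source representation -------------------
def extract_from_xml(xml_text: str) -> Iterator[Record]:
    """Read records from a simplified XML document (not real OAI-PMH)."""
    root = ET.fromstring(xml_text.strip())
    for node in root.iter("record"):
        yield Record(
            identifier=node.findtext("identifier", default=""),
            title=node.findtext("title", default=""),
            date=node.findtext("date", default=""),
            language=node.findtext("language", default=""),
        )


def extract_from_rows(rows: Iterable[Tuple[str, str, str, str]]) -> Iterator[Record]:
    """Read records from database-style tuples."""
    for identifier, title, date, language in rows:
        yield Record(identifier, title, date, language)


# --- Transform: small, composable improvements --------------------------
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")  # assumed input formats
LANGUAGE_VOCABULARY = {"en": "en", "english": "en", "inglés": "en",
                       "es": "es", "español": "es"}  # assumed vocabulary


def clean_title(record: Record) -> Record:
    """Collapse runs of whitespace in the title."""
    return replace(record, title=" ".join(record.title.split()))


def normalize_date(record: Record) -> Record:
    """Rewrite the date as ISO 8601; leave unparseable dates untouched."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(record.date.strip(), fmt)
        except ValueError:
            continue
        return replace(record, date=parsed.date().isoformat())
    return record


def normalize_language(record: Record) -> Record:
    """Map free-text language names onto the controlled vocabulary."""
    key = record.language.strip().lower()
    return replace(record, language=LANGUAGE_VOCABULARY.get(key, record.language))


# --- Load: unify everything into one store keyed by identifier ----------
def run_pipeline(
    sources: Iterable[Iterable[Record]],
    transformers: List[Callable[[Record], Record]],
    store: Optional[Dict[str, Record]] = None,
) -> Dict[str, Record]:
    store = {} if store is None else store
    for source in sources:
        for record in source:
            for transform in transformers:
                record = transform(record)
            store[record.identifier] = record  # later records overwrite earlier ones
    return store


SAMPLE_XML = """
<records>
  <record><identifier>oai:example:1</identifier><title>  ETL   for metadata </title>
    <date>17/05/2011</date><language>Inglés</language></record>
  <record><identifier>oai:example:2</identifier><title>Harvesting</title>
    <date>2011-05-17</date><language>en</language></record>
</records>
"""
SAMPLE_ROWS = [("db:7", "Data integration", "17.05.2011", "English")]

if __name__ == "__main__":
    unified = run_pipeline(
        [extract_from_xml(SAMPLE_XML), extract_from_rows(SAMPLE_ROWS)],
        [clean_title, normalize_date, normalize_language],
    )
    for record in unified.values():
        print(record)
```

Keeping each stage behind a plain function signature means a new collection protocol or a new correction rule can be added without touching the others, which is the general appeal of an ETL split; how the paper's tool actually structures these stages is not stated in the abstract.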

Keywords

Information search and retrieval

Information systems applications

repositories

aggregation

harvesting

data warehousing

data integration

This work is published under the Creative Commons Attribution 4.0 International license (CC BY 4.0)
