The scientific production on data quality in big data: a study in the Web of Science database

Keywords

Data quality. Big data. Quality Management. Web of Science

How to Cite

FAGUNDES, Priscila Basto; MACEDO, Douglas Dyllon Jeronimo de; FREUND, Gislaine Parra. The scientific production on data quality in big data: a study in the Web of Science database. RDBCI: Digital Journal of Library and Information Science, Campinas, SP, v. 16, n. 1, p. 194–210, 2017. DOI: 10.20396/rdbci.v16i1.8650412. Available at: https://periodicos.sbu.unicamp.br/ojs/index.php/rdbci/article/view/8650412. Accessed: 16 Aug. 2024.

Abstract

The theme of big data has increasingly attracted the interest of researchers from different areas of knowledge, among them information scientists, who need to understand its concepts and applications in order to contribute new proposals for managing the information generated from the data stored in these environments. The objective of this article is to present a survey of publications on data quality in big data indexed in the Web of Science database up to the year 2016. We present the total number of publications indexed in the database, the number of publications per year, the geographic origin of the research, and a synthesis of the studies found. The database search was conducted in July 2017 and returned a total of 23 publications. To present a summary of these publications, the full texts of all of them were searched for on the Internet, and those that were available were read. The survey shows that studies on data quality in big data began to be published in 2013, and that most of them present literature reviews, with few concrete proposals for monitoring and managing data quality in environments with large volumes of data. This survey is therefore intended to contribute to and foster new research on data quality in big data environments.

https://doi.org/10.20396/rdbci.v16i1.8650412

RDBCI: Revista Digital de Biblioteconomia e Ciência da Informação / Digital Journal of Library and Information Science uses the Creative Commons (CC) license, thus preserving the integrity of articles in an open access environment, in which:

  • This publication reserves the right to modify the original with regard to norms, spelling and grammar, in order to maintain the standards of the language, while still respecting the author's writing style;
  • The original documents will not be returned to the authors;
  • Published works become the property of Revista Digital de Biblioteconomia e Ciência da Informação / Digital Journal of Library and Information Science, and any subsequent partial or full reprint is subject to express authorization by the RDBCI editor;
  • The original source of publication must be provided at all times;
  • The authors are solely responsible for the views expressed within the document.
