Wednesday, November 14, 2012

How important are metadata and data dictionaries?

In data warehousing and data storage, metadata and data dictionaries are extremely important.  Metadata, in layman’s terms, is data about data.  It explains how the data was created, when it was created and what type of data it is.  Staudt, Vaduva & Vetterli (n.d.) state that, given the complexity of building, using and maintaining a data warehouse, metadata is indispensable because it is used by other components, or even directly by humans, to achieve particular tasks.
Metadata can be used in three different ways:
  • Passively – Documents the structure, development process and use of the data warehouse system.  (Staudt, Vaduva & Vetterli, n.d.)
  • Actively – Used in data warehouse processes that are “metadata driven.”  (Staudt, Vaduva & Vetterli, n.d.)
  • Semi-actively – Stored as static information to be read by other software components.  (Staudt, Vaduva & Vetterli, n.d.)
As you can see, metadata is important not only for storing information about the data, but also because it is used in processes run by the data warehouse and by other applications.  Metadata also improves data quality by supporting consistency, completeness, accuracy, timeliness and precision.  This is because it records the creation time and author of the data, as well as its source and its meaning at the time it was created (Staudt, Vaduva & Vetterli, n.d.).
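To make the idea of "data about data" a little more concrete, here is a minimal Python sketch of a metadata record kept alongside a warehouse table.  The `DatasetMetadata` class, its field names and the sample values are all hypothetical and are not taken from Staudt, Vaduva & Vetterli; the sketch only shows the kind of information (creation time, author, source, meaning) discussed above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical metadata record: data about one table in a warehouse."""
    table_name: str
    created_at: datetime   # when the data was created
    author: str            # who or what created it
    source: str            # where the data came from
    description: str       # what the data means


# Sample values are made up for illustration only.
admissions_meta = DatasetMetadata(
    table_name="admissions",
    created_at=datetime(2012, 11, 14, 9, 30),
    author="etl_nightly_job",
    source="registration_system",
    description="One row per patient admission.",
)


def describe(meta: DatasetMetadata) -> str:
    """Passive use: render the metadata as documentation for a person."""
    return (f"{meta.table_name}: {meta.description} "
            f"(created {meta.created_at:%Y-%m-%d} by {meta.author}, "
            f"source: {meta.source})")


print(describe(admissions_meta))
```

Here the record is only read by a person, which corresponds to the passive use.  The same record could just as well be read by another program, which is closer to the semi-active use described in the list.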

Regarding the data dictionary, this reminds me of when I would query the database at a past position of mine.  Because there was no data dictionary, it was hard to decipher the relevance of the data I was searching for and where it was stored.  As a result, I did not always bring back the correct fields needed to complete my work, which wasted time.  In an article on AHIMA’s website, Clark, Demster & Solberg (2012) discuss the use of data dictionaries and how they can improve data quality.  A data dictionary helps to:
  • Avoid inconsistent naming conventions
  • Avoid inconsistent definitions
  • Avoid varying lengths of fields
  • Avoid varied element values (Clark, Demster & Solberg, 2012)
Using a data dictionary creates consistency in the data, which in turn improves data quality.
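As a rough sketch of how a data dictionary can catch the problems listed above, the Python example below checks a record against a small dictionary.  Everything in it is an assumption made for illustration: the `DATA_DICTIONARY` structure, the field names, the lengths, the permitted values and the `validate` function are hypothetical and do not come from the AHIMA article.

```python
# Hypothetical data dictionary: one entry per field, stating its meaning,
# its maximum length and (optionally) the set of permitted values.
DATA_DICTIONARY = {
    "patient_id": {"definition": "Unique patient identifier", "max_length": 10},
    "gender": {
        "definition": "Administrative gender",
        "max_length": 1,
        "allowed_values": {"F", "M", "U"},
    },
}


def validate(record: dict) -> list[str]:
    """Return the problems found when checking a record against the dictionary."""
    problems = []
    for field, value in record.items():
        entry = DATA_DICTIONARY.get(field)
        if entry is None:
            # Catches inconsistent naming, e.g. "PatientID" vs "patient_id".
            problems.append(f"{field}: not defined in the data dictionary")
            continue
        if len(str(value)) > entry["max_length"]:
            # Catches varying field lengths.
            problems.append(f"{field}: longer than {entry['max_length']} characters")
        allowed = entry.get("allowed_values")
        if allowed is not None and value not in allowed:
            # Catches varied element values, e.g. "Female" vs "F".
            problems.append(f"{field}: value {value!r} is not permitted")
    return problems


print(validate({"patient_id": "12345", "gender": "F"}))
print(validate({"PatientID": "12345", "gender": "Female"}))
```

The first record follows the dictionary and produces no problems.  The second uses a different field name and a different coding for gender, which is exactly the kind of inconsistency a shared data dictionary is meant to prevent.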

In conclusion, both metadata and data dictionaries are vital to creating consistent data.  This data can be tracked and can be used to create interoperable processes between the data warehouse and other applications.  Without them, architects are taking a chance and increasing the likelihood of working with lower-quality data.

References:

AHIMA.  (2012, January).  Managing a Data Dictionary.  Journal of AHIMA, 83(1), pp. 48-52.  Retrieved from http://library.ahima.org/xpedio/groups/public/documents/ahima/bok1_049331.hcsp?dDocName=bok1_049331

Staudt, M., Vaduva, A. & Vetterli, T.  (n.d.).  The Role of Metadata for Data Warehousing.  Retrieved from http://www.informatik.uni-jena.de/dbis/lehre/ss2005/sem_dwh/lit/SVV99.pdf
