
Data quality metrics and high-quality data

Data is the new oil, but data brings a lot of challenges with it before you can get control over, and information out of, your valuable data. Some of these challenges:

- data might be unstructured
- data might be scattered all over the place, across multiple silos
- relevant and recent data is hard to find
- data might be processed differently by entities in other countries
- records are incomplete or even missing

These are just some of the data challenges that upstream users face when they want to use the data. To get information out of data, high-quality data is required; to obtain quality data, data preparation is required. Data preparation, however, takes a considerable amount of time and has to be done again and again. The solution to these challenges and limitations is to integrate all data centrally, to turn all of it into high-quality data and to stream that to upstream users for consumption. Once high-quality data is realised, data curation and data normalisation can be applied to the original data sources, after approval by a data steward, data engineer or data owner. This is precisely what the Data Fabric does. To create this high-quality data, the Data Fabric uses 18 data quality metrics.

“By 2022, 70% of organizations will rigorously track data quality levels via metrics, increasing data quality by 60% to significantly reduce operational risks and costs.” (Melody Chien, Saul Judah, Ankush Jain, Gartner)

18 data quality metrics are applied

The 18 different data quality metrics tell you what quality of data you are actually dealing with. Fixing data quality issues starts with understanding the damage to the data first, and that is what the 18 data quality metrics are about. With the Data Fabric, your integrated data is analysed and ranked. By defining the quality level your data should have, group-wise and with Machine Learning support, you are notified of proposed adjustments to raise your data quality level.

1. Data Accuracy
Data accuracy is determined by the number of different data sources that point to the same value for the same property. The values don't have to be identical, as long as they point to the same value. For example, +44 53 53 53 53 might be an accurate phone number for a person, but so is 53 53 53 53: they are physically different values, but essentially both are accurate.

2. Data Consistency
Data consistency is very similar to data accuracy, but it is based on data types. For example, if a value starts out as an integer, suddenly changes to a decimal value and then moves back to an integer, that is low consistency.

3. Data Integrity
Data integrity is similar to data accuracy and data consistency, but it takes a temporal factor into account, i.e. how often, time after time, the Data Fabric reinforces that the value of a property is correct.

4. Data Uniformity
Data uniformity is determined much like data accuracy, but it is stricter: it is scored on how many different sources point to the same value in the same format. The closer the format, the higher the uniformity. For example, if one record states that the industry of a company is “Software” and another record states “software”, the uniformity is very close, but not 100%. The more the values diverge, the lower the uniformity.

5. Data Completeness
Data completeness is determined by the presence of a value: a value should not be an empty string or null. The Data Fabric treats values like Unknown, N/A and 0 as values (a minimal sketch of this rule follows the list below).

6. Data Relevance
To determine data relevance, the Data Fabric needs to pin relevance on something, and it pins it on the company, i.e. what is relevant for the company. The Data Fabric scores relevance by how many hops a record is away from the business. For example, if you are an employee of a company, you are directly connected; if you are a contact of an employee of a company, you are two hops away. The Data Fabric also couples this with the relevance of the actual metadata you have on those records. For example, today the fax number of a business is not relevant, while its annual revenue, employee count and website are much more relevant. Anything that is 5 or more hops away from the business has very low relevance.

7. Data Stewardship
Data stewardship is determined by how much manual cleaning, labeling and curation has been done on a record.

8. Data Timeliness
Data timeliness is determined by time to value and delivery. The more real-time data is synced to the Data Fabric, the better the timeliness. The consumers of the data are also taken into account.

9. Data Accountability
Data accountability is based on governance and ownership of the data. The more people are responsible for a record, the higher the accountability.
Data accountability can be increased by ensuring that owners are assigned for products, data, governance, integrations and more. If these aren't set, data accountability will be 0%, because at the end of the day the Data Fabric can only point to the metadata of the authors to assume accountability.

10. Data Validity
Data validity derives from data owners performing audits on the data. The more data audits are done, the higher the validity.

11. Data Connectivity
Data connectivity is determined by the density of the data records. The Data Fabric establishes a fully connected network of all data. Similar to Google, Twitter, Facebook and LinkedIn, the more “dense” a record is, the more “important” it is. The more data that is directly or indirectly connected to a record, the higher its data connectivity percentage.

12. Data Reliability
Data reliability comes down to trust, and trust can come from a product owner being able to influence reliability. As such, the first influence on reliability is a static score of how reliable the source is. For example, HR systems are typically mandated to keep very high-quality data. When adding integrations to the Data Fabric, the expected reliability of the data source can be set.

13. Data Conformity
Data conformity is determined by how well a value conforms to what the world expects of it. A good example: although 132 1 125 1 might be a phone number, it is not in a format recognised by any standard.

14. Data Flexibility
Data flexibility is determined by how widely certain data is consumed by different consumers. If the same data is used in a Data Warehouse, for Business Intelligence, Machine Learning and Process Analytics, then data flexibility is high.

15. Data Staleness
Data staleness is determined by the rate of updates to records with respect to accuracy. Take an e-mail address or a telephone number: an e-mail address that is correct today may be stale tomorrow. The Data Fabric determines how long your data has had the wrong value for a property.

16. Data Availability
Data availability is determined by how often the data you need is available for use. Are there problems with the data flow? For example, data might flow in from multiple sources and be pushed on to business intelligence tools. The moment this data stops flowing, availability is low for that particular moment.

17. Data Usability
Data usability is determined by the number of consumers. Data that is constantly used by many consumers implies high usability of the data.

18. Data Quality
Data quality is an aggregation of all the scores combined. This implies that a few low metrics can drastically bring down the average data quality; it is likely that the data quality score will be low when you have just integrated your data with the Data Fabric (see the aggregation sketch after this list).
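To make the completeness rule in metric 5 concrete, here is a minimal sketch of how such a score could be computed, assuming a flat record of property values. The record shape and function name are illustrative assumptions, not the Data Fabric's actual API:

```python
# Hypothetical sketch of a completeness score as described in metric 5.
# Only None and empty strings count as missing; placeholder values such
# as "Unknown", "N/A" and 0 are treated as present, matching the rule above.

def completeness(record: dict) -> float:
    """Return the fraction of properties in a record that carry a value."""
    if not record:
        return 0.0
    present = sum(1 for value in record.values()
                  if value is not None and value != "")
    return present / len(record)

record = {"name": "Acme Ltd", "industry": "Software", "fax": None, "revenue": 0}
print(f"completeness: {completeness(record):.0%}")  # 75%: only fax is missing
```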
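Likewise, a minimal sketch of the aggregation described in metric 18, assuming a plain unweighted average over per-metric scores between 0 and 1; the metric names and the weighting are assumptions for illustration, as the Data Fabric's actual formula is not documented here:

```python
# Hypothetical sketch of aggregating per-metric scores into one quality score.
# An unweighted average is assumed; the real weighting may differ.

def overall_quality(scores: dict) -> float:
    """Average all metric scores (each between 0 and 1) into a single score."""
    return sum(scores.values()) / len(scores)

scores = {
    "accuracy": 0.92,
    "consistency": 0.88,
    "completeness": 0.75,
    "timeliness": 0.80,
    "accountability": 0.0,  # no owners assigned yet, so 0% (see metric 9)
}
print(f"overall quality: {overall_quality(scores):.0%}")  # 67%
```

As the example shows, a single 0% metric such as unassigned accountability pulls the overall average well below the other scores, which is why freshly integrated data tends to start with a low data quality score.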

Data quality metrics in the Data Fabric

- Create high-quality data
- Work with 18 different data quality metrics
- Data curation & normalisation at the source