Engaging Data Explains: Dealing with Data Masking



Data masking doesn't have to be complicated. Clients often have complex or very specific requirements, so it's always important to keep the process as straightforward as possible.


Unusual Requirements

For example, one client we worked with had an unusual and interesting data masking requirement: they needed to mask their production data while keeping it human-readable, so that their development and testing teams could build a new client portal system. That would have been complicated enough on its own, but their existing platform was also extremely complex, and there was little documentation available to explain how the data was stored.

There was a further problem: the company had insufficient storage space to hold a second (in-line) database containing the production-grade masked data that was needed.

As we're experienced and accomplished in this field, we quickly identified several possible solutions to this scenario, but choosing the best one was critically important. After some assessment, we settled on Redgate's Data Masker for the following reasons:

• After a review of the underlying data structure, we concluded it would be too difficult, costly and time-intensive to transfer the data into the test environment and apply masking rules there.

• It was important to distinguish between masking and obscuring the data. The client wanted human-readable values, so this quality had to be retained (see the sketch after this list).

• There was a lack of documentation regarding the location of personally identifiable data, which could result in the process missing an important part of the system.

• We also needed a verification process comparing the masked data against the source; the resulting report would tell us whether we had missed masking any records.
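
As a minimal Python sketch of that masking-not-obscuring distinction: values are swapped for realistic substitutes from a lookup pool, so the output stays human-readable. Redgate's Data Masker provides this through its own substitution rule sets; the pool and function below are purely illustrative.

```python
import random

# A tiny pool of replacement surnames; a real substitution set would be
# far larger and drawn from a managed lookup table.
SURNAME_POOL = ["Jones", "Taylor", "Brown", "Wilson", "Evans"]

def mask_name(value: str) -> str:
    """Replace the surname but keep the title and structure,
    e.g. 'Mr. Smith' -> 'Mr. Jones'."""
    parts = value.split()
    parts[-1] = random.choice(SURNAME_POOL)
    return " ".join(parts)

print(mask_name("Mr. Smith"))  # e.g. 'Mr. Jones' - still human-readable
```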

We devised a simple plan to extract the data, load it into a SQL database, and then finally complete the masking process. This would allow the client’s development team to export the masked data and transfer it into the client portal.
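
As a rough sketch of that plan, assuming a SQL Server staging database and hypothetical connection strings and table names (in the actual project, WhereScape Red handled the extract and load, and Redgate's Data Masker the masking):

```python
import pandas as pd
from sqlalchemy import create_engine

source = create_engine("mssql+pyodbc://core-system")  # hypothetical
staging = create_engine("mssql+pyodbc://staging-db")  # hypothetical

# Only the tables the client portal actually needs (illustrative names).
REQUIRED_TABLES = ["clients", "accounts"]

for table in REQUIRED_TABLES:
    # Extract just the required tables rather than the whole database,
    # keeping storage use and processing time down.
    frame = pd.read_sql_table(table, source)
    # Load into the staging SQL database, replacing the previous run.
    frame.to_sql(table, staging, if_exists="replace", index=False)

# Masking then runs against the staging database once loading completes
# (see the rule-driven sketch later in this post).
```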

Technology

Identifying the data was always going to be a tricky process if attempted manually, due to the core system’s conventions around the naming of tables and columns. So to address this, we used WhereScape’s 3D product, which documented the structure of the system into a metadata layer. Our consultant worked closely with the business teams to update the metadata layer, highlighting the fields that contained personally identifiable data, while also adding business definitions.
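
Conceptually, the metadata layer captured something like the following for each column. WhereScape 3D holds this in its own repository; the Python structure, table names and cryptic column names below are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class ColumnMetadata:
    table: str
    column: str
    business_definition: str
    contains_pii: bool
    masking_rule: str | None  # None = no masking required

metadata = [
    ColumnMetadata("clients", "cl_nm_1", "Client surname", True, "substitute_surname"),
    ColumnMetadata("clients", "cl_tel", "Daytime phone number", True, "random_phone"),
    ColumnMetadata("clients", "cl_ccy", "Account currency", False, None),
]

# The fields flagged as personally identifiable drive both the masking
# and the later verification step.
pii_fields = [m for m in metadata if m.contains_pii]
```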

We also took the opportunity to agree the type of data masking needed for each field. The most challenging aspect was understanding how the data was joined or reused across different tables, but the client provided all of the known diagrams and suggested relationships, which significantly shortened the investigation time.

This exercise also produced detailed documentation of the core system's data structure, as well as analysis of the data cardinality and profiles. It uncovered some interesting points about the system, including areas that held personally identifiable data of which the client was unaware.

Using the information within the metadata, the physical structure of the system was imported into WhereScape's Red product, which automated the extraction of data and loaded it into a SQL database on a scheduled basis. We started this process gently, on a daily schedule, and increased the frequency to hourly as our confidence in the process grew.
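
The scheduling itself was handled by WhereScape Red; as a conceptual stand-in, the cadence amounted to something like this sketch using the third-party `schedule` package.

```python
import time
import schedule  # pip install schedule

def extract_and_load():
    """Placeholder for the job that copies the required source tables
    into the staging SQL database."""
    ...

# Start conservatively with one run per day...
schedule.every().day.at("02:00").do(extract_and_load)
# ...then, once confident in the process, switch to hourly runs:
# schedule.every().hour.do(extract_and_load)

while True:
    schedule.run_pending()
    time.sleep(60)
```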

With the data now present and optimised within the SQL database, we used Redgate's Data Masker to replace the personally identifiable data with values from substitution data sets, based on the agreed rules held within the metadata. Once the rules had been designed, WhereScape Red's scheduler automated the masking, so that it began as soon as loading had completed.

What could have been a hugely complicated and onerous process was made far simpler. The whole database was copied, masked and sent to the client portal within four hours.

Measuring the Process

As some of the data was being sent to a third party, it was essential to eliminate any risk of a data breach. Using WhereScape Red, Engaging Data built a comparison process that utilised the metadata, covering only those fields marked as containing personally identifiable data, and compared the values before and after the masking had taken place.

Finally, the comparison report was automatically emailed to the management team, regardless of whether or not a failure was triggered. This email contained a summary of field error analysis, as well as the number of field errors per record. The latter was used to assess the overall process and prevent any sensitive data from being distributed to third parties. By automating this, we were able to reassure the client that the whole process was working correctly.
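
A simplified sketch of that verification step, assuming the source and masked tables are loaded into matching DataFrames (the production version was built in WhereScape Red):

```python
import pandas as pd

def verify(source: pd.DataFrame, masked: pd.DataFrame,
           pii_columns: list[str]) -> pd.Series:
    """Count, per record, the PII fields that came through unchanged."""
    unchanged = source[pii_columns] == masked[pii_columns]
    return unchanged.sum(axis=1).rename("unmasked_field_count")

# Any record with unmasked_field_count > 0 points at a missed masking
# rule, so the data is held back rather than sent to the third party.
```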

Conclusion

It's quite common for Engaging Data to encounter complicated situations across a wide range of clients. The following problems and requirements come up frequently:

• Ageing trading platforms, core systems or data sources that can't use off-the-shelf data masking products.

• A need for the data to be masked quickly, at near real-time speed.

• A need for different types of masking, whether obscured, human-readable or randomised.

• Limitations on storage or infrastructure.

The best data masking tools address these issues and automate the process, allowing the client to decide how, when and where to mask. Our expertise and experience in this area have enabled us to achieve excellent results with some highly complex datasets and requirements.


If you would like to learn more about these tools or how we can help with your data project, please feel free to contact us.



Engaging Data Explains: Data Masking


The Data Masking Challenge 

One of our clients had an interesting data masking requirement: how to mask production data to meet GDPR and IT security policies. The data needed to remain human-readable so that the development and testing teams could create a data feed for a new client portal system. However, the core system could only scramble or obfuscate the data, not mask it. The system itself was extremely complex, having been built and expanded over more than 10 years, and with no documentation it was difficult to understand how the data was stored.

Furthermore, architectural constraints meant there was not enough storage space to hold a second (in-line) database with masked production data.

Is This A Common Problem?

The more companies we speak to, the more complex and complicated situations we find. From our experience, a pattern has emerged in the common problems and requirements:

  • Old tech – Ageing trading platforms, core systems or data sources often don't have the functionality to mask data. Those that do, or that have extensions/plug-ins to mask the data, often take a long time to process or lack the flexibility to fit every scenario.
  • Quick turnaround – Near real-time data is nice to have, but not always a true requirement.
  • Specific/varied masking – Different types of masking are needed: obfuscated, scrambled, encrypted, or human-readable and randomised.
  • Storage – Limitations on storage or infrastructure make it difficult to store an entire copy of production.
  • Cost – Large database providers offer alternative tools with the same effect, but they command a very large price tag.
  • Time – Developers can hand-crank bespoke solutions, which take a reasonable amount of time to develop but far longer to test to ensure they work as expected.
  • Doing the right thing – Most clients want to do the right thing and meet regulatory requirements, but see it as a complicated housekeeping chore; some recognise the risk but choose to ignore it.

Engaging Data Discovery

We had plenty of options to solve this problem, but selected Redgate Data Masker. Here is why:

  • After a review of the underlying data structure, it was too difficult, costly and time-intensive to try to transfer the data into the test environment and apply masking rules there.
  • We discovered that it would take 32 to 48 hours to copy the "majority" of the data from the production to UAT environments. This would copy most, but not all, of the data, creating the potential for leaving things behind; running the system's own obfuscation processes would then take another 8 hours on top.
  • Masking, not obscuring – we needed human-readable values, e.g. Mr. Smith converts to Mr. Jones. This was not available from the trading platform's masking function.
  • Defined values – we needed predictable values, such as telephone numbers in a set format or a plausible date of birth (see the sketch after this list).
  • There was a lack of documentation on where personally identifiable data was located, which could result in the process missing part of the system.
  • We had a requirement to build in a verification process, comparing the masked data against the source. This report would answer the question: "have we missed masking any records?"
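
A minimal sketch of such defined values, with formats assumed purely for illustration: replacements follow a predictable pattern, such as a UK-style mobile number or a plausible adult date of birth.

```python
import random
from datetime import date, timedelta

def random_phone() -> str:
    """Generate a phone number in a fixed '07xxx xxxxxx' format."""
    return f"07{random.randint(0, 999):03d} {random.randint(0, 999999):06d}"

def random_dob(min_age: int = 18, max_age: int = 90) -> date:
    """Generate a date of birth within a sensible adult age range."""
    days = random.randint(min_age * 365, max_age * 365)
    return date.today() - timedelta(days=days)

print(random_phone())  # e.g. '07482 193650'
print(random_dob())    # e.g. 1967-03-14
```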

We created a simple plan: extract the data, load it into a SQL database, and then mask it. Taking only the required data made efficient use of storage and reduced processing time. It would also allow the client's development team to export the masked data and transfer it into the client portal.

Choosing The Right Tool

Identifying the data would have been a difficult manual process because of the core system's table and column naming convention. Engaging Data's consultant instead used the WhereScape 3D product, which documented the structure of the system into a metadata layer. The consultant worked with the business teams to update the metadata layer, highlighting fields that contained personally identifiable data and adding business definitions. Using an agile approach, the type of data masking required for each column was agreed, along with how data was joined and stored or reused in different tables. Helpfully, WhereScape 3D provided all the known diagrams and suggested relationships, helping to reduce the investigation time.

At the end of this exercise, WhereScape 3D produced detailed documentation of the core system's data structure, as well as analysis of the data cardinality and profiles. It uncovered some interesting points about the system, including parts that held personally identifiable data the client had not known existed.

Putting The Data Masking Solution Together

Using the information within the metadata, WhereScape Red imported the physical structure of the system and automated the extraction of data into a SQL database on a scheduled basis. We started off daily, later increasing to every hour.

Now that the data was at rest in the SQL database, our consultant used Redgate's Data Masker to replace the personally identifiable data with values from substitution data sets, based on the agreed rules held within the metadata. Once the rules had been designed, the WhereScape Red scheduler automated the masking so that it started as soon as loading had completed.
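
Data Masker defines its rules through its own designer; the underlying principle, though, is rule-driven substitution keyed off the metadata, roughly as in this hypothetical sketch:

```python
import random
import pandas as pd

# Each rule name recorded in the metadata maps to a masking function
# applied value by value; both rules here are toy examples.
RULES = {
    "substitute_surname": lambda v: random.choice(["Jones", "Taylor", "Brown"]),
    "random_phone": lambda v: f"07{random.randint(0, 99999999):08d}",
}

def apply_masking(frame: pd.DataFrame, pii_rules: dict[str, str]) -> pd.DataFrame:
    """pii_rules maps column name -> rule name, taken from the metadata."""
    masked = frame.copy()
    for column, rule in pii_rules.items():
        masked[column] = masked[column].map(RULES[rule])
    return masked

clients = pd.DataFrame({"cl_nm_1": ["Smith", "Patel"]})
print(apply_masking(clients, {"cl_nm_1": "substitute_surname"}))
```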

Data processing, including masking and loading into the target database, initially completed within 4 hours. That was not too onerous, and very timely compared to the other options; subsequent refinement reduced the processing time by a further hour.

Did The Data Masking Work?

Using WhereScape Red, the Engaging Data consultant built a comparison process that utilised the metadata (using only those fields marked as containing personally identifiable data) and compared the values before and after the process.

The process ends with an automatic email of the data masking comparison report. This report contains a summary of field error analysis, as well as the number of field errors per record. The latter was used to fail the process and prevent data from being transferred to the target database. Automating this enabled the client to feel confident that the process was working correctly.
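
Sketched below with placeholder addresses and a local SMTP relay (both assumptions), the final step summarises the errors, emails the report, and fails hard before any transfer if anything survived unmasked.

```python
import smtplib
from email.message import EmailMessage

def email_report(total_field_errors: int, errors_per_record: dict[int, int]) -> None:
    """Send the comparison report whether or not a failure was triggered."""
    msg = EmailMessage()
    msg["Subject"] = f"Data masking comparison report: {total_field_errors} field errors"
    msg["From"] = "masking@example.com"   # placeholder address
    msg["To"] = "management@example.com"  # placeholder address
    msg.set_content(f"Field errors per record: {errors_per_record}")
    with smtplib.SMTP("localhost") as smtp:  # placeholder relay
        smtp.send_message(msg)

def gate(total_field_errors: int) -> None:
    """Fail the run before anything reaches the target database."""
    if total_field_errors > 0:
        raise RuntimeError("Masking verification failed; transfer blocked.")
```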

In Conclusion

All sorts of tools can be used to mask data. We find the best of them automate the process, allowing you to decide how to mask, when to mask and how frequently to do it.


If you would like to learn more about Redgate's Data Masker, WhereScape Red, or how we can help with your data project, please feel free to contact office@engagingdata.co.uk.