Dealing with Data Masking

Oct 27, 2021 News, Redgate

Engaging Data Explains :

Dealing with Data Masking


The Power BI Reporting Server is a powerful tool, and one of its most intuitive features is the ability to achieve true self-service business intelligence.

So in this blog post we’re going to walk you through the process involved with Self Service BI. You can use the free version of Power BI in order to achieve this, so there is no barrier to entry.

One of the best things about Power BI is that it keeps data masking simple. Sometimes clients have complicated or very specific requirements, so it’s always important to make the process as straightforward as possible.


Unusual Requirements

For example, one client we were working with had an unusual and interesting data masking requirement; they needed to mask their production data, but ensure that it was also human-readable, so that their development and testing team could create a new client portal system. This would have been complicated enough in itself, but their existing platform was also extremely complex, while there was little documentation available to help them understand the data storage process.

And there was another problem as well. The company had insufficient storage space to hold a second (in-line) database, with the quality of production grade, masked data that was needed.

As we’re experienced and accomplished in this field, we quickly identified several possible solutions to this scenario. But choosing the best one was critically important. After some assessment, we elected for Redgates Data Masker for the following reasons:

• After a review of the underlying data structure, we reflected that it was too difficult, costly and time-intensive to attempt to transfer the data into the test environment and apply masking rules.

• It was important to make a distinction between masking and obscuring the data. The client wanted human-readable values, so we had to ensure this quality was retained.

• There was a lack of documentation regarding the location of personal identifiable data, which could result in the process missing an important part of the system.

• We also had a requirement to include a verification process, comparing the masked data against the source. This report would then provide us with an insight into whether we had inadvertently masked any records.

We devised a simple plan to extract the data, load it into a SQL database, and then finally complete the masking process. This would allow the client’s development team to export the masked data and transfer it into the client portal.

Technology

Identifying the data was always going to be a tricky process if attempted manually, due to the core system’s conventions around the naming of tables and columns. So to address this, we used WhereScape’s 3D product, which documented the structure of the system into a metadata layer. Our consultant worked closely with the business teams to update the metadata layer, highlighting the fields that contained personally identifiable data, while also adding business definitions.

We also took the opportunity to agree the type of data masking that was needed within the field. The most challenging aspect of this was understanding how the data joined or reused in different tables. But the client provided all of the known diagrams and suggested relationships, which significantly shortened the investigation time involved.

At the end of this exercise, our client also produced detailed documents of the core systems data structure, as well as analysis of the data cardinality/profiles. This uncovered some interesting points about the system, including some aspects of it that held personally identifiable data of which the client was unaware.

Using the information within the metadata, the physical structure of the system was imported into WhereScape’s Red product, which automated the extraction of data and loaded the data into a SQL database on a scheduled basis. We started off this process gently, working on a daily schedule, but as we became more certain about the process, we increased this duration to hourly.

Now that the data was present and optimised within the SQL database, we next used Redgate’s Data Masker to convert the personally identifiable data to a dataset, based on the agreed rules held within the metadata. Once the rules had been designed, WhereScape Red’s scheduler automated the masking, so that it began as soon as the loading had been completed.

What could have been a hugely complicated and onerous process was made far simpler. The whole database was copied, masked and sent to the client portal within four hours.

Measuring the Process

As some of the data was being sent to a third party, it was very important that there was never any risk of a data breach. But we had no problem in building a methodology to address this. Using WhereScape Red, Engaging Data was able to build a comparison process. This utilised the metadata, using only those fields marked as containing personally identifiable data. This also made it possible to compare the values before and after the process had taken place.

Finally, the comparison report was automatically emailed to the management team, regardless of whether or not a failure was triggered. This email contained a summary of field error analysis, as well as the number of field errors per record. The latter was used to assess the overall process and prevent any sensitive data from being distributed to third parties. By automating this, we were able to reassure the client that the whole process was working correctly.

Conclusion

It’s quite common for Engaging Data to encounter complicated situations with a wide range of clients. Each of the following are common problems or requirements:

• Ageing trading platforms/core systems, or sources of data that can’t utilise off-the-shelf data masking products.

• Companies need the data masked quickly, in virtual real-time speed.

• Different types of masking are commonly needed, whether obscured, human-readable or randomised,

• There are limitations on storage or infrastructure.

The best data masking tools will address these issues and automate the process, allowing the

client to decide how, when and where to mask. Our expertise and experience in this area has enabled us to achieve some excellent results with some highly complex datasets and requirements.


If you would like to learn more about this app or how we can help with your data project, please feel free to contact us.


Contact Us

* indicates required