Category: News

Big Data and DataVault

May 10, 2021 by Carl Richards

Knowing how to find the needle more easily, and in which haystack it resides

Big Data has been a hot topic for more than a few years now, and this phenomenon will play a central role in the future of commerce. Collecting, collating and comprehending Big Data will no longer be a matter of commercial interest; it will instead increasingly become a commercial imperative.

It should come as no surprise then that investment in technologies related to Big Data is already becoming almost ubiquitous. A report from NewVantage Partners, which collected executive perspectives from 60 Fortune 1000 companies, found that 97% of them invest in Big Data and AI initiatives. NewVantage also discovered that the vast majority of this investment (84%) was focused on deploying advanced analytics capabilities to enable business decision making.

Big Understatement

And when we use the term ‘Big Data’, it’s reasonable to conclude that ‘big’ is an understatement! For example, in 2018, Internet users generated approximately 2.5 quintillion bytes of data every day. That’s 912 quintillion bytes every year! And 90% of this data has been generated in just the last five years. This growth curve is exponential.

Thus, it’s one thing to recognise the importance of Big Data, and quite another to be prepared for it. We’re talking about a veritable avalanche of information, much of it utterly unstructured. Indeed, Forbes noted in 2019 that 95% of businesses cite the need to manage unstructured data as a problem for their business, which, given the sheer scale of Big Data, is hardly surprising. Making the most of Big Data is not so much searching for a needle in a haystack; it’s more like looking for a needle in a universe composed entirely of haystacks.

This reality means that implementing the best business intelligence solutions will become essential; the sheer volume of Big Data demands it. Data warehousing is one critically important element of this process, and its analytical capabilities will prove central to companies’ efforts to benefit from the information explosion.

Data Vault 2.0

That’s where Data Vault comes in. Data Vault 2.0 comprises a raft of sophisticated architecture and techniques that enable businesses to store current and historical data in a single, easily accessible location and to create analytics based on this information. Data Vault is effectively a unique design methodology for large-scale data warehouse platforms, ensuring that Big Data is dealt with more quickly, more efficiently, and more effectively.

Data Vault offers several advantages over competing approaches. First, any system can be converted to Data Vault definitions: existing objects can be translated into Data Vault entities, and every single item will have a corresponding match in the new Data Vault architecture. Every main definition can then be mapped to hubs, and every relationship between them modelled via links. This makes the whole operation more flexible and user-friendly.
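As an illustration of the hub-and-link mapping described above, here is a minimal sketch in Python; the `Customer`/`Order` entities and table names are invented for the example, not drawn from any real system:

```python
from dataclasses import dataclass

# A hub holds a main business definition: just its business key.
@dataclass(frozen=True)
class Hub:
    name: str           # e.g. a hypothetical "hub_customer"
    business_key: str   # the source system's key for that definition

# A link records a relationship between two or more hubs.
@dataclass(frozen=True)
class Link:
    name: str
    hubs: tuple         # the hubs this relationship connects

# Two hypothetical source entities, mapped to Data Vault structures:
hub_customer = Hub("hub_customer", "customer_number")
hub_order = Hub("hub_order", "order_number")
link_customer_order = Link("link_customer_order", (hub_customer, hub_order))
```

Because each source object lands in exactly one hub or link, new sources can be mapped incrementally without reworking the existing model.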

Another significant advantage of Data Vault is its enhancement of agility. This is particularly important, as the ability of network software and hardware to control and configure itself automatically makes it easier to deal with the almost unfathomable scope of Big Data.

Smaller Pieces

Data Vault makes it possible to divide a system into smaller pieces, with each component available for separate design and development. Every constituent part of the system can have its own definitions and relationships, and these can be combined at a later date through related mappings. This makes it possible to develop a project steadily yet still see instant results, and it makes managing change requests much more straightforward.

Another asset of the Data Vault approach is that it applies to numerous different systems. Separate sources can be transformed into Data Vault entities without any laborious procedures. This is particularly advantageous in the contemporary climate, as almost every enterprise system relies on several different data types from various data sources.

The Data Vault modelling technique is thus adaptable to all types of sources, with a minimum of fuss. This makes it much more feasible to link different data sources together, making analysis more joined-up and holistic. Being the entity most adaptable to change is vital across a wide variety of niches, and it certainly applies in the rapidly evolving data analysis environment.

But possibly the most compelling reason to choose Data Vault is that it provides companies with a method of standardisation. With Data Vault implemented, companies can standardise their entire DWH system. This standardisation enables members of the company to understand the system more easily, which is undoubtedly advantageous considering the innate complexity of this field.

Meeting the Needs

It is commonplace for complex and sophisticated solutions to be delivered to business users that nevertheless fail to address the company’s actual requirements in that area. Everyone wants to show off their fancy piece of kit, but developers often aren’t as keen to listen! This can happen for a variety of reasons. Still, the important thing to note is that Data Vault is designed to meet the requirements of the business, rather than requiring a business to reorganise itself to comply with the needs of the package.

This is important at a time when the dynamic complexity associated with data is escalating. Enterprise data warehouse systems must provide accurate business intelligence and support a variety of requirements. This has become a critical reality in a business marketplace in which the sheer volume of data being generated is overwhelming.

Data Vault solves these problems with a design methodology that is ideal for large scale data warehouse platforms. With an approach that enables incremental delivery and a structure that supports regular evolution over time, Data Vault delivers a standard for data warehousing that elevates the whole industry.

Is the Secretary of State for Health revisiting ‘the Spine’ with the desire for a “consistent data platform which would see patient data separated from the application layer”?

Mar 24, 2021 by Ashley Sass

“One of my lessons from the COVID response is that the pulling together of data that previously had been in silos is absolutely critical.”

“Bringing together data that, once upon a time, would only have existed in silos, was fundamental to the COVID response.” (Matt Hancock, Digital Health Festival, March 2021)

The ill-fated NHS Spine project of the early noughties aimed to provide an entire national cradle-to-grave medical history of every citizen’s engagement with the NHS in England and Wales. The project failed for several reasons, including poor contract and supplier management, uncontrolled costs, conflicting agendas and unclear objectives. The more fundamental reason for the project’s failure, however, was a lack of agreement on who owned the data (the patient, the trust, the NHS, or the service providers?), which meant information sharing could not be sanctioned.

The COVID pandemic has demonstrated the strategic importance of a unified understanding of all critical patient care elements across the entire NHS. The free movement and sharing of data has driven the changes necessary to support initiatives from virtual hospitals to rapid vaccine development and deployment and patient tracking.

The UK government’s perspective on the importance of data within NHS operations makes an interesting read. The critical areas for improvement to drive the NHS forward were explored recently by Matt Hancock, the Secretary of State for Health and Social Care, in his speech to the Digital Health Rewired Festival. His perspective is summarised in an article published in Digital Health in March 2021 (see below). The importance of data lies at its heart.

Broadly, the Minister addresses five key themes. These break down as:

  1. Digitise (the fundamentals of) the NHS: Ensure that everyone can participate and get the basics right.
  2. Connecting the Systems: Data needs to flow appropriately and freely across the organisation and its extended networks (suppliers, partners, etc.). In this way, the intrinsic benefits of high-quality data and interoperability are realised.
  3. Establish New Digital Pathways and Improve Patient Experience: Digital technologies are not “nice to have”. Used correctly, digital technologies are transformative. The last 12 months have helped reshape health and care provision in the UK, turning traditional health and care models on their head. West Hertfordshire Hospitals NHS Trust didn’t just set up a virtual ward to manage patient care. They set up a virtual hospital, managing around 1,200 patients at home.
  4. Building for the Future: Work in an agile manner, deliver quickly and allow for asset re-use to optimise efficiency. Consider Information Governance, Collaboration and commonality between organisations so that information can be easily shared, compared and used, creating a focus on patient safety. It needs to be easier to write applications or create services that interact with data from different NHS organisations. Considering Information Governance needs, separating the data layer from the application layer removes a significant barrier to innovation. Software providers can offer the application software, but the data will be stored separately and securely in the cloud, facilitating a consistent data platform across the NHS.
  5. Make it Easy for People to Do the Right Thing: Give people the confidence to drive changes where they work. Much was done during the pandemic to simplify information guidance, reducing it to one page of clear advice and letting clinicians share data with confidence. Keeping the language simple and the process clear ensured that it was easy to use on the front line, working with data in a way that is safe, appropriate, and easy to maintain.

More detail can be found in the article below from Digital Health. A fascinating read.

About Engaging Data: Engaging Data are a ‘real-world’ data strategy and implementation specialist, focused on delivering transformative projects in Information Management and Governance, Data Warehousing, MDM and Reporting/Visualisation delivery.

We are technology agnostic, working on critical data strategy, digital transformation and data-warehousing-based turnaround and reimplementation programmes across multiple industries. Please address any questions via www.engagingdata.co.uk or by email to sales@engagingdata.co.uk.

Data Masking

Jun 15, 2020 by Marketing

The data masking challenge 

One of our clients had an interesting data masking requirement: how to mask production data to comply with GDPR and IT security policies. The data needed to be human-readable, enabling the development and testing teams to create a data feed for a new Client Portal system. However, the core system did not have the ability to mask the data, only to scramble or obfuscate it. The core system was extremely complex, built and expanded on over 10 years, and it was difficult to understand the system and how data was stored because documentation didn’t exist!

Furthermore, architecture constraints meant there was not enough storage space to hold a second (in-line) database with masked production data.

Is this a common problem?

The more companies we speak to, the more complex or complicated situations we find. From our experience, a pattern emerges in the common problems and requirements:

  • Old Tech – Ageing trading platforms, core systems or other data sources often don’t have the functionality to mask data. Those that do, or that have extensions/plug-ins to mask the data, often take a long time to process or lack the flexibility to fit every scenario.
  • Quick turnaround – Near real-time data is nice to have, but not always a real requirement.
  • Specific/varied masking – Different types of masking are needed: obfuscated, scrambled, encrypted, or human-readable and randomised.
  • Storage – Limitations on storage or infrastructure make it difficult to store an entire copy of production.
  • Cost – Large database providers offer alternative tools with the same effect, but they command a very large price tag.
  • Time – Developers can hand-crank bespoke solutions, which take a reasonable amount of time to develop but much longer to test to ensure they work as expected.
  • Doing the right thing – Most clients want to do the right thing to meet regulatory requirements but see it as a complicated housekeeping chore; they recognise the risk but choose to ignore it.

Engaging Data discovery

We had a lot of options to solve this problem, but we selected Redgate Data Masker. Here is why:

  • After a review of the underlying data structure, it was too difficult, costly and time-intensive to transfer the data into the Test environment and apply masking rules there.
  • We discovered that it would take 32 to 48 hours to copy the “majority” of the data from Production to UAT environments. This would copy most but not all of the data, risking things being left behind. It would then take more time to run the system’s own obfuscation processes (another 8 hours).
  • Masking, not obscuring. Create human-readable values, e.g. Mr. Smith converts to Mr. Jones. This was not available from the trading platform’s masking function.
  • Defined values. Create predictable values, such as a set-format telephone number or date of birth.
  • There was a lack of documentation regarding the location of personally identifiable data, which could result in the process missing parts of the system if we processed the whole database.
  • We had a requirement to build in a verification process, comparing the masked data against the source. This report would answer the question: “have we missed masking any records?”

We created a simple plan: extract the data, load it into a SQL database, and then mask it. Taking only the required data made efficient use of storage and reduced processing time. This would allow the Client’s development team to export the masked data and transfer it into the Client Portal.
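As a rough sketch of the masking rules described above (human-readable substitutes and predictable, set-format values), deterministic masking can look like this. This is an invented illustration of the general approach, not Redgate Data Masker’s actual implementation; the surname pool and phone format are assumptions:

```python
import hashlib

# Illustrative substitute pool – an assumption, not client data.
SURNAMES = ["Jones", "Taylor", "Brown", "Davies", "Evans"]

def mask_surname(surname: str) -> str:
    """Deterministically swap a surname for a human-readable substitute."""
    digest = hashlib.sha256(surname.encode("utf-8")).hexdigest()
    return SURNAMES[int(digest, 16) % len(SURNAMES)]

def mask_phone(phone: str) -> str:
    """Produce a predictable, set-format phone number from the original."""
    digest = hashlib.sha256(phone.encode("utf-8")).hexdigest()
    return "+44 20 " + str(int(digest, 16) % 10**8).zfill(8)
```

Hashing the original value means the same input always produces the same masked output, so joins between tables still line up after masking.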

Choosing the right tool

Identifying the data was a difficult manual process because of the core system’s table/column naming convention. Engaging Data’s consultant used the WhereScape 3D product, which documented the structure of the system in a metadata layer. The consultant worked with the business teams to update the metadata layer and highlight fields containing personally identifiable data; in addition, we added business definitions. Using an agile approach, each column’s data masking requirement was agreed, along with how data was joined and stored/reused in different tables. Helpfully, WhereScape 3D provided all the known diagrams and suggested relationships, helping to reduce the investigation time.

At the end of this exercise, WhereScape 3D produced detailed documentation of the core system’s data structure as well as analysis of the data cardinality/profiles. It uncovered some interesting points about the system, including parts that held personally identifiable data that the client had not known existed.

Putting the Data Masking solution together

Using the information within the metadata, WhereScape Red imported the physical structure of the system and automated the extraction of data into a SQL database on a scheduled basis. We started off daily, but later increased to every hour.

Now that the data was at rest in the SQL database, our consultant used Redgate’s Data Masker to convert the personally identifiable data into a masked data set, based on the agreed rules held within the metadata. Once the rules had been designed, the WhereScape Red scheduler automated the masking so that it started as soon as the loading had completed.

Data processing, including masking and loading into the target database, initially took place within 4 hours. Not too onerous, and very timely compared to other options. More importantly, we later reduced processing time by a further hour.

Did the data masking work?

Using WhereScape Red, the Engaging Data consultant was able to build a comparison process that utilised the metadata (using only those fields marked as containing personally identifiable data) and compared the values before and after the process.

The process ends with an automatic email of the data masking comparison report. This report contains a summary of field error analysis as well as the number of field errors per record. The latter was used to fail the process and prevent the data from being transferred to the target database. Automating this enabled the Client to feel confident that the process was working correctly.
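The comparison logic can be sketched as follows. This is a simplified illustration of the approach rather than the actual WhereScape Red build; the field names and row structures are invented:

```python
# Fields flagged as personally identifiable in the metadata layer (invented).
PII_FIELDS = ["surname", "phone"]

def field_errors(source_row: dict, masked_row: dict) -> int:
    """Count PII fields that still match the source, i.e. were not masked."""
    return sum(1 for f in PII_FIELDS if masked_row[f] == source_row[f])

def verify(source_rows, masked_rows):
    """Compare rows before and after masking; fail if any record has errors."""
    errors_per_record = [field_errors(s, m)
                         for s, m in zip(source_rows, masked_rows)]
    passed = sum(errors_per_record) == 0
    return passed, errors_per_record
```

A run that leaves any flagged field unchanged returns `passed=False`, which is the signal used to stop the transfer to the target database.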

In conclusion

All sorts of tools can be used to mask data. We find the best of them automate the process, allowing you to decide how to mask, when to mask, and how frequently to do it.

If you would like to learn more about Redgate’s Data Masker, WhereScape Red, or how we can help with your data project, please feel free to contact office@engagingdata.co.uk


Would you like to know more?

Engaging Data Telephone

Call us on

(+44) 0204 566 5056

Engaging Data Question

Send a message

Here

Supporting Girls Football

Jun 8, 2020 by Simon Meacher

In the 2019/20 season we sponsored Pace Youth’s U16 Bobcats girls football team in Southampton, playing in the Hampshire Girls Football League.

Instead of putting our logo on the shirt, we chose to support Young Minds, the UK’s leading charity fighting for children and young people’s mental health.

Engaging Data will continue to support the team and its charity, Young Minds, going into the 2020/21 season!

Young Minds is a great place for parents and children to find support. If you are able to, please support this fantastic charity: Donate Here.

COVID-19 cut the season short, but we hope the team get back to playing football safely. Who knows what football will look like next season, at professional level or grassroots. We can’t wait to hear how the team get on. Good luck, Bobcats!

Data Vault 2.0 Meet Up

Jun 4, 2020 by Simon Meacher

Engaging Data has started to work on Data Vault projects. If you are not aware, Data Vault is a great methodology for data development (DevOps/DataOps). It’s a good fit for how we like to work with clients!

We have seen it work really well with cloud technologies such as Snowflake, but it is very adaptable to most architectures. 

You would not believe the speed at which we can deliver projects using Data Vault and WhereScape! We talk in minutes and hours, not days and months! Don’t believe it? Get in contact for a demo.

What can you expect?

The Data Vault user group has set up a free meet-up. This meeting will be a great introduction to how Data Vault 2.0 is being used. John Giles, a Data Vault guru, will be making a guest appearance, so you know there will be good content!

See you there!

Redgate Partnership!

Jun 3, 2020 by Simon Meacher

Using the right tools for the data job.

We are really excited to go into partnership with Redgate. Find out more about them here. Redgate’s software can help from development to environment controls. Moreover, these tools are perfect for any data developer!

Redgate develops tools for developers and data professionals. These products are a natural fit for Engaging Data, providing suitable tools for data development teams and enhancing or streamlining processes. Redgate produces specialised database management tools for Microsoft SQL Server, Oracle, MySQL and Microsoft Azure.

All of these platforms and tools, driven by Engaging Data consultants, create engaging data solutions.

Our consultants are already using Redgate software in a project to mask data for a data warehouse. We will be sharing some of the challenges and how we’ve built robust solutions using Redgate’s tools!

If you would like to know more about the tools we use, or have a question about Redgate’s products, please get in touch!



Speed up development, but keep gold standard code!

Jul 5, 2019 by Simon Meacher

Output from the development team has doubled, engagement with business teams has increased, and Power BI dashboards are being rolled out within days rather than weeks.

Our Data Warehouse Standards & Lifecycle saves time & increases productivity.

Engaging Data consultants have worked with an existing WhereScape client to kick-start their data warehouse project. We implemented bespoke development standards and a lifecycle designed for a small development team with a mixture of data and application developers. These standards help the development team cross-train and develop code using different technologies without a large amount of effort.

Furthermore, the time taken to review code and publish to production has been reduced. We are now investigating how to automate the release process!

If you’re interested in our WhereScape Red development standards document, please get in touch.

Data Strategy

Mar 13, 2019 by Simon Meacher

Our consultants are helping an investment company in the City of London to form their data strategy.

The COO identified the value that data can bring to the company during a data warehouse proof of concept. Work has now started to create good data and produce engaging data solutions to drive better insights for clients.

If you’re interested in learning how our consultants can help you, please get in contact.

Data Management

Mar 13, 2019 by Simon Meacher

Our Director, Simon Meacher, was part of a panel discussing the importance of data governance. The series of talks was framed around “what I wish I’d known…”. A clever phrase; we think it helps to engage people who would like to know more about the subject.

What an experienced panel!

It was a very well-run evening, and we hope everyone who attended was able to take something away.

Topics included:

  • What is data management to you?
  • What challenges have you had, and what is ongoing?
  • Lessons learned from implementing technology?
  • Challenges from Data Management reacting to emerging technologies (AI, Machine Learning, robotics, digitisation)

If you are interested in how these questions were answered, please get in contact with us.

Special thank you to Paul Goldring & the team at Lawrence Harvey for arranging the event.

Simon would be happy to help with the next event!