The Gold Standard – Part I

Engaging Data Explains:

Creating the Gold Standards in Data – Part I


Achieving an excellent level of data architecture is far from easy, but it is certainly possible if you follow certain guiding principles. Central to this is establishing a gold standard benchmark, which can then underpin any effective data architecture operation.

However, a gold standard is not something that comes naturally. In our experience, it requires diligent thought, effort and openness to change.

So in this four-part blog, we’re going to discuss some of the considerations related to this important goal for many organisations.


Resource Considerations

When we think about creating engaging analytics or data platforms to shape the growth of an organisation, the focus is often on finding a tool capable of developing the solution, rather than on the surrounding aspects. But all of the ingredients that go into the mix are critically important.

Imagine you’re the owner of a cake shop providing bespoke cakes for your customers. Your products have to be good enough to keep customers coming back, but they also have to retail at an attractive price point. This means that there are immediate resource considerations.

You may choose to focus on providing a premium product, creating high-quality goods for a premium price. Alternatively, you may deliver a higher-volume product, baking lots of different cakes on a larger scale – still of a good standard, but suited to a lower price point.

In order to make this decision, you need to understand the following:

  • Product – what we are providing, and the value that we create.
  • Place – the environment that makes it possible to create the necessary standard of products at a sustainable rate.
  • People and Process – the team, the processes and the delivery environment (the bakery, the storage, the front of house, stock control, delivery, billing, etc.) that produce and maintain the consistent quality of product and experience.

If you can put all this together then you have the beginning of a gold standard in cake production. By the same token, in our field Engaging Data helps companies to review all of the data elements supporting such a setup, combining this with their aspirations to form bespoke gold standards. This enables our clients to achieve profitability and success.

Controlling The Input

Controlling the input is a critical component of producing gold standard products. The relevant inputs vary, depending on what you are trying to produce, but examples include:

  • Requirement gathering.
  • Sources of data.
  • Quality of data.

Controlling inputs and creating quality is critically important: if you put terrible ingredients into a system, the ultimate outcome will be a terrible product! You therefore need to understand the requirements of your customers. In the cake shop example, this would mean knowing what type of cakes your customers desire, the toppings needed, the date and time of delivery, any dietary requirements, and so on.

The good news is that controls can be quite straightforward – something as simple as checking the data. In a cake shop, it’s vital to confirm the exact requirements of your customers, noting down all relevant information. This can make the difference between providing the ideal product for your customers and producing something that seems excellent, but is rendered useless or sub-par by one important constituent. For example, you might produce a cake for someone with allergies that is simply inedible from their perspective.

Quality control can be achieved by creating a simple order form. For example, a cake shop might include:

  • All vital information being distilled into yes/no questions – e.g. “should the cake contain nuts?”.
  • Ensuring that product types are selected from a defined list, and that nothing out of the ordinary is ordered.
  • Product limitations being noted expressly on the form – acting as a reminder and preventing incorrect ordering.

Such a review process ensures that information is gathered correctly, and creates a collective responsibility for capturing the right details. Important questions to ask yourself in a data environment include the following (a simple example of such a check in code appears after the list):

  • How will the requirements come into the team? 
  • How do we need to record them?
  • Do we have the right tools to collect the data?
  • How will we handle data quality?
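To make this concrete, here is a minimal sketch of what an automated input check might look like. It is only an illustration: the field names, product list and rules are assumptions based on the cake shop example, not a prescribed standard.

```python
# Minimal sketch of an input-quality gate for incoming orders/requirements.
# Field names and rules are illustrative assumptions only.

REQUIRED_FIELDS = {"customer", "product_type", "delivery_date", "contains_nuts"}
ALLOWED_PRODUCT_TYPES = {"sponge", "fruit", "chocolate"}   # assumed product list

def validate_order(order: dict) -> list[str]:
    """Return a list of problems; an empty list means the order passes the gate."""
    problems = []

    # 1. Every vital question must be answered, including the yes/no ones.
    missing = REQUIRED_FIELDS - order.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")

    # 2. Only known product types may be ordered - nothing out of the ordinary.
    if order.get("product_type") not in ALLOWED_PRODUCT_TYPES:
        problems.append(f"unknown product type: {order.get('product_type')!r}")

    # 3. Yes/no questions must be explicit answers, not free text.
    if not isinstance(order.get("contains_nuts"), bool):
        problems.append("'contains_nuts' must be answered True or False")

    return problems

if __name__ == "__main__":
    order = {"customer": "A. Baker", "product_type": "sponge", "contains_nuts": "maybe"}
    print(validate_order(order))   # two problems: no delivery date, ambiguous nut answer
```

The same pattern scales up: whatever the input channel, distil the vital questions into explicit fields and reject or query anything that falls outside the agreed list.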

Output Consistency

The output is the result of your efforts, so you have an innate interest in ensuring that it’s the best possible product. In common with the input, it is important to understand what you can control to reduce risk, as this can have a big impact on your output.

Central to this process is building systems and controls that enable you to monitor outputs. This in turn makes it possible to assess if they need to be altered in any way. This means that in a cake shop, you may consider the impact that each of the following areas has on the supply chain of cakes:

  • The production team (bakers, shop front, etc.).
  • Ensuring similar standards and levels of experience across that team.
  • Providing the same customer experience every time.
  • Ensuring knowledge of the production processes, the industry and competitors.

Each aspect of the order and production process also needs to be assessed and standardised:

  • Enjoyable and consistent ordering experience.
  • Stock control to manage high-quality ingredients. 
  • Quality control of all products.
  • Meeting all food hygiene regulations with a 5-star rating.

And then the tools of the trade should also be taken into consideration, as part of an ongoing auditing process. Central to this is ensuring that any equipment being used is within acceptable operational parameters, particularly not being overloaded or overstretched in any way.

So when you’re working in a data environment, or any working context, if you want to create gold standards then it’s important to continually monitor and challenge your processes. Keep asking yourself questions such as:

  • Do we have the right team in place?
  • Do we understand what standard of product we need to create?
  • Do we have processes in place that enable us to produce quality products?

This is just the beginning of our insight into creating gold standards with data, so in our next blog we will move on to discuss several other important factors.



Peer Review

Engaging Data Explains:

Peer Review


One of the most useful aspects of Power BI is the ability to deliver self-service business intelligence, and this is achieved with Power BI Report Server. So in this blog post, we’re going to discuss how this is set up, along with some of the important things that you need to know about this system.

There are two important facets of this process that are critical to understand. Firstly, you can achieve everything that we will discuss in this blog with the free version of Power BI. Secondly, this is the ideal way to facilitate a move to the cloud – a hugely valuable process for virtually all companies.


Keep Data Masking Simple

We often encounter clients who have data masking requirements, and these can vary quite considerably. Recently, one such client had a particularly interesting business need; they needed to mask their production data, but it was also vital for them to retain human-readability, so that their testing team could utilise reporting and their internal systems.

This was proving challenging for the client, as its core system did not include the ability to mask the data, so there was no internal solution. On top of this central issue, the system the company had built was also extremely complex, having been built upon over 20 years in their industry.

Another issue was that there wasn’t much documentation included with the system to help either the client or ourselves understand it and how their data was stored. Furthermore, they didn’t have enough storage space to host a second database with production-grade, masked data.

This is not an uncommon scenario. We deal with quite a lot of clients, and many of them have complex, nuanced or specific requirements. Often they will need their data masked quickly, and frequently different types of masking are needed – obscured, human-readable and randomised being just some of the common requests. Many clients also have limitations on storage and infrastructure that make the whole process more complicated.

Fortunately, the experience that we have accumulated means that we are able to deal with this multitude of different requirements, and deliver whatever a client needs.

The Body

If you have a small team, or release a large number of objects in each release, the time and resources needed to conduct a good peer review can have a negative impact. This process can potentially detract from development activities, delay projects, or mean that the peer reviewers are forced to work ungodly hours!

Our app can often be the solution to this problem, particularly as it can be used in several different ways. For example, the app can be targeted at specific release folders, or at the entire repository. The output of this process can then be emailed or summarised in a presentation tool, such as Power BI or QlikView/Qlik Sense.

The frequency of these checks can be user-defined, so any schedule is possible – on-demand, hourly, daily, weekly and monthly are some of the most common choices. We have found that end-of-day reports are particularly useful, as these provide developers with a list of things to change or adapt to standards, ensuring that their next day becomes productive almost immediately.

The app also handles exceptions with aplomb, featuring the ability to flag anything that matches your exception list. This helps to keep you abreast of anything that may not completely meet your standards, but is acceptable for the time being.
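As a rough illustration of the kind of check the app automates, the sketch below scans a release folder for naming-standard breaches while reporting, rather than failing, anything on an agreed exception list. The folder layout, naming prefixes and exception entries are assumptions for the example, not the app’s actual implementation.

```python
# Illustrative sketch only: a cut-down automated peer-review check.
# Folder layout, naming rules and the exception list are assumptions.
from pathlib import Path

NAMING_PREFIXES = ("load_", "stage_", "dim_", "fact_")   # assumed object prefixes
EXCEPTIONS = {"legacy_customer_load.sql"}                # known, accepted deviations

def review_release(folder: str) -> dict:
    """Scan one release folder and summarise rule breaches and flagged exceptions."""
    to_fix, flagged = [], []
    for script in Path(folder).glob("*.sql"):
        if script.name in EXCEPTIONS:
            flagged.append(script.name)                  # acceptable for now, but still reported
        elif not script.name.startswith(NAMING_PREFIXES):
            to_fix.append(f"{script.name}: does not follow the naming standard")
    return {"to_fix": to_fix, "exceptions": flagged}

if __name__ == "__main__":
    summary = review_release("releases/2024-06")         # hypothetical release folder
    print(summary)   # a summary like this could be emailed or fed into Power BI each evening
```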

The Tail

When you’re dealing with this issue, it’s important to nurture good developer habits. Once you have built consistent code, developed with an approach that works for your company, then the Power BI app can be integrated into your development lifecycle, helping you to monitor and educate your team. This is particularly useful if you have a high turnover of staff, or use third parties to supplement your development resources.

On one occasion, we deployed and tailored the app for one of our clients, whose main objective was to ensure that all contractors developed consistently to their standards. This involved daily checks on the code, which were summarised and sent to the whole team for action. The focus was on reducing the time taken to peer review by encouraging immediate on-point development. Developers are challenged to reduce their list to zero each day, although it’s not quite gamification just yet!

Conclusion

The more companies we liaise with, the more complex situations we encounter. This is where Power BI proves extremely valuable.

This tool will not prevent you from having to peer review, but it will automate 90% of the job, allowing the peer reviewers to spend their time investigating code that would otherwise never be peer reviewed; much like seeking spelling mistakes via proofreading.


If you’d like to learn more about this app, or how we can help with your data project, please feel free to contact us.



Dealing with Data Masking

Engaging Data Explains:

Dealing with Data Masking


Power BI Report Server is a powerful tool, and one of its most intuitive features is the ability to achieve true self-service business intelligence.

So in this blog post we’re going to walk you through the process involved with Self Service BI. You can use the free version of Power BI in order to achieve this, so there is no barrier to entry.

One of the best things about Power BI is that it keeps data masking simple. Sometimes clients have complicated or very specific requirements, so it’s always important to make the process as straightforward as possible.


Unusual Requirements

For example, one client we were working with had an unusual and interesting data masking requirement; they needed to mask their production data, but ensure that it was also human-readable, so that their development and testing team could create a new client portal system. This would have been complicated enough in itself, but their existing platform was also extremely complex, and there was little documentation available to help them understand how the data was stored.

There was another problem as well. The company had insufficient storage space to hold a second (in-line) database with the production-grade, masked data that was needed.

As we’re experienced and accomplished in this field, we quickly identified several possible solutions to this scenario. But choosing the best one was critically important. After some assessment, we opted for Redgate’s Data Masker for the following reasons:

• After a review of the underlying data structure, we concluded that it would be too difficult, costly and time-intensive to transfer the data into the test environment and apply masking rules there.

• It was important to make a distinction between masking and obscuring the data. The client wanted human-readable values, so we had to ensure this quality was retained.

• There was a lack of documentation regarding the location of personal identifiable data, which could result in the process missing an important part of the system.

• We also had a requirement to include a verification process, comparing the masked data against the source. This report would then provide us with an insight into whether any records had inadvertently been left unmasked.

We devised a simple plan to extract the data, load it into a SQL database, and then finally complete the masking process. This would allow the client’s development team to export the masked data and transfer it into the client portal.

Technology

Identifying the data was always going to be a tricky process if attempted manually, due to the core system’s conventions around the naming of tables and columns. So to address this, we used WhereScape’s 3D product, which documented the structure of the system into a metadata layer. Our consultant worked closely with the business teams to update the metadata layer, highlighting the fields that contained personally identifiable data, while also adding business definitions.

We also took the opportunity to agree the type of data masking that was needed for each field. The most challenging aspect of this was understanding how the data was joined or reused across different tables. But the client provided all of the known diagrams and suggested relationships, which significantly shortened the investigation time involved.

At the end of this exercise, our client also produced detailed documentation of the core system’s data structure, as well as analysis of the data cardinality and profiles. This uncovered some interesting points about the system, including some aspects of it that held personally identifiable data of which the client was unaware.

Using the information within the metadata, the physical structure of the system was imported into WhereScape’s RED product, which automated the extraction of the data and loaded it into a SQL database on a scheduled basis. We started this process gently, on a daily schedule, but as we became more confident in the process, we increased the frequency to hourly.

Now that the data was present and optimised within the SQL database, we next used Redgate’s Data Masker to convert the personally identifiable data into a masked dataset, based on the agreed rules held within the metadata. Once the rules had been designed, WhereScape RED’s scheduler automated the masking, so that it began as soon as the loading had been completed.
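In practice this step was handled by Redgate’s Data Masker under WhereScape RED’s scheduler, but the sketch below illustrates the underlying idea of metadata-driven, human-readable masking: every column flagged as personally identifiable gets a substitution or shuffle rule, and everything else passes through untouched. The column names, rules and substitute values here are assumptions for illustration only.

```python
# Conceptual sketch of metadata-driven, human-readable masking.
# Column names, rules and substitute values are illustrative assumptions.
import random

MASKING_RULES = {"first_name": "substitute", "postcode": "shuffle"}   # from the metadata layer
SUBSTITUTE_NAMES = ["Alex", "Sam", "Jo", "Chris"]                     # realistic but fictional

def mask_rows(rows: list[dict]) -> list[dict]:
    """Apply the agreed rule to every flagged column; other columns pass through untouched."""
    masked = [dict(row) for row in rows]
    for column, rule in MASKING_RULES.items():
        if rule == "substitute":
            for row in masked:
                row[column] = random.choice(SUBSTITUTE_NAMES)
        elif rule == "shuffle":                    # keep real-looking values, break the link to the person
            values = [row[column] for row in masked]
            random.shuffle(values)
            for row, value in zip(masked, values):
                row[column] = value
    return masked

if __name__ == "__main__":
    sample = [
        {"first_name": "Mary", "postcode": "BS1 4DJ", "order_total": 42.0},
        {"first_name": "John", "postcode": "M1 2AB",  "order_total": 17.5},
    ]
    print(mask_rows(sample))   # PII is masked yet still human-readable; order_total is untouched
```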

What could have been a hugely complicated and onerous process was made far simpler. The whole database was copied, masked and sent to the client portal within four hours.

Measuring the Process

As some of the data was being sent to a third party, it was very important that there was never any risk of a data breach. We had no problem building a methodology to address this. Using WhereScape RED, Engaging Data was able to build a comparison process. This utilised the metadata, checking only those fields marked as containing personally identifiable data, and made it possible to compare the values before and after the masking had taken place.

Finally, the comparison report was automatically emailed to the management team, regardless of whether or not a failure was triggered. This email contained a summary of field error analysis, as well as the number of field errors per record. The latter was used to assess the overall process and prevent any sensitive data from being distributed to third parties. By automating this, we were able to reassure the client that the whole process was working correctly.
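As a rough sketch of that verification step, the snippet below compares the source and masked copies field by field, for the fields flagged as personally identifiable, and counts any records that still carry their original value. The field names are assumptions; in the project the comparison ran inside WhereScape RED and its output was emailed automatically.

```python
# Illustrative verification sketch: flag any record whose PII field still matches the source.
# Field names are assumptions; the real process ran inside WhereScape RED.

PII_FIELDS = ["first_name", "postcode"]        # taken from the metadata layer in practice

def compare(source_rows: list[dict], masked_rows: list[dict]) -> dict:
    """Count, per field, how many records still carry their original (unmasked) value."""
    errors = {field: 0 for field in PII_FIELDS}
    for src, mskd in zip(source_rows, masked_rows):
        for field in PII_FIELDS:
            if src[field] == mskd[field]:
                errors[field] += 1
    return errors

if __name__ == "__main__":
    report = compare(
        [{"first_name": "Mary", "postcode": "BS1 4DJ"}],
        [{"first_name": "Alex", "postcode": "BS1 4DJ"}],
    )
    print(report)   # {'first_name': 0, 'postcode': 1} - a summary like this went to the management team
```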

Conclusion

It’s quite common for Engaging Data to encounter complicated situations with a wide range of clients. Each of the following are common problems or requirements:

• Ageing trading platforms/core systems, or sources of data that can’t utilise off-the-shelf data masking products.

• Companies need the data masked quickly, at virtually real-time speed.

• Different types of masking are commonly needed, whether obscured, human-readable or randomised.

• There are limitations on storage or infrastructure.

The best data masking tools will address these issues and automate the process, allowing the client to decide how, when and where to mask. Our expertise and experience in this area have enabled us to achieve some excellent results with some highly complex datasets and requirements.


If you would like to learn more about this app or how we can help with your data project, please feel free to contact us.



How Jenkins Takes WhereScape to Another Level

Engaging Data Explains:

How Jenkins Takes WhereScape to Another Level


Data warehousing has grown in importance and popularity as the global market for analytical systems continues to increase. The global market for data warehousing is expected to reach $30 billion by 2025, based on annual growth of around 12%, and in one survey 76% of IT managers and executives stated that they are investing more in their analytics platforms.

As more businesses use data warehouses, efficiency savings and improvements are expected going forward. Data automation is a concept that will benefit many companies, but it’s still important to choose the best solution.


Game-Changing Solution

That’s why using Jenkins to deploy WhereScape solutions is a game-changer. This integration tool, used with WhereScape’s data warehouse automation software, is rocket fuel for an already powerful package.

With Jenkins, it’s possible for developers to build and test software projects continuously, thanks to actions built into the tool. This makes it easier for developers to integrate changes to any project, increasing flexibility in working practices. This can be hugely advantageous in the fast-moving contemporary data climate.

And this is just the beginning of the integration offered by Jenkins. The tool also makes it possible to integrate with other apps and software solutions, by installing plugins for the external tool – examples of this include Git and Powershell. There are over 1,000 plugins available for Jenkins, meaning that the platform supports the building and testing of virtually any WhereScape project.

Low-Maintenance Software

Another key advantage of Jenkins is its low-maintenance nature. The tool requires very little attention once it has been installed. However, when updates are required, the software includes a built-in GUI tool, ensuring that this process is as painless as possible.

Yet while it offers an ideal platform, Jenkins also benefits from continual improvement, thanks to its open-source nature. There is already an enthusiastic community contributing to the tweaking and evolution of the software, and this is expected to grow further still in the years to come.

Jenkins is a shining example of continuous integration, delivery and deployment, sometimes referred to as CI/CD. This approach to data warehousing means that code changes that translate into real-world improvements can be made more frequently and reliably, due to the automation of deployment steps.
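As a rough illustration of what that automation can look like in practice, the sketch below queues a (hypothetical) WhereScape deployment job through Jenkins’ REST API. The server URL, job name and parameter are assumptions for the example; authentication uses a user API token, and depending on your security settings a CSRF crumb may also be required.

```python
# Illustrative sketch: queueing a Jenkins job from a script via the REST API.
# URL, job name and parameter are hypothetical; authentication uses a user API token.
import requests

JENKINS_URL = "https://jenkins.example.com"          # hypothetical server
JOB_NAME = "deploy-wherescape-release"               # hypothetical job

def trigger_deployment(user: str, api_token: str, release: str) -> int:
    """Queue a parameterised build and return the HTTP status code (201 means queued)."""
    response = requests.post(
        f"{JENKINS_URL}/job/{JOB_NAME}/buildWithParameters",
        params={"RELEASE": release},                 # hypothetical job parameter
        auth=(user, api_token),
        timeout=30,
    )
    return response.status_code

if __name__ == "__main__":
    print(trigger_deployment("ci-bot", "api-token-here", "2024.06.01"))
```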

Easy Plug-Ins

The process for plugging Jenkins into WhereScape begins with downloading the Java SE Development Kit, at which point you will also need to add JAVA_HOME to your environment variables. That is the only technical part; you then simply download Jenkins using the installer and follow the on-screen instructions. Before you can use the software, it will be necessary to create an admin username and password. Then you’re ready to go!

Among the palette of useful features included in the software is a list view of open projects, which provides an instantaneous insight into the status of everything important that you’re dealing with. This is the sort of feature that has ensured that as well as being powerful and flexible, Jenkins has also earned kudos in the data warehousing world for being user-friendly. 

Jenkins incorporates a user interface that is simple to pick up and navigate. There is a vast range of online tutorials available, while the active community that contributes to the tool is always on hand to offer assistance.

Configure and Customise

Another important aspect of Jenkins is the scope of configuration and customisation that it makes possible. Users can be managed by creating groups and roles, and this can all be handled elegantly via some straightforward menu prompts. Jobs can also be configured; for example, the tool enables them to be executed via timed intervals. 

Every aspect of the Jenkins software has been cultivated to ensure maximum functionality with minimum effort, yet enabling users to customise and monitor everything extensively at all stages of the process. You can even set up automatic email notifications, ensuring that everyone involved with a data warehousing project is kept in the loop.

At a time when the amount of information that companies deal with is escalating rapidly, data warehousing is becoming increasingly important. It’s simply not possible to ignore big data any longer; this is tantamount to being left behind by your competitors. Jenkins & WhereScape is an elegant data warehousing solution that has helped a multitude of businesses get to grips with their data warehousing requirements, without investing a huge amount of effort in training, onboarding, or hiring experts.

WhereScape was already a market leader in its field, but with the advent of CI/CD tools such as Jenkins, this top solution just became even more compelling.

Self-Service BI

Engaging Data Explains:

Self-Service BI


Self-service business intelligence is making a huge difference to companies across a variety of sectors, by helping to optimise data analysis.

However, some businesses perceive that implementing a business intelligence approach can be challenging, due to various perceived barriers to entry. This is something of a false impression, but it is why business intelligence tools need to make data simple to explore, while enabling questions to be answered quickly.


Dealing with Data Management

Many traditional data teams build pipelines of data in order to deal with their data management procedures. But the approach utilised is not always ideal. It’s common for data engineers and analysts to explore data and build solutions with specific results in mind, which is rather putting the cart before the horse. More logical would be to make the data generally available and appropriately linked. This enables end-users to more readily explore the data, drawing more nuanced conclusions from the assembled information.

Truly valuable business intelligence should always be an end-to-end iterative process. Business teams need relevant, timely data in order to uncover accurate insights. Deploying platforms to achieve this is merely one crucial component of business intelligence. 

As you develop your business intelligence approach, it’s important to understand that self-service business intelligence doesn’t require everyone in an organisation to train as business analysts. Nor should it mean removing responsibility from your IT department. Instead, it’s about encouraging and educating your business teams to understand and interact with the data they generate throughout their work – creating a joined-up process across your organisation.

Using Power BI

One of the best ways to achieve this is via Power BI. This cloud package has been on a long journey since it was first released, but is now the ideal app to ensure that companies can engage with data safely. Probably the most valuable aspect of this package is that it provides an incredible amount of options and guided analysis, making your business intelligence process much more flexible.

Did you know that you can achieve 90% of this using Power BI Report Server – the on-premises version of Power BI?

Using Power BI Report Server prompted an excellent reaction from end-users, who were excited at their ability to slice data and retrieve answers to client queries rapidly. This is one of the big advantages of business intelligence; it enables companies to get to the core of what drives customer demand quickly. You are accelerating the ability of clients to use data to solve critical business problems.

Dealing with self-service business intelligence can be intimidating, though. This is particularly true if an organisation requires this to be achieved without using the cloud, or spending a significant amount of money on new software. Historically, this form of business intelligence has posed problems with identifying required data, such as reports, with little or no description or business knowledge behind the purpose of the reports. For this reason, self-service business intelligence has been known to reduce IT departments to cold sweats!

In early experiences of self-service business intelligence, there was often no gold standard with design. Imagine – using SSRS where there could be hundreds of folders and sub-folders, with dozens of data connections, generating thousands of reports. And then every report featured a different format. Understanding how each report was being used was nigh-on impossible, and general data analysis was far from effective.

Range of Options

But things are changing. New self-service business intelligence solutions offer clients a range of options, and enable data to be provided safely and in a controlled format. And there are several options available for companies, with decentralised, centralised, and hybrid approaches all possible. These approaches will be selected depending on the demographics of IT teams, with the level of governance and control involved having a major influence.

The great thing about Power BI desktop is that it is an adaptable tool, enabling users to get started with data transformation quickly. Most experienced Excel users can quickly get to grips with this innovation, meaning that it can be easily and widely implemented across an organisation.

The tool itself allows IT to extract the data transformation and, if required, reverse engineer it back into a data warehouse – providing business users with a tool that acts like a Rosetta stone between business and IT!

It’s important to emphasise, though, that regardless of the technology you always need to understand who you are catering for with data. With this in mind, there are several ways that you can delegate the permissions, rules, and responsibilities with Power BI, as this adds to the flexibility of the platform. Power users are also important, as these credentialed individuals utilise their experience with Excel to produce reports for an organisation.

As you develop your self-service business intelligence strategy, it is vital to implement proper governance. This will help you to avoid creating data silos, data sprawl, poor performance, and lax security. Unless you implement appropriate governance, business users will have unrestricted access to source systems and Power BI folders. This can lead to inappropriate sharing of data.

Key Factors

It’s also important to consider the following factors:

  • Service size and data storage – you do not have unlimited resources with Power BI Report Server, and larger datasets can therefore consume a significant portion of those resources, impacting the performance of the entire service.
  • Risks to source systems – allowing users to connect with any source system or raw files can create problems.
  • Access and permissions – security must always be taken into consideration. Failing to pay proper attention to access and permissions can result in numerous ad-hoc groups being created, which can then be problematic. Controlling groups, and tweaking which users can be added into each AD group, is definitely advisable.
  • Many versions of the truth – if you have several different people creating the same report, you’re likely to get several different answers. This is why it is important for data warehouses to be populated, effectively creating a single source of truth.
  • Reduced audit and tracking – if end-users fail to provide adequate details or purposeful dashboards, the ‘who, what & why’ regarding the purpose of the report is lost, undermining the whole process.

Summary

Implementing Self-Service Business Intelligence has become far more feasible, but it’s still important to impose some control over how your services are being utilised, in order to generate the maximum and most accurate insight. Power BI Report Server can be an excellent tool in enhancing business intelligence, and definitely one we recommend for clients who are reliant on data.


If you are interested in Self-Service BI, book a call with us and find out how it could help your team.



High-Level Design Documentation

Engaging Data Explains:

High-Level Design Documentation


Have you ever needed to create high-level documentation of your data automation that explains a project or sprint within your WhereScape RED repository – perhaps looking a little like the example above?

We recently worked with a client who wanted to represent the amount of change undertaken within a single project. They required something simple that nevertheless demonstrated the amount of change within each layer of the data automation.

Instead of creating something new, we re-used the WhereScape RED Overview design, which WhereScape uses to illustrate the design of the architecture.

A sample data solution shaped into the high-level design.

Engaging Data consultants worked with the client to create a solid naming convention and development standards. With this foundation and the metadata repository, we developed a script that produced an HTML document with the details the client was looking for.

The idea continued to develop and now has options to include the following details:

  • Number of Jobs, Procedures and Host Scripts that support each layer.
  • Data volume per object, summarised per layer.
  • Processing time for each layer, with average completion time and average time to run.

WhereScape RED and 3D speed up the development and documentation of the technical design. This solution utilises the metadata to create supporting or narrative documents for other business areas.
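For a rough idea of how such a script can work, the sketch below queries a metadata repository and renders a per-layer summary as an HTML table. The connection string, table and column names are placeholders only – the real WhereScape RED repository schema and the client’s layer naming convention will differ.

```python
# Rough sketch: summarise objects and data volume per layer from a metadata repository
# and render the result as HTML. Table, column and DSN names are placeholders.
import pyodbc   # assumes an ODBC connection to the metadata database

QUERY = """
SELECT layer_name,                -- derived from the agreed object naming convention
       COUNT(*)       AS object_count,
       SUM(row_count) AS data_volume
FROM   metadata_objects           -- placeholder for the real metadata views
GROUP BY layer_name
"""

def build_report(connection_string: str) -> str:
    """Return a simple HTML table summarising object counts and data volume per layer."""
    with pyodbc.connect(connection_string) as conn:
        rows = conn.cursor().execute(QUERY).fetchall()
    body = "".join(
        f"<tr><td>{layer}</td><td>{objects}</td><td>{volume}</td></tr>"
        for layer, objects, volume in rows
    )
    return (
        "<table><tr><th>Layer</th><th>Objects</th><th>Data volume</th></tr>"
        f"{body}</table>"
    )

if __name__ == "__main__":
    print(build_report("DSN=WhereScapeRED;Trusted_Connection=yes"))   # hypothetical DSN
```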

Build re-usable scripts, dashboards or reports for non-technical business teams, and provide clarity around the technical function of your automation.


If you are interested in receiving a copy of the script that produced this report, please email simon.meacher@engagingdata.co.uk