More data is being collected, stored, and analysed than ever before. One of the challenges of the digital age is how and where to store all this data safely and accessibly.
A modern Data Warehouse can solve many of these issues, using a multi-tiered architecture to ensure that users with different needs can access the information they require. And if you want to expand and develop a Data Warehouse, documentation is invaluable.
Are you considering approaching a Data Warehouse using a documentation method? Then read on to find out more!
What is Documentation?
Data documentation is vital in many ways for a Data Warehouse: it’s how you ensure that your data will be understood by, and accessible to, any user across your organisation. Documentation explains how your data was created, along with its context, structure, content, and any data manipulations.
Documentation is crucial if you’re looking to continue developing, expanding, and enhancing your Data Warehouse. However, it’s essential to understand what documentation entails to ensure your Data Warehouse and its processes run smoothly.
Documenting a Data Warehouse
Like we said, the amount of data that we collect and store as organisations is increasing, and traditional Data Warehousing set up on a simpler database structure will often struggle to cope. It struggles partly with the sheer volume of information it needs to store and analyse, and partly because that information needs to be accessed by various users, often in different ways. A document-based approach to data warehousing allows data from multiple sources to be streamlined and accessed by multiple users.
When documenting your Data Warehouse, you should begin by creating standards for your documentation, data structure names, and ETL processes, as this creates the foundation upon which everything else is built. A robust Data Warehouse will have straightforward, understandable documentation.
A successful Data Warehouse implementation often comes down to the data solution’s documentation, design, and performance. If you can accurately capture the business requirements, then through documentation you should be able to develop a solution that meets the needs of all users across an organisation.
At Engaging Data, documenting a Data Warehouse has become second nature. Although it’s not necessarily the easiest or most logistically straightforward part of the process, it’s necessary to ensure your data warehouse processes run smoothly.
What Documentation do I need for a Data Warehouse project?
The exact pieces of documentation that you need will vary with your particular Data Warehousing project. However, these are some of the significant elements of documentation that you should have:
The Business Requirements Document
will outline and define the project scope and top-level objectives from the perspective of the management team and project managers.
Functional/information requirements document
which will outline the functions that different users must be able to complete at the end of the project. This document will help you focus on what the Data Warehouse is being used for and what pieces of data and information its users will require.
The fact/qualifier matrix
is a powerful tool that will help the team understand and associate the metrics with what’s outlined in the business requirements document.
A data model
is a visual representation of the data structures held within the Data Warehouse. A data model is a valuable visual aid for ensuring that the business’s data, analytics, and reporting needs are captured within the project. Data models are also helpful for DBAs when creating the data structures that will house the data.
A data dictionary
is a comprehensive list of the data elements found in the data model, with their definitions and the source database, table, and field names from which each element was created.
Source to target ETL mapping document
which is a list organised around the target data structure that defines the source of each data element and any transformations it goes through before landing in the target table (a minimal sketch of one entry follows below).
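To make this concrete, here is a minimal, hypothetical sketch of a single mapping entry expressed as a PowerShell object. None of the table, column, or transformation names below come from a real project; they are placeholders for illustration only.

```powershell
# One illustrative row of a source-to-target mapping document.
# All names here are hypothetical placeholders.
$mappingEntry = [pscustomobject]@{
    TargetTable    = 'dim_customer'
    TargetColumn   = 'customer_name'
    SourceSystem   = 'CRM'
    SourceTable    = 'customers'
    SourceColumn   = 'cust_nm'
    Transformation = 'Trim whitespace and convert to title case before load'
}

# A collection of entries like this, exported to CSV, becomes the mapping document.
$mappingEntry | Export-Csv -Path 'source_to_target_mapping.csv' -NoTypeInformation
```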
What are the problems of Documenting a Data Warehouse?
Documenting a Data Warehouse can be a massive project, depending on the amount of data, the number of users that need access, and the business requirements. As the amount of data held within a Data Warehouse grows, management systems need to dig further to find and analyse it. This is especially an issue within traditional Data Warehouses: as data volume increases, speed and efficiency can decrease.
Generally, spending time to understand and document your business needs will make documenting your Data Warehouse easier, because Data Warehousing is driven by the data you provide. If you don’t take the time to map these critical pieces of information early in the process, you may run into problems later on. The same applies to processing your data correctly and structuring it in a way that makes sense for your organisation both today and in the future. If you don’t set yourself up for the future, structuring data becomes more complex and can slow down processing as you add more information to your Data Warehouse. It can also make it more difficult for the system manager to read the data and optimise it for analytics.
Overall, the better the initial documentation, planning, and business information model, the easier your implementation will be, and the easier it becomes to continue adding data to your warehouse. By carefully designing and configuring your data from the start, you’ll be rewarded with better results.
Another potential problem in documenting a Data Warehouse is choosing the wrong warehouse type for your business needs and resources. Many organisations allow various departments to access the system, which can stress it and impact efficiency. By choosing the right type of warehouse for your organisation and making a future-proofed decision, you can balance the usefulness and performance of your data warehouse.
Data Warehousing is an excellent system for keeping up with your business’s various data needs. By making long-term decisions and preparing at the start, you can avoid many potential problems when documenting your data warehouse. You can prevent many of the challenges associated with data warehouse deployment and implementation by utilising a tool like WS Doc.
What is WS Doc?
WS Doc is a simple-to-use tool that takes care of much of the work of documenting your data warehouse by automating the publication of WhereScape documentation to your choice of wiki technology.
In addition, with WS Doc you can collaborate on workflows, editing data sets and inputs, so various users can work on the project simultaneously. By also integrating with other apps and systems, WS Doc makes collaboration and streamlined working possible.
Why was WS Doc created?
WS Doc was created to bring document automation and assembly to more industries, turning tedious, detailed work into automated processes and systems.
By letting you gather data and instantly generate templated documents, even whole document sets, WS Doc can save up to 90% of the time you’d otherwise spend drafting documentation.
By automating the publication of WhereScape documentation to your choice of wiki technology (Confluence, SharePoint, GitHub, or something else), you give your documentation the power of that wiki platform, making it easier to digest, apply, and share.
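WS Doc’s internals are its own, but to illustrate the kind of call this sort of automation makes under the hood, here is a minimal sketch of publishing a generated HTML page to Confluence via its standard REST API. The base URL, space key, and credentials are placeholders you would replace with your own.

```powershell
# Sketch: publish one HTML page to a Confluence space via the REST API.
# Base URL, space key, page content, and credentials are placeholders.
$baseUrl = 'https://your-domain.atlassian.net/wiki'
$pair    = 'you@example.com:your-api-token'
$auth    = 'Basic ' + [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($pair))

$body = @{
    type  = 'page'
    title = 'Data Warehouse Documentation'
    space = @{ key = 'DOC' }
    body  = @{ storage = @{ value          = '<p>Generated documentation goes here.</p>'
                            representation = 'storage' } }
} | ConvertTo-Json -Depth 5

Invoke-RestMethod -Method Post -Uri "$baseUrl/rest/api/content" `
                  -Headers @{ Authorization = $auth } `
                  -ContentType 'application/json' -Body $body
```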
Overall, WS Doc streamlines and automates the process, speeding it up and making it less resource-heavy.
Want to learn more about WS Doc?
Everyone is on the same page with WS Doc. Click the button below.
In conclusion, by choosing WS Doc to document your Data Warehouse project, you’re using a simple tool to automate processes that would otherwise consume significant time and resources, and that’s before considering the risk of human error in work that demands great attention to detail and many repetitive actions.
We’ve discussed some potential problems you can run into when documenting a Data Warehouse. With WS Doc you can overcome these issues: it’s a tool that promotes effective communication and collaboration, engaging the people who use the data. It saves time and resources by automating the publication and implementation of documentation. And it ultimately enhances your existing toolset, offering a developed, streamlined, and simple-to-use experience.
Here at Engaging Data, we use WS Doc in the documentation of the Data Warehouse projects we carry out for our clients.
If you’d like to learn more about the process, or to see if WS Doc could be the right tool for your organisation, schedule a call with us!
Data warehousing has grown in importance and popularity as the global market for analytical systems continues to increase. The global market for data warehousing is expected to reach $30 billion by 2025, based on annual growth of around 12%. This growth has led 76% of IT managers and executives surveyed to say that they are investing more in their analytics platforms.
As more businesses use data warehouses, efficiency savings and improvements are expected going forward. Data automation is a concept that will benefit many companies, but it’s still important to choose the best solution.
Game-Changing Solution
That’s why using Jenkins to deploy WhereScape solutions is a game-changer. This integration tool, used alongside WhereScape data warehouse automation software, is rocket fuel for an already powerful package.
With Jenkins, it’s possible for developers to build and test software projects continuously, thanks to actions built into the tool. This makes it easier for developers to integrate changes to any project, increasing flexibility in working practices. This can be hugely advantageous in the fast-moving contemporary data climate.
And this is just the beginning of the integration offered by Jenkins. The tool also makes it possible to integrate with other apps and software solutions by installing plugins for the external tool; examples include Git and PowerShell. There are over 1,000 plugins available for Jenkins, meaning that the platform supports the building and testing of virtually any WhereScape project.
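For example, plugins can be installed from the command line as well as through the web UI. A sketch using the Jenkins CLI jar (downloadable from your own Jenkins instance under /jnlpJars/jenkins-cli.jar) might look like this; the server URL and credentials are placeholders:

```powershell
# Sketch: install the Git and PowerShell plugins via the Jenkins CLI,
# then restart Jenkins so they take effect. URL and credentials are placeholders.
java -jar jenkins-cli.jar -s http://localhost:8080/ -auth admin:api-token `
    install-plugin git powershell -restart
```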
Low-Maintenance Software
Another key advantage of Jenkins is its low-maintenance nature. The tool requires very little attention once it has been installed. However, when updates are required, the software includes a built-in GUI tool, ensuring that this process is as painless as possible.
Yet while it offers an ideal platform, Jenkins also benefits from continual improvement, thanks to its open-source nature. There is already an enthusiastic community contributing to the tweaking and evolution of the software, and this is expected to grow further still in the years to come.
Jenkins is a shining example of continuous integration, delivery and deployment, sometimes referred to as CI/CD. This approach to data warehousing means that code changes that translate into real-world improvements can be made more frequently and reliably, due to the automation of deployment steps.
Easy Plug-Ins
The process for plugging Jenkins into WhereScape begins with downloading the Java SE Development Kit, at which point you will also need to add JAVA_HOME to your environment variables. That is the only technical part; you then simply download Jenkins using the installer and follow the on-screen instructions. Before you can use the software, you will need to create an admin username and password. Then you’re ready to go!
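As a small illustration, setting JAVA_HOME from an elevated PowerShell prompt looks like the following; the JDK path shown is just an example, so adjust it to wherever you installed the kit.

```powershell
# Point JAVA_HOME at your JDK install (example path) machine-wide.
# Run from an elevated prompt; new shells will pick up the variable.
[Environment]::SetEnvironmentVariable('JAVA_HOME', 'C:\Program Files\Java\jdk-17', 'Machine')
```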
Among the palette of useful features included in the software is a list view of open projects, which provides an instantaneous insight into the status of everything important that you’re dealing with. This is the sort of feature that has ensured that as well as being powerful and flexible, Jenkins has also earned kudos in the data warehousing world for being user-friendly.
Jenkins incorporates a user interface that is simple to pick up and navigate. There is a vast range of online tutorials available, while the active community that contributes to the tool is always on hand to offer assistance.
Configure and Customise
Another important aspect of Jenkins is the scope of configuration and customisation that it makes possible. Users can be managed by creating groups and roles, and this can all be handled elegantly via some straightforward menu prompts. Jobs can also be configured; for example, the tool enables them to be executed via timed intervals.
Every aspect of the Jenkins software has been cultivated to ensure maximum functionality with minimum effort, yet enabling users to customise and monitor everything extensively at all stages of the process. You can even set up automatic email notifications, ensuring that everyone involved with a data warehousing project is kept in the loop.
At a time when the amount of information that companies deal with is escalating rapidly, data warehousing is becoming increasingly important. It’s simply not possible to ignore big data any longer; doing so is tantamount to being left behind by your competitors. Jenkins and WhereScape together form an elegant data warehousing solution that has helped a multitude of businesses get to grips with their data warehousing requirements without investing a huge amount of effort in training, onboarding, or hiring experts.
WhereScape was already a market leader in its field, but with the advent of CI/CD tools such as Jenkins, this top solution just became even more compelling.
Have you ever needed to create high-level documents of your data automation that explain a project or sprint within your WhereScape RED repository?
We recently worked with a client who wanted to represent the amount of change undertaken within a single project. They required something simple that nevertheless demonstrated the amount of change within each layer of the data automation.
Instead of creating something new, we re-used the WhereScape RED Overview design that WhereScape itself uses to illustrate the architecture.
Engaging Data consultants worked with the client to create a solid naming convention and development standards. With this foundation and the metadata repository, we developed a script that produced an HTML document with the details the client was looking for.
The idea continued to develop and now has options to include the following details (a stripped-down sketch of the approach follows the list):
Number of Jobs, Procedures, and Host Scripts that support each layer.
Data volume per object, summarised per layer.
Processing time for each layer, with average completion time and average run time.
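We can’t reproduce the full script here, but a stripped-down sketch of the approach looks like this. It assumes a SQL Server repository and the SqlServer PowerShell module, and the metadata table and column names are placeholders; you would swap in the actual tables from your own RED metadata repository.

```powershell
# Sketch: summarise objects per layer from the metadata and emit an HTML report.
# Requires the SqlServer module (Install-Module SqlServer).
# The table and column names below are placeholders, not the real RED schema.
Import-Module SqlServer

$query = @"
SELECT layer_name     AS Layer,
       COUNT(*)       AS Objects,
       SUM(row_count) AS TotalRows
FROM   metadata_objects          -- placeholder table name
GROUP  BY layer_name
ORDER  BY layer_name;
"@

Invoke-Sqlcmd -ServerInstance 'localhost' -Database 'RED_metadata' -Query $query |
    Select-Object Layer, Objects, TotalRows |
    ConvertTo-Html -Title 'Data Automation Overview' |
    Out-File 'automation_overview.html'
```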
WhereScape RED and 3D speed up the development and documentation of the technical design. This solution utilises the metadata to create support or narrative documents for other business areas.
Build re-usable scripts, dashboards, or reports for non-technical business teams and provide clarity around the technical function of your automation.
If you are interested in receiving a copy of the script that produced this report, please email simon.meacher@engagingdata.co.uk
Many companies are looking to make code changes and deployment easier. Often the ability to deploy code to production is surrounded by red tape and audited controls. If you don’t have this, count yourself lucky!
Jenkins and Octopus Deploy are two of the many tools helping to automate the deployment of code to production, allowing companies to adopt a continuous delivery/deployment approach.
For a long time, WhereScape RED has had its own method of automating deployment, using the command line functions.
Why Automate?
Tools such as WhereScape RED can automate elements of deployment; however, we know that companies like to use a common toolset for their code deployments, giving them a single picture of all releases. In most cases they also realise that they want to release code deployments on multiple platforms, because RED doesn’t do everything.
Git?
No problem! There are several ways to do this. Our preferred option is to push the deployment application to the code repository. After all, it is more practical to store the changes you want to push to production rather than every change to every object, including those that are not meant for production!
Can I do This Now?
WhereScape RED uses a command prompt file to send commands to the admin executable, and all changes are applied to the destination database via ODBC. Installation settings are configured in XML, and a log file is created as part of the process. The XML file contains the DSN of the destination database, along with all of the settings that are applied when deploying the application, such as whether to alter or recreate a job. Please make sure these are set correctly: you do not want to recreate a Data Store table and lose its history!
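As a belt-and-braces step, it’s worth sanity-checking the settings XML before running a deployment. The element names in this sketch are placeholders (check the settings file your version of RED generates), but the guard-script idea carries over:

```powershell
# Sketch: inspect the deployment settings XML before applying an application.
# Element names are placeholders; match them to your generated settings file.
[xml]$settings = Get-Content 'deployment_settings.xml'

$dsn       = $settings.Deployment.DestinationDsn   # placeholder element
$jobAction = $settings.Deployment.JobAction        # placeholder: 'Alter' or 'Recreate'

if ($jobAction -eq 'Recreate') {
    throw 'Job action is set to Recreate - confirm this is intended before deploying.'
}
Write-Host "Deploying to DSN '$dsn' with job action '$jobAction'."
```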
Permissions are important. The key to running the command line to publish changes to production is that the service account executing the commands has permissions to change the underlying database.
Integration with Octopus
Octopus Deploy uses PowerShell as its common execution language, so we adapted all of our WhereScape BAT files to PowerShell in order to get everything working.
Building a list of repeatable tasks within Octopus is easy and provides an opportunity to create a standard release process that meets your company’s standards and processes. Tasks like database backup, metadata backup, and much, much more!
It can even run test scripts!
We used a PowerShell script to create a full backup of the database, to be used should the deployment fail. With a larger database, this may not always be the best solution; depending on your environment set-up, you may have the option to use OS snapshots or other methods to roll back the changes. The good news is that Octopus Deploy works with most technology, so you should find something that works for your stack.
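For SQL Server, a full backup from PowerShell can be as simple as the sketch below, using the SqlServer module’s Backup-SqlDatabase cmdlet; the instance name, database name, and backup path are placeholders.

```powershell
# Sketch: take a timestamped full backup before deploying, so there is
# something to restore should the deployment fail. Names are placeholders.
Import-Module SqlServer

$stamp = Get-Date -Format 'yyyyMMdd_HHmmss'
Backup-SqlDatabase -ServerInstance 'localhost' `
                   -Database 'DataWarehouse' `
                   -BackupFile "D:\Backups\DataWarehouse_$stamp.bak"
```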
Recently, we’ve been experimenting with creating rollback WhereScape applications on the target data warehouse. This is great for restoring the structure of the objects quickly and easily. Reducing risk is a great value add!
Go, Go, Go!
Triggering the deployment was easy. We could have set this up in many ways, but we used a “do the application files exist?” trigger to get things started, until the humans learned to trust the Octopus process.
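In its simplest form, that trigger is just a polling loop. Here is a sketch; the watch path, file extension, and polling interval are all hypothetical placeholders, and the actual release kick-off would be your Octopus CLI or REST call.

```powershell
# Sketch: poll for a deployment application file, then start the release.
# Path and extension are hypothetical placeholders.
$watchPath = '\\share\deployments\*.app'

while (-not (Test-Path $watchPath)) {
    Start-Sleep -Seconds 60
}
Write-Host 'Application file found - starting deployment.'
# Kick off the Octopus release here (e.g. via the Octopus CLI or REST API).
```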
Linking the release to Jira is just as simple. Imagine you’ve completed development and want to send the code to UAT. You click the button to update the ticket, wait a few seconds, and the code is deployed! There is a little set-up involved, but you get the idea.
Final Thoughts
Octopus is a great tool, and the automation really helps to control the deployment process. Coupled with WhereScape automation, this provides an excellent end-to-end solution for data warehousing.
If you are interested in CI/CD and WhereScape RED/3D, book a call with us and find out how it could help your team.
When you’re operating a modern-day data warehouse, documentation is simply part of the job. It’s not necessarily the easiest or most logistically straightforward part of the process, but it is important: documentation is invaluable to the continued development, expansion, and enhancement of a data warehouse. It’s therefore essential to understand everything that adequate documentation entails, in order to ensure that your data warehouse processes run smoothly.
Understanding your Audience
One of the first things to understand is who you are compiling the documentation for. Support, developers, data visualisation experts, and business users could all be possible recipients. Before you answer this question, you really need to fully understand the way that your organisation operates, and open the lines of communication with the appropriate departments.
A two-way dialogue will be productive in this ongoing process, and this communication will help ensure that you keep the documents in line with the design. This is vitally important, as any conflicts here can render the whole process less constructive than is ideal.
And it’s especially vital considering how fast documentation moves nowadays. Everything has gone online and is wiki-based. Whether it’s Confluence, SharePoint, or Teams, all sorts of wiki documents are being produced by businesses with the intention of sharing important information. These shareable documents are updated with increasing regularity, so it is important to get your strategy in place before beginning.
Different approaches to data warehouse design can also affect how long a document stays live before being updated. If you are lucky enough to make weekly changes to your data warehouse, you will be making incremental changes to the documentation itself. Development teams can spend hours updating documentation rather than doing what they are good at: developing data solutions! Naturally, minimising this where possible is always preferable.
Self-Service Business Intelligence
Documentation is also crucial in self-service business intelligence. Integrating private and local data into existing reports, analyses, or data models requires accurate documentation. Data in this area can be drawn from Excel documents, flat files, or a variety of external sources.
By creating self-service functionality, business users can quickly integrate data into what can often be vital reports. Local data can even be used to extend the information delivered by data warehousing, which will limit the workload that is inevitably incumbent on data management. The pressure on business intelligence can be quite intense, so anything that lessens the load is certainly to be welcomed.
Another important aspect of documentation is that it reduces the number of questions that are typically directed at the IT and data warehousing teams. Anyone who works in IT knows only too intimately the vast amount of pressure that can be heaped upon them by both internal and external enquiries. Again, anything that reduces this will certainly be favourable.
The data warehouse team also has huge responsibility within any organisation. They are required to produce a vast amount of information for front-end business users, and getting documentation right can certainly assist with this process.
Importance of Transparency
One important aspect of documentation that can sometimes be overlooked is transparency. This works at every level of an organisation, where sharing everything related to the documents is absolutely vital. Once this level of transparency is implemented, people who understand the data deeply can improve the documentation, or suggest changes to the Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) processes if deemed necessary.
Conversely, it’s also important to understand that not all technology is suitable for documentation. As much as businesses and organisations would love this process to be completely holistic, this is not always possible.
Thus, packages such as Power BI, QlikView and Qlik Sense, and even Microsoft’s trusty Excel, are not necessarily ready to be documented. These software packages can use data, but often cannot provide a document set that explains how the data is being used, and for what purpose. Recently, Power BI has taken steps to help with data lineage, but this remains better suited to IT teams than to business users.
Attempting to document data across multiple technologies is tricky, but Wikis can provide IT teams with the ability to collate all of this information into a central hub of knowledge, making access much more logistically convenient.
Conclusion
Ultimately, IT departments, data warehousing teams, and report developers should all be encouraged to produce documentation that contributes to the overall aims of their organisations. Anything excessively technical is not good enough for modern business requirements, especially considering the importance of communication, and of ensuring that everyone within an organisation is acquainted with as much vital data as possible.
Modern-day technology makes this goal a reality, and this means that it is increasingly an expectation of end-users. Failing to prepare properly in this area could indeed mean preparing to fail, as organisations will simply have failed to meet the compelling desires of the market. It is therefore vital for documentation to be dealt with diligently.
Getting this piece right will go a long way towards helping with data governance!
If you would like to know more about how Engaging Data helps companies to automate documentation, please contact us below.
Knowing how and where to find the needle more easily, and where in the specific haystack it resides
Big Data has been a hot topic for more than a few years now, and this phenomenon will play a central role in the future of commerce. Collecting, collating, and comprehending Big Data will no longer be a matter of commercial interest; it will increasingly become a commercial imperative.
It should come as no surprise then that investment in technologies related to Big Data is already becoming almost ubiquitous. A report from NewVantage Partners, which collected executive perspectives from 60 Fortune 1000 companies, found that 97% of them invest in Big Data and AI initiatives. NewVantage also discovered that the vast majority of this investment (84%) was focused on deploying advanced analytics capabilities to enable business decision making.
Big Understatement
And when we use the term ‘Big Data’, it’s reasonable to conclude that ‘big’ is an understatement! In 2018, for example, Internet users generated approximately 2.5 quintillion bytes of data every day; that’s around 912 quintillion bytes every year. And 90% of this data has been generated in just the last five years. The growth curve is exponential.
Thus, it’s one thing to recognise the importance of Big Data, and quite another to be prepared for it. We’re talking about a veritable avalanche of information, much of it utterly unstructured. Indeed, Forbes noted in 2019 that 95% of businesses cite the need to manage unstructured data as a problem, which, given the sheer scale of Big Data, is hardly surprising. Making the most of Big Data is not so much searching for a needle in a haystack; it’s more like looking for a needle in a universe entirely composed of haystacks.
This reality means that implementing the best business intelligence solutions will become essential; dealing with the sheer volume of Big Data demands it. Data warehousing is one element of this process that will be critically important, as its analytical qualities will prove critical to companies’ efforts to benefit from the information explosion.
Data Vault 2.0
That’s where Data Vault comes in. Data Vault 2.0 comprises a raft of sophisticated architecture and techniques that enable businesses to store current and historical data in a single, easily accessible location, along with the ability to create analytics based on this information. Data Vault is effectively a unique design methodology for large-scale data warehouse platforms, ensuring that Big Data is dealt with more quickly, more efficiently, and more effectively.
Data Vault offers several advantages over competing approaches. The first is that it’s possible to convert any system to Data Vault structures. Existing objects can be translated to Data Vault entities, and every single item will have a corresponding match in the new Data Vault architecture. Every core business definition can then be mapped to a hub, and every relationship between definitions to a link. This makes the whole operation more flexible and user-friendly.
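To make the hub-and-link idea concrete, here is a minimal sketch of a hub and a link for a customer/order model, with the SQL wrapped in PowerShell for consistency with our other examples. The names and columns are illustrative only, not a complete Data Vault 2.0 implementation.

```powershell
# Sketch: a minimal Data Vault hub and link. Names and columns are illustrative.
$ddl = @"
CREATE TABLE hub_customer (
    customer_hk   CHAR(32)    NOT NULL PRIMARY KEY,  -- hash of the business key
    customer_id   VARCHAR(50) NOT NULL,              -- the business key itself
    load_date     DATETIME2   NOT NULL,
    record_source VARCHAR(50) NOT NULL
);

CREATE TABLE link_customer_order (
    customer_order_hk CHAR(32)  NOT NULL PRIMARY KEY,  -- hash of both parent keys
    customer_hk       CHAR(32)  NOT NULL REFERENCES hub_customer (customer_hk),
    order_hk          CHAR(32)  NOT NULL,              -- would reference hub_order
    load_date         DATETIME2 NOT NULL,
    record_source     VARCHAR(50) NOT NULL
);
"@

Invoke-Sqlcmd -ServerInstance 'localhost' -Database 'dw' -Query $ddl
```

Descriptive attributes would then live in satellite tables hanging off the hub, which is what gives the model its incremental, auditable structure.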
Another significant advantage of Data Vault is its enhancement of agility. This is particularly important, as the ability of network software and hardware to automatically control and configure itself makes it easier to deal with the almost unfathomable scope of Big Data.
Smaller Pieces
Data Vault makes it possible to divide a system into smaller pieces, with each individual component available for separate design and development. This means every constituent part of the system can have its own definitions and relationships and that these can be combined at a later date by related mapping. This makes it possible to develop a project steadily yet still see instant results. It also makes managing change requests much more straightforward.
Another asset of the Data Vault approach is that it applies to numerous different systems. Separate sources can be transformed into Data Vault entities without any laborious procedures being involved. This is particularly advantageous in the contemporary climate, as almost every enterprise system relies on several different data types from various data sources.
The Data Vault modelling technique is thus adaptable to all types of sources, with a minimum of fuss. This makes it much more feasible to link different data sources together, making analysis more joined-up and holistic. It is well-known that being the entity that is the most adaptable to change is vital across a wide variety of niches, and this applies in the rapidly evolving data analysis environment.
But possibly the most compelling reason to choose Data Vault is that it provides companies with a method of standardisation. With Data Vault implemented, companies can standardise their entire DWH system. This standardisation enables members of the company to understand the system more easily, which is undoubtedly advantageous considering the innate complexity of this field.
Meeting the Needs
It is commonplace for complex and sophisticated solutions to be delivered to business users that nevertheless fail to understand and adapt to the company’s actual requirements. Everyone wants to show off their fancy piece of kit, but often developers aren’t as keen to listen! This can happen for a variety of reasons, but the important thing to note is that Data Vault is designed to meet the requirements of the business, rather than requiring a business to reorganise itself to comply with the needs of the package.
This is important at a time when the dynamic complexity associated with data is escalating. Enterprise data warehouse systems must provide accurate business intelligence and support a variety of requirements. This has become a critical reality in a business marketplace in which the sheer volume of data being generated is overwhelming.
Data Vault solves these problems with a design methodology that is ideal for large scale data warehouse platforms. With an approach that enables incremental delivery and a structure that supports regular evolution over time, Data Vault delivers a standard for data warehousing that elevates the whole industry.