Nowadays, data analysts need to consume and process ever-growing volumes of data:

  • For decision making
  • To perform Machine Learning processes
  • To produce regulatory reports

So they need to know all the information available in their organisation very well, as well as the relationships between the different information systems, which are increasingly varied.

On the other hand, some organisational processes need to consume new data as it appears: for example, a regulatory report may have to include information from a newly deployed system. Traditionally this would mean modifying the process that generates the report to add the data from the new system.

One of the most useful tools for these tasks is metadata, which provides the means for identifying, defining, and classifying data within subject areas, enabling users and technologists to manage both the context and the content of information systems. The problem is that in most organisations metadata is either unavailable or very poor. Fortunately, organisations are increasingly aware of its usefulness and launch projects, and even programmes, to build the metadata they need: a robust base on which data analysts can work, and a way to develop processes that generate information dynamically, without new software deployments.

The main challenges these organisations face are:

Improving the governance: some organisations lack a clear definition of the processes for managing the information they handle. There is no single view of the information, the fields in the databases are undefined, and sometimes it is not even clear who owns the data managed by an application. As a consequence, data is hard to find, it may be duplicated, some reports may be incomplete and, in general, the organisation does not control its own information, which is a very high risk.

At the end of the day, metadata governance should help the organisation to be aware of all the information it has, and to understand the impact of adding new information or new applications.

Improving the metadata itself: it is very rare to find an organisation that has defined and identified all the metadata it needs for its operations, so there is a lack of knowledge about the information it handles. To improve this situation, which can be done while the governance is being defined, the organisation should identify which metadata will be needed and create an inventory of all of it.

As part of this metadata inventory, it is important to describe how to obtain the value of each piece of metadata. Unfortunately, there are cases where you know which metadata you need but not the values to populate it with. In these cases, a deeper analysis is needed to determine the best way to get the values.
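An inventory entry of this kind can be sketched as a small record that captures, alongside the metadata itself, how its value is obtained. This is only an illustrative structure, assuming hypothetical field names (`name`, `source`, and so on), not any particular metadata standard:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of one entry in a metadata inventory.
# The field names are illustrative, not taken from a standard.
@dataclass
class MetadataEntry:
    name: str             # e.g. "owner" or "data_type"
    description: str      # what this piece of metadata means
    source: str           # how the value is obtained: "manual", "inferred", "ml"
    value: Optional[str]  # None until populated

inventory = [
    MetadataEntry("owner", "Team responsible for the field", "manual", None),
    MetadataEntry("data_type", "Type of the column values", "inferred", None),
]

# Entries whose values still need to be populated:
missing = [e.name for e in inventory if e.value is None]
```

Recording the `source` for each entry makes it explicit which values require an expert and which can be derived automatically, which is exactly the analysis described above.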

The most basic approach is to get the knowledge from a domain expert who can set the values manually. The problem is that experts are generally hard to find, and the process is very time-consuming.

Another option is to infer the values from the data already present in the fields under analysis. For example, to find out the data type of a field, you can run some comparisons to determine whether its values are dates, strings, numbers, and so on. A more sophisticated option is to use Machine Learning to discover those values.
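The comparison-based inference mentioned above can be sketched with a simple heuristic: try to parse a sample of values as each candidate type and return the first type that fits all of them. The type names and the date format are assumptions for illustration:

```python
from datetime import datetime

def infer_type(values):
    """Guess a field's data type from sample values (a simple heuristic sketch)."""
    def all_parse(parse):
        # True if every sample value can be parsed by the given function.
        for v in values:
            try:
                parse(v)
            except (ValueError, TypeError):
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "number"
    # Assumed ISO-style date format; real data would need more patterns.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"

infer_type(["1", "2", "3"])               # "integer"
infer_type(["2021-01-05", "2021-02-10"])  # "date"
```

A Machine Learning variant would replace these hand-written checks with a classifier trained on labelled columns, at the cost of needing training data.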

Improving the understanding: once the organisation has a clear view of its metadata and data, it is very useful to build a semantic layer or an ontology on top of them, to provide a homogeneous view across the whole organisation.
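At its simplest, such a semantic layer is a mapping from system-specific field names to shared business concepts, so that the same concept can be found regardless of which system stores it. The system and field names below are hypothetical:

```python
# A minimal sketch of a semantic layer: hypothetical mappings from
# physical, system-specific fields to shared business concepts.
semantic_layer = {
    "crm.cust_nm":    "Customer Name",   # CRM system
    "billing.client": "Customer Name",   # billing system
    "crm.cust_since": "Customer Start Date",
}

def concept_for(field):
    """Resolve a physical field to its business concept."""
    return semantic_layer.get(field, "Unknown concept")

def fields_for(concept):
    """Find every physical field that represents a given concept."""
    return sorted(f for f, c in semantic_layer.items() if c == concept)

fields_for("Customer Name")  # ['billing.client', 'crm.cust_nm']
```

A full ontology would also model relationships between concepts, but even a flat mapping like this lets analysts query information homogeneously across systems.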

By addressing these challenges, the organisation will soon see many benefits and realise how worthwhile this investment is.