Whitepaper: Managing Analytics: Graph Model Metrics for Predicting Organic User Journeys
Updated: Jun 22
This article can also be downloaded here:
Context of Analytics Reporting
The reporting and analytics platform of a medium-sized or larger company can represent a significant technology investment. In addition to the sizeable initial costs of design and engineering, it can involve ingesting and modelling large amounts of data, ongoing costs of ETL processes and query compute, and, last but not least, a sizeable ongoing investment of time in creating new reporting queries and insights with the tool.
Uncertainty of ROI
On the one hand, reporting and analytics represent a large investment for a company, but on the other, the investment is of uncertain ROI. There are many blockers standing in the way of getting a decent return on the investment. Some of these relate to:
· Poor or partial execution of the planned changes. (This might include data sources being only partially available to users, a lack of agility in providing new sources, etc.)
· Problems with the reliability and quality of data. This is a common problem, caused either by the source ETL being incomplete or not fully modelled downstream, or by the reporting estate going out of date or not being well understood by the analysts who then struggle to support it.
However, a large part of the remaining uncertainty about the ROI concerns:
· The delivery of new reports that save time in everyday operations, the efficient servicing of ad hoc query requests, the mapping and embedding of reports in business processes, high overall user satisfaction, and the largely successful use of reports by users.
This uncertainty about the successful management of report delivery, analysts and report users is the ‘last mile’ of providing an ROI, but businesses often lack a clear methodology for analytics managers to apply to it.
Effect of ‘incomplete’ provision of report user stories in new self-service reports
The difficulty of this last step comes down to the highly uncertain effects of providing more data to users, especially if the new data is still incomplete for some user journeys. New self-service reports are often expected to reduce the burden on specialist analysts who are suffering from too many ad hoc query requests. But such reports can have unpredictable effects: to inexperienced users, more data can provoke more questions than answers, because the report still presents an incomplete picture for some legitimate user journeys. The ‘gap’ created by that incompleteness means additional reports can actually generate additional ad hoc requests to complete those journeys, eating up more time from specialist analysts or other parts of the business when the intention was precisely the opposite. The usual plan is to expand the user journeys covered by the new report, which is often a minimum viable product, and agile analytics development can fix this to some extent. But if the expansion is slow or never happens, the net effect on the analytics team’s burden can be negative rather than positive.
Complexity of Data
Another obstacle in the last mile to a positive ROI is the inherent complexity of the data in medium to large businesses; the larger the business, the worse the problem gets. Attempts to provide more reporting on a subject area may create multiple reports with overlapping intended use cases. This can cause confusion and lead to longer report user journeys, or to journeys that end at the wrong report and return the wrong information, even though each report is giving accurate, reliable data. Modelling the risk of user journeys going wrong, and of reports creating confusion rather than resolving it, is essential to predicting and minimising this risk. Report training can help, but it needs to be repeated constantly for new staff and is often not enough: if users are inexperienced with the data, or use reports infrequently, learnings are easily forgotten. Staff recruited for other skills may simply be unsuited to navigating complex data in self-service reports.
A double-edged sword: Providing reports for unknown use cases
Finally, reports may generate work more easily than they save it, because they can create new unknown, unintended and unwanted use cases on top of the planned ones. For example, a self-service report that lets users check the status of new information can, when that information is not as expected, generate more work, often just to reassure users about processes that don’t actually require further checks. The intention when building any report is to save time on work that is necessary but time-consuming. But over-zealous users checking new things in new ways that don’t really need checking is a prime example of the ‘organic’ environment of user behaviour that can quickly shift the net value of a new report from positive to negative.
In this respect, report building is much like road building. It turns out that the more reports you build, the more net report and specialist analytics traffic can actually get generated because you lower the costs of ‘travel’ and encourage more report use. If the journeys are essential or net positive, that’s fine, but many non-essential and unnecessary journeys are also likely to be generated.
Capturing the last mile of ROI for Analytics requires organic metrics and models
To summarise: the key feature of the last mile is that ROI can be confounded by unsatisfied, unexpected or unwanted user journeys whose negative value can outweigh the value intended by any new report or data source. Managers typically deal with this by recognising anecdotal evidence that it is happening, and then providing training and instructions in the hope of restricting new user journeys to the value-adding ones and avoiding the net-negative ones. While this is good in principle, the extra work of training out unwanted uses and tracking compliance with intended versus actual uses is expensive and very time-consuming. Further, it is an inorganic solution to an organic problem. Telling people not to do something they are naturally prone to do is a bit like telling people to pay taxes: they tend to comply only if there are consequences for not doing so.
Why organic user journeys are a large fraction of analytics usage
These unexpected or unwanted journeys are part of the organic process by which users generate journeys according to their own priorities and interactions. Despite this, analytics managers are typically asked to focus, when providing reports, only on the inorganic, specified and designed user journeys that are part of defined business processes.
So, if a new system or operational source system is created, then some small number of processes and required user journeys will also be specified and a report will be built to service those requests.
However, organic user journeys are a significant dimension of reporting and analytics that is not typically analysed explicitly. By definition they are not captured in the requirements, since the requirements cover only the inorganic journeys: those that are planned for and explicitly expected. To understand what is unplanned, we need to understand the root causes of these organic journeys and then work out how to model them as well. Only then can we capture last-mile success and get the ROI we wanted.
The root cause of organic journeys is the lack of control over users combined with the basic design principles of analytics systems. These principles stem from relational database design and data warehousing, which seek to satisfy whatever query someone can dream up about the source data.
This design principle means that data engineers will typically model data without expecting to know the full requirements or user journeys when they build. The reason this works is that relational and other databases can be designed to be extremely flexible in the queries they can serve. As a result, if the source data is modelled correctly, a vast (though still bounded) number of queries can be written against that data model, and engineers know that whatever reports are required from that source data can be delivered.
But what this also means is that we are actually naturally building systems for servicing ‘organic’ user journeys and ad hoc analysis, not just the specific user journeys defined by a few reports.
In this respect, all large computational systems are very organic: They can generate a huge range of potentially unplanned user ‘journeys’ in a similar way that an organic system can generate a huge range of biochemical reactions, many of which are not good for the organism.
This means that computational systems are fundamentally different to factories, yet managers tend to try to manage them more like factories, with models of only the planned deterministic user journeys, as if information is simply flowing down an assembly line.
In fact, large analytic systems can generate a potentially unlimited variety of ad hoc query requests to analytics specialists who can write custom queries. In addition to work generated that way, users who don’t have access to the underlying data platform and don’t want to ask specialists can typically still service a huge range of potential query requests by downloading data from different reports and combining them in Excel. All of this means that organic user journeys are a large fraction of the work done by analytical systems in a business, yet this work is difficult to manage and more difficult to predict precisely because it is unplanned, organic work, not planned, inorganic work.
Using Biological Research to Model Organic Journeys
The way to predict and manage the generation of such organic user journeys is to borrow from biology, a science familiar with modelling probabilistic, organic processes. An example is metabolic pathways, which typically include probabilities of reactions, and many ‘negative’ reactions, such as the generation of free radicals, that arise ‘organically’ in biochemical processes and are then often ‘mopped up’ by other processes before they can do too much damage. One major method of modelling such organic processes, where we do not expect to predict exactly which steps will be taken, is the Markov chain.
Markov Chain, Source: Wikipedia
A diagram representing a two-state Markov process. The numbers are the probability of changing from one state to another state.
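The two-state diagram above can be written down directly as a transition matrix. The sketch below, with illustrative probabilities invented for this example (not taken from the figure), shows how the long-run share of time a user spends in each state falls out of that matrix:

```python
# Illustrative two-state Markov chain: a user is in "Report A" or "Report B"
# and hops between them with fixed transition probabilities.
# The probabilities are assumptions for illustration only.
import numpy as np

# Rows: current state, columns: next state. Each row sums to 1.
P = np.array([
    [0.7, 0.3],   # from Report A: 70% stay, 30% move to Report B
    [0.4, 0.6],   # from Report B: 40% move to Report A, 60% stay
])

# The long-run (stationary) share of time in each state is the left
# eigenvector of P with eigenvalue 1, normalised to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
stationary = np.real(eigvecs[:, np.isclose(eigvals, 1.0)][:, 0])
stationary = stationary / stationary.sum()
print(dict(zip(["Report A", "Report B"], stationary.round(3))))
```

With these numbers the chain settles at roughly 57% of time on Report A and 43% on Report B, regardless of where the user started: a simple example of predicting aggregate behaviour without specifying any individual journey.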
Another useful mathematical concept here is the confusion matrix, which captures errors in the steps taken to reach a result. Confusion matrices are widely used in the design of machine learning systems, themselves often crude models of an organic process of improvement, such as neural networks.
Multi-value Confusion Matrix, Source: Wikipedia
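Translated to the reporting context, a confusion matrix can tabulate which report should have answered a query against which report the user actually visited. A minimal sketch, with report names and journeys invented for illustration:

```python
# Toy confusion matrix: for each query, which report *should* have answered
# it (rows) versus which report the user actually went to (columns).
# All labels and journeys below are invented for illustration.
from collections import Counter

reports = ["Sales", "Stock", "Returns"]
intended = ["Sales", "Sales", "Stock", "Returns", "Stock", "Sales"]
visited  = ["Sales", "Stock", "Stock", "Returns", "Sales", "Sales"]

counts = Counter(zip(intended, visited))
matrix = [[counts[(r, c)] for c in reports] for r in reports]

for row_label, row in zip(reports, matrix):
    print(f"{row_label:>8}: {row}")

# Off-diagonal entries are 'confused' journeys that landed on the wrong report.
misrouted = sum(v for (r, c), v in counts.items() if r != c)
print("misrouted journeys:", misrouted)   # 2 of the 6 journeys
```

The diagonal counts journeys that went straight to the right report; everything off the diagonal is confusion that a model of the report estate should aim to predict and reduce.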
Turing Meta has taken this basic research and insight on the overlap between biological research and organisational analytics and translated it to create a model of organic user journeys generated by reporting using a graph. This graph can be called a Markov Model Confusion Network. Using this type of graph as a model, your business can, probably for the first time, predict the effects of changing the reporting estate, in terms of the organic, unplanned, unexpected as well as inorganic, planned, user journeys that will likely be generated.
Markov Model Confusion Network. Source: Turing Meta
A Markov Model Confusion Network for a small part of an analytics report estate, stored in a graph database. Each edge is traversed with a given probability. Querying the graph in the graph query language Cypher generates the full distribution of probabilities and costs of organic user journeys and potential confusion, under different scenarios and conditions.
This model can show the predicted or actual effects of overlapping and similar reports in generating confusion for users and additional traffic and cost.
Another dimension of this same model can show how different experience levels of users will have different effects in terms of the likely mistakes made when traversing the reporting and analytics estate. This can help with modelling the likely organic effects of self-service reporting, when the user journeys rely on less experienced users finding the information they need.
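To make the idea concrete, here is a minimal in-memory stand-in for such a network. In the model described above the graph lives in a graph database and is queried with Cypher; here plain Python dictionaries play that role, and every report name, probability and cost is invented for illustration. Two overlapping reports (`sales_v1`, `sales_v2`) share traffic, and confusion edges route some journeys through both:

```python
# A minimal sketch of a "confusion network" over two overlapping reports.
# Edges: source -> list of (target, transition probability, cost in minutes).
# "done" is the absorbing end of a journey; edges between the two reports
# model confusion caused by their overlap. All numbers are assumptions.
import random

edges = {
    "start":    [("sales_v1", 0.6, 1), ("sales_v2", 0.4, 1)],  # confusion at entry
    "sales_v1": [("done", 0.8, 5), ("sales_v2", 0.2, 5)],      # 20% find v1 lacks the data
    "sales_v2": [("done", 0.9, 4), ("sales_v1", 0.1, 4)],
}

def simulate_journey(rng):
    """Walk the graph from 'start' to 'done', accumulating time cost."""
    state, cost = "start", 0
    while state != "done":
        r, acc = rng.random(), 0.0
        for target, p, c in edges[state]:
            acc += p
            if r < acc:
                state, cost = target, cost + c
                break
    return cost

rng = random.Random(42)
costs = [simulate_journey(rng) for _ in range(10_000)]
mean = sum(costs) / len(costs)
print("mean journey cost (minutes):", round(mean, 2))
```

Running many simulated journeys yields the distribution of journey costs under the current estate; changing an edge probability (say, after retiring one of the overlapping reports) and re-running shows the predicted effect of that change before it is made.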
How the model works
This model takes advantage of the fact that Markov processes of a reporting user journey do not have to define any specific journey they are mapping.
Instead, the model approximately captures the odds of every journey being taken, with a probability distribution based on how closely different reports are related: the likelihood that data from one report generates a query of another, or that the two reports give similar answers to a query. These probabilities of journeys between pairs of reports or systems can be estimated by experts familiar with the relatedness of the reports, and also derived from actual usage statistics.
If two reports are related, and both have high usage and are used to service more complex requests, the odds are that some significant fraction of organic journeys go to both reports rather than just one.
Confusion edges in the network then capture the risk of looking at the wrong report first, or looking at more reports than actually needed. They too are based on the relatedness of the reports, but also on the experience levels of the users.
Together, the combination of Markov processes and confusion edges can therefore capture organic user journeys likely to be generated, on top of the capturing of the intended user journeys.
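One way the effect of user experience on confusion edges can be sketched is with a standard absorbing-Markov-chain calculation. Everything here, including the scaling of confusion by experience, is an illustrative assumption rather than the model's actual parameterisation:

```python
# Sketch: scale confusion-edge probabilities by user experience and compute
# the expected number of steps before a journey completes. The structure,
# probabilities and experience scaling are all invented for illustration.
import numpy as np

def expected_steps(base_confusion, experience):
    """Expected steps to completion from each of two reports (A, B)."""
    # Assumption: less experienced users confuse reports more often.
    confusion = base_confusion * (1.0 - experience)
    # Q: transitions among the transient states {A, B} only; the remaining
    # probability mass in each row is absorption (journey answered).
    Q = np.array([
        [0.0,       confusion],   # from A: wander to B with prob `confusion`
        [confusion, 0.0      ],   # from B: wander to A likewise
    ])
    # Fundamental matrix N = (I - Q)^-1; its row sums are the expected
    # number of steps before absorption, starting from A and from B.
    N = np.linalg.inv(np.eye(2) - Q)
    return N.sum(axis=1)

novice = expected_steps(base_confusion=0.4, experience=0.2)  # confusion 0.32
expert = expected_steps(base_confusion=0.4, experience=0.9)  # confusion 0.04
print("novice expected steps:", novice.round(2))
print("expert expected steps:", expert.round(2))
```

The gap between the two results is the extra traffic generated purely by inexperience, which is exactly the quantity a manager needs when weighing up whether a report is ready to be made self-service for a given audience.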
ROI from making the last mile
By also capturing the extra costs in time and potential error of those organic journeys, analytics managers in your business can use the metrics from such a graph to start to see and predict the effects of changes to the reporting estate. This can be seen in terms of the effect on organic user journeys as well as the expected effects on inorganic, planned journeys. Together, these metrics allow intelligent decisions to be taken without unexpected consequences or decisions that then backfire.
Some of those changes might be to make certain reports self-service, but if the data is too complex or still incomplete for a fraction of user journeys, then the model can show this as likely resulting in more expensive user journeys overall.
Decisions can then be made about how and whether to make certain reports self-service, with more information and understanding of the risks and drawbacks that there might be to this.
Decisions can still be made to increase reporting but with greater awareness of the need to mitigate and monitor certain reports and their usage more closely, or the necessity of providing more complete data in early versions of the report as a more complex minimum viable product.
All of this can quickly lead to the last mile being achieved, and to businesses knowing that they are likely to be getting the ROI from their investment in analytical platforms and reporting, because they are also delivering on the last mile.
Contact email@example.com today, to discover how your company can benefit.