Introduction to the Lazy Stopping Modelling of Software Teams and Projects
Updated: May 27, 2020
5th May 2020
Part I: Frugal Cybernetics
The Lazy Stopping Model (or 'LSM') is a simulation that I am developing (currently at version 0.7). In it I implement some concepts I am researching on risk management for systems of 'agents' and apply these ideas to software development and project management. (Throughout this blog I use 'agents' to refer to the individual entities involved in performing some activity.) The LSM uses a general model I'm developing that considers the coordination of defined tasks between agents or phases that occur in a sequence. This type of situation also occurs in biology, where examples are useful to help with the development of the model. It also occurs in many work contexts where we can speak of a defined 'workflow'. However, the initial focus of the model is software teams and projects, mainly because these are hard things to model.
The differences between some other software development metrics and the quantification of risk
There are already many ways of measuring and predicting the success of software development and projects. For example, some traditional metrics around software development look at the cumulative totals of work unit output, the average time that work units are in progress and the number of work items in progress. These metrics are related to Queuing Theory and other theories that were developed around the improvement of production processes, such as 'Lean' manufacturing methods, Kaizen, and even Pareto distributions of faults. In addition, some uses of software metrics include the aim of providing forecasts of effort required and likely performance of teams based on prior stats of that team's performance. This can include, for instance, the use of Monte Carlo methods to infer likely future lead times. More heavyweight prediction methods include Boehm's COCOMO (Constructive Cost Model) of estimation from his textbook 'Software Engineering Economics'.
Looking at cybernetics
In contrast to these other approaches, the LSM is explicitly a ‘cybernetic’ model. Cybernetics is the trans-disciplinary study of control systems and communication, spanning technology and engineering through to the study of biology. Many examples in operations research and industrial systems depend on control systems and implement theories of control. In software development many of these ideas have been implicitly applied in the form of Lean manufacturing methods, where it is assumed that we can achieve continuous improvements to processes via feedback from workers on the ground. Cybernetics is just the explicit focus on control systems and communication as a model of how to improve a process and respond to feedback quickly and accurately.
There is one key idea that the LSM shares with cybernetic models in general: we are modelling systems that are subject to feedback. Let’s consider what this means. In control theory, a simple controller in an ‘open-loop’ system merely responds to an input instruction and changes accordingly, regardless of further feedback. An example of a simple open-loop system is a car’s cruise control that automatically increases the throttle when the driver nudges the cruise control button to increase speed, but doesn’t take account of hills, which would change the speed that actually results. A more complex system that responds to feedback is a ‘closed-loop’ system. It measures the result of the output action in response to the input and continues to adjust until the ‘correction’ required is zero. This difference allows for more complex and sophisticated control. A closed-loop cruise control will therefore monitor the actual speed that results and automatically continue to adjust the throttle until the requested higher speed is actually reached, even if one is now travelling up a hill.
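The cruise control example can be made concrete with a minimal sketch. This is not part of the LSM; the function name `simulate_cruise`, the toy car model and every constant in it are assumptions chosen purely for illustration.

```python
def simulate_cruise(target, hill_drag, closed_loop, steps=200):
    """Toy cruise control. All names, units and constants are illustrative."""
    throttle = target  # throttle setting that would give `target` on a flat road
    speed = 0.0
    for _ in range(steps):
        # Toy car: the speed drifts towards (throttle - hill_drag).
        speed += 0.2 * ((throttle - hill_drag) - speed)
        if closed_loop:
            # Feedback: measure the actual speed and keep correcting the
            # throttle until the correction required is zero.
            throttle += 0.1 * (target - speed)
    return speed


# Same target speed and the same hill, with and without feedback.
open_loop_speed = simulate_cruise(target=100.0, hill_drag=10.0, closed_loop=False)
closed_loop_speed = simulate_cruise(target=100.0, hill_drag=10.0, closed_loop=True)
print(round(open_loop_speed, 2), round(closed_loop_speed, 2))
```

In this toy setup the open-loop controller settles below the target by the amount of the hill's drag, because it never looks at the result, while the closed-loop controller keeps adjusting the throttle until the measured speed matches the target.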
The difference between open-loop and closed-loop systems has implications for how we control a software development process and how we model it. We can, for instance, respond to tickets in an ‘open’ way and simply try to execute them as efficiently as possible and move on to the next one as soon as possible without checking the results. Indeed, throughput and lead time are important statistics used in software development metrics to help us to understand how well we are working in these terms. However, there is also the idea that we can try to build a closed-loop system where we continue to work on something until it is actually providing value to the business. In this case, we depend on feedback from the business to appraise how far the tickets we are working on are adding value and continue to make changes until we have delivered something of value. This can be more expensive and unpredictable, because we continue to work on things until the expected result is achieved. We therefore might want to monitor feedback about the quality of software released and have a mechanism for doing so, ‘closing the loop’ by continuing to work on changes until what is delivered has the value we hoped for.
Lazy Stopping Model as a ‘Frugal’ Cybernetic Model
The Lazy Stopping Model or LSM is an example of a model that seeks to simulate a closed loop system and at the same time implements its own model of what it means to be ‘in control’ of the software development process. It implements this by looking at two aspects of control: The first thing is the ‘inner’ control of the agents over the tasks they have to do as they respond to feedback about their progress on individual tasks. The second thing is the ‘outer’ control that the managers have over the resources and assets that the agents have been given to perform the task. So in a traditional closed-loop system we model feedback as key, but in the LSM we also consider the 'control budget' that agents have to achieve control, giving us a ‘frugal cybernetic’ model.
The Lazy Stopping Model and chains of coordination
In addition to monitoring control over the budget, the LSM simulates to what extent the agents inherit extra work from previous agents’ steps in the process, or conversely inherit more control assets to make their work easier. As we move along the chain of a process like progressing a software ticket, agents can make things easier or harder for agents that come after them. Thus, the extra work and the extra resources that an agent inherits from the previous agent’s work can be termed ‘control liabilities’ and ‘control assets’ respectively. For this reason, I can also describe the type of model that the LSM represents as a model of ‘Chains of Coordination’.
This complexity also interacts with the budget available for control over each step. For example, if the control budget for one phase is too small, this can lead to extra work in subsequent phases when the upstream agent isn’t able to exert sufficient inner control over their task before handing it over to the downstream agent in the coordination chain. So, these two extra dimensions obviously add complexity to the initial picture of a ‘feedback loop’. They enhance the detail that we can model in terms of the control that agents have over a software development process, a project of some kind, or any kind of workflow. Workflow processes, software development teams and projects in general all involve these chains of coordination and the need to be frugal with the budget for achieving control over each task, which can lead to knock-on effects downstream in the chain of coordination. Many workflows that have to be frugal in some sense look this way, from ‘business as usual’ workflows to rarer workflows we might want to analyse, such as people applying policies to their usual work in order to detect the risk of financial fraud.
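The knock-on effect of an undersized control budget can be illustrated with a minimal sketch. It is not the LSM's actual implementation: the `Phase` fields, the `run_chain` function and the rule that unfinished work is passed on unchanged are all assumptions made for illustration, as are the numbers.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    work: float    # effort needed to bring this phase's own task under control
    budget: float  # control budget: the most effort the agent may spend


def run_chain(phases):
    """Pass a ticket down the chain; unfinished work becomes a downstream liability."""
    inherited = 0.0  # control liability handed over by the previous agent
    report = []
    for phase in phases:
        required = phase.work + inherited
        spent = min(required, phase.budget)
        inherited = required - spent  # whatever is left over moves downstream
        report.append((phase.name, spent, inherited))
    return report


# Hypothetical chain: the analysis phase is under-budgeted by one unit.
chain = [
    Phase("analysis", work=3.0, budget=2.0),
    Phase("development", work=5.0, budget=5.0),
    Phase("testing", work=2.0, budget=4.0),
]
for name, spent, handed_on in run_chain(chain):
    print(f"{name}: spent {spent}, handed on {handed_on}")
```

In this hypothetical run the shortfall in analysis is not absorbed by development, whose budget only covers its own work, so the liability travels on until testing has enough slack to absorb it.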
Therefore, modelling these additional dimensions of control allows us to define and describe many different scenarios in which control is threatened by different kinds of risk to a project or development process. In fact, the study of risk is highlighted as a key aspect of maintaining control over software development and project management in general, and this is one of the major motivations for developing the LSM, so that we can understand and manage risk better.
Literature on risk management of IT projects
If we look at the literature on risk management in IT projects we see that the causes of failure highlighted by researchers and commentators include the inability to manage risk, whether in terms of obtaining valuable feedback from stakeholders or the outright management of uncertainty around project goals and execution (risk management). This argues for the explicit modelling of risk control strategies for projects and teams. For example, the census by the UK National Audit Office of 2006 into causes of project failure finds over 40% of respondents were either 'Fairly or Very Concerned' about 'Lack of effective engagement with stakeholders' (which presumably includes managing expectations according to risk and obtaining necessary information about what is required). In addition, over 70% of respondents were either 'Fairly or Very Concerned' about 'Lack of skills and proven approach to project management and risk management'. Another example is Joseph Gulla of IBM and his presentation 'Seven Reasons why Information Technology projects Fail'. Gulla lists 'Lack of Proactive Risk Management' along with 'Insufficient project communication' amongst others (effective communication is a key component of managing risk because it reduces uncertainty). Some of the other reasons listed are also related to information flows, such as 'Failure to align with constituents' and 'poor performance measurement'. It follows from this brief survey that risk-based models of software development and project management methods can help to both inform and address the significant issues with risk management in these areas, and move beyond traditional estimation and forecasting to explicit models of risk and the likely effectiveness of the strategies to control that risk. For these reasons the concept of the 'risk model' will be used to understand how to quantify risk using the LSM.
The frugal cybernetic model of the LSM provides a precise way to model risk in software development or projects.
LSM risk model
The LSM 'risk model' of a project or team refers to two situations: where the team lacks sufficient information or control assets to control a set process or an outcome, together with the cost to the business of that uncertainty and loss of control; and where the cost of obtaining control over that process or outcome exceeds their ‘control budget’.
As both loss of control and cost can lead to costly uncontrolled results, loss of control (including opportunity cost*) plus cost to the business of control assets is the most proximate cause of risk in a team or project.
*By 'opportunity cost' I mean the cost of loss of opportunity to add business value, for instance, such as requirements that could be added at low cost and have high value, which were nevertheless missing from the software or project's original requirements list.
An example of loss of control is where one is unsure of the best technology to use for a particular aspect of software development, such as a choice of programming language. Or one might think that one is sure which language is right, but in actuality one might lack key information about the context for the choice that one is making, e.g. details of what the language will be used for might be uncertain. This epistemic state of uncertainty can lead to an uncontrolled outcome, and the choice of the wrong technology, i.e. the wrong programming language, can be expensive. Similarly, one might choose a programming language based on the low cost of hiring agents who are proficient in that language. In that case one has exchanged the control asset of the superior language for further outer control over the control budget. This notion of control and the associated cost of control is fundamental to the risk model of the Lazy Stopping Model.
Given any type of risk profile, risk is best controlled by the strategy components that are best matched to that profile, i.e. those that have the greatest impact on improving it. A strategy component can do this in one of two main ways:
a. By increasing control over the quality of task outputs, or;
b. By decreasing the control budget, even at the expense of some direct control of the quality of output.
The aim of this risk modelling approach is simply to better understand the impact of strategy components on a risk profile versus if the strategy was not applied. A simulation such as the Lazy Stopping Model is ideally suited to doing that.
The intention of moving to a risk model viewpoint is that it then becomes possible to theorise and select strategies more scientifically, according to the strategies that we know should be competent to impact some kinds of 'risk profile' more than others. I will now unpack this central idea a little, by explaining what risk profile we see in project work and software teams generally.
'Knowledge work' results in 'process uncertainty'
Project work and software teams carry out 'knowledge work'. 'Knowledge work' is work that predominantly involves the generation of unique knowledge, intellectual property, or information in ways that are not entirely well defined or routine. In general, software is typically an example of the more creative end of the 'knowledge work' spectrum. Therefore, agents in software teams or projects experience 'process uncertainty' (i.e. lack of a defined routine or process) due to the nature of the 'knowledge work' that they produce. This in turn produces a characteristic risk profile for software teams and project work. Let’s examine this concept of ‘process uncertainty’ in more detail.
Process uncertainty is where an agent allows their best definition of where they actually are in a process (what is complete and what is incomplete) to be subject to change as new information (feedback) comes to light. In contrast, state uncertainty is where an agent chooses to interpret uncertainty as uncertainty about an outcome, rather than uncertainty as to whether a certain stage in producing the best outcome is complete. Which kind of uncertainty dominates is the agent's choice, and that choice is a basic type of risk management strategy.
In short, given some uncertainty, an agent can choose to control the process but not the outcome, or vice versa, control the outcome but not the length of the process.
For example, if I am making a cake, I can decide to cook it for 40 minutes and accept that I am not sure how it will turn out, which I call 'state uncertainty'. However, as an alternative strategy, I can take the same uncertainty and convert it into process uncertainty by checking the cake every 5 minutes until I think it looks done. This in turn reduces 'state uncertainty', but at the expense of 'process certainty'. Both strategies involve risk, and which is better depends on the details of the risk profile of the task.
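The two cake strategies can be compared in a minimal sketch. The functions `fixed_time` and `check_until_done`, and the idea of a single unknown 'true' doneness time, are simplifying assumptions for illustration only and are not taken from the LSM.

```python
import math


def fixed_time(true_done, planned=40):
    """Commit to a fixed duration: the process is certain, the outcome is not."""
    return {"minutes": planned, "checks": 0, "error": abs(planned - true_done)}


def check_until_done(true_done, interval=5):
    """Check at a fixed interval and stop at the first check at or after doneness."""
    checks = max(1, math.ceil(true_done / interval))
    minutes = checks * interval
    return {"minutes": minutes, "checks": checks, "error": minutes - true_done}


# Hypothetical cake that actually needs 52 minutes, which the cook cannot know.
print(fixed_time(52))        # known length, possibly large error in the outcome
print(check_until_done(52))  # small error in the outcome, unknown length and more checks
```

With a fixed time the length of the process is known in advance but the error in the outcome depends on how good the guess was. With periodic checks the error is bounded by the checking interval, but the number of checks and the total time are only known once the cake is done.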
In the LSM's model of risk we can see that losing control over the process in order to raise the chance of a controlled outcome often means that agents risk exceeding their control budget. On the other hand, raising control over the length of the process, and so controlling the budget, may also lead to a less certain outcome in terms of the quality of the task.
This idea allows us to define a 'risk profile', which is the kind of control that is available to the agent (e.g. the control budget), and the amount of control over a task that an agent is likely to exert at a given phase of the process given that budget. The best strategy achieves the best ratio of control over the project or development process for the smallest control budget.
Control Assets: Technology as a strategy for controlling feedback costs
Viewing risk in terms of control and control budgets allows us to consider the concept of ‘control assets’. These are assets that can be invested in to give agents more control over processes and tasks. The wisest choice of control assets gives the most control for the budget. An example of such a choice is whether to use a typewriter or a word processor to write a blog. As you can probably guess, the word processor is a superior control asset to the typewriter in most respects and gives us greater control for relatively little expense. In contrast, achieving that same control with a typewriter takes a lot more effort and expertise from the agent. Even though the typewriter is the cheaper control asset, it doesn’t give you much control for the money.
Much of the risk management in software development is managed by the technology rather than the agents because powerful control assets are available to use. To see why, we should merely note that:
1. Control of feedback costs is the most important strategy for dealing with costs of control;
2. The strategy of controlling feedback costs is usually incorporated in some way into the design of information technology products which often serve as effective ‘control assets’ for software development.
Let’s step through these points. The word processor implements a strategy of controlling feedback costs in several ways. One way it does this is by allowing the user to make changes to words, fonts, layout, etc. at any point during the production of a document. As soon as the agent has processed the feedback from the current sub-optimal state of the work, they can act on that feedback and regain control at very low cost. In addition, the word processor has an undo function that provides a further means of cheap re-work. Unless you have a lot of Tippex, a typewriter is not going to match the word processor in controlling the feedback and re-work costs of a task like writing a blog.
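The trade-off between the price of a control asset and the cost of each round of feedback can be sketched with simple arithmetic. The prices and re-work costs below are invented for illustration, and `total_cost` and `break_even_rounds` are hypothetical helpers rather than part of the LSM.

```python
import math


def total_cost(asset_price, rework_cost, feedback_rounds):
    """Price of the control asset plus the cost of re-work over all feedback rounds."""
    return asset_price + rework_cost * feedback_rounds


def break_even_rounds(cheap, powerful):
    """Fewest feedback rounds at which the powerful asset is no more expensive overall."""
    price_gap = powerful["price"] - cheap["price"]
    rework_saving = cheap["rework"] - powerful["rework"]
    if rework_saving <= 0:
        return None  # the dearer asset never pays for itself
    return max(0, math.ceil(price_gap / rework_saving))


# Invented figures: the typewriter is cheap to buy but each revision is costly.
typewriter = {"price": 50, "rework": 20}
word_processor = {"price": 200, "rework": 1}

for rounds in (3, 20):
    print(rounds,
          total_cost(typewriter["price"], typewriter["rework"], rounds),
          total_cost(word_processor["price"], word_processor["rework"], rounds))
print(break_even_rounds(typewriter, word_processor))
```

Under these assumed figures the cheaper asset only wins when there are very few rounds of feedback. The more rounds a task needs, the more the asset with cheap re-work dominates.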
Let's summarise the discussion so far:
1. Risk to some process or outcome can be managed by limiting the control budget or by purchasing effective control assets.
2. The main strategy for achieving control using a control asset is to reduce the costs of feedback.
3. This type of solution is particularly important in software development where there is significant process uncertainty and so lots of rounds of feedback.
4. Lots of technologies, such as a word processor, a dedicated interface, a software development environment, or a design method are control assets that efficiently enable you to control feedback costs, mitigating process uncertainty.
Limits to technological control assets and the role of human agency
In this vein, the strategy of minimising feedback costs also drives and shapes the design of much of the technology used in software development teams and project work generally, in order to achieve control. This is actually the same for ‘business as usual’ workflows too. In many cases this cheapness of re-work due to technology works very well, but in some cases, such as data warehouse design, the design of the solution itself may determine feedback costs to a larger degree than any particular choice of technology per se. Therefore, where careful design is needed, part of the strategy is to invest in agents who can supply the relevant control over the design and so control those feedback costs. Of course, in the LSM, the agent hiring policy is then part of the portfolio of ‘control assets’ that are considered and can be actioned for a given control budget. We can choose to spend a bit more on personnel as a control asset rather than technology. The basic questions then, in terms of managing risk frugally, are:
1. What control assets can you afford for a given process or project?
2. Given that a risk profile is the control required in each phase of a process, and overall, to achieve a desired outcome with a certain confidence: how does a given risk profile determine the optimal portfolio of control assets which on average are the most efficient to use in each phase, and overall?
In part II of this blog post I will discuss in more detail how the control budget can be spent effectively to achieve a given outcome, and also look at the typical reasons why certain patterns of control change occur during the life of a process or a project according to certain risk profiles. I will also look at how the simulation highlights the basic strategy options to consider against the risk profiles of different business contexts, and give the reader some concrete takeaways for managing and calculating risk using this type of model. Finally, I will discuss the features of future versions of the LSM.