Introduction to the Lazy Stopping Model Part II
Updated: Jun 2, 2020
Introduction
This is Part II of the introduction to the Lazy Stopping Model (LSM), in which I explain some of the details of the theory. I will begin by explaining how to analyse the different strategies used in constructing 'coordination chains', a concept I introduced in Part I. A 'coordination chain' is simply a sequence of work done in several phases. This concept links the analysis of software projects, projects in general and working in teams to sequences of workflows in biological and other systems, and it forms the central topic of my research. The LSM was, however, first built with reference to software teams and projects, and this introduction continues to focus on that subject.
To begin the discussion of the details of the LSM, I will first introduce the core concept of the detailed strategic analysis, which is the concept of 'steps'. I will also explain why the model is called the 'Lazy Stopping Model'. I will then show how 'step prediction' and 'step analysis' form the detailed strategic analysis used in the LSM, and how they relate to the idea of a 'frugal cybernetic system' introduced in Part I. This is a system that analyses control using feedback and analyses the relation between 'outer control' (over the control budget) and 'inner control', which is the detail of the control attempts, such as a specific task in a software build process or, indeed, in any phase of a specified workflow.
Whilst the examples will continue to focus mainly on software teams and projects, as in Part I, I will also draw analogies from other domains, such as zoology, where this seems helpful. In the later discussion I will cover detailed tactics and strategies for managing risk that I believe are best understood in terms of frugal cybernetics and the LSM.
The likely size of the steps as the analysis
We can think of many projects and processes as involving coordination over several phases, by a single agent or by multiple agents, in a 'coordination chain'. When we look at the risk involved in coordination chains, we can further break down these phases into steps. Each step is an action taken towards the goal at the end of the chain. Sequences of sub-steps, which form steps, are the smallest unit of change tracked in the Lazy Stopping Model, and each step roughly represents a unit increase in control by the agent over the process as they get closer to the process goal. The steps can vary in size as a result of two things:
1. Different number of sub-steps
2. Different size of sub-steps
For the agent, this means that the cost to the budget of achieving a unit of control can vary. The factors that predict the number of sub-steps and the size of the sub-steps are what the LSM models, in order to predict both how much control is gained at any one moment and how much that control actually cost.
The risk of steps that are too large
For example, consider a software project that involves the step of building a specific scripted function as work towards satisfying some requirement in the project plan. That requirement itself requires several steps: let's say building the scripted function is one, testing the function is a second, releasing it is a third, and so on. One of the questions we can ask in terms of risk management is how we avoid creating steps that are individually, or on average, too large (i.e. too costly in time or resources) to be successfully carried out. That is, steps whose cost far exceeds the budget for achieving them. For example, in a software project, what if it turns out that to complete that script you have a sub-step where you need to import a library which only exists in a later version of the programming language you are using?
Then it might be the case that the upgrade of the programming language needed on the system servers is also complex, and is delayed by other dependencies and by the workload of the team needed to do it. So, this scripted function has become a very big and costly step in the overall set of steps needed to continue the project. The step has got larger for two reasons: firstly, the number of sub-steps has increased and, secondly, some of these sub-steps are large. In addition, these sub-steps may not have been predicted in the plan, but were actually always a risk. The LSM is therefore about capturing ways to predict the likelihood of step sequences and step sizes, and then analysing this sort of risk and managing it appropriately through various tactics and strategies.
Analysing step size in the LSM
The focus in the detailed LSM analysis is, therefore, on analysing the risk of given sub-steps occurring in a process and their potential effect, in terms of their size, on the size of the overall step. The LSM simulates the odds of a given sequence of sub-steps in a step and how large the steps, for a given process or project, might turn out to be. On that basis, it determines the risk that the process exceeds the control budget for the functions that are likely to be delivered. The LSM is, therefore, mainly aimed at analysing and predicting what determines the odds of a set of steps having a certain size distribution. This is because all the steps that end up being required use up the budget. One of the things we can see with this analysis is that there is often a risk that a very few unexpected, very large steps can end up using more of the budget than many smaller steps. This pattern is known as a 'fat-tailed' distribution, where one unlikely but very significant event can skew the results, and it is related to the 'Black Swan' effect popularised by Nassim Taleb.
An example in a project might be the risk that there is a last-minute disagreement over the legal contract for the user agreement, just at the point that the project was due to go live. This one step then pushes out the time before a large-scale release by 6 weeks. So, due to that one unexpected lengthy step, the project may have increased in the length of time to full release by 25%.
How large is a step for you?
Interestingly, how large a step in a process is has a lot in common with the physical steps we take: what constitutes a large step for someone depends on who is taking that step. It is 'relational', but not 'subjective'. So, if we consider a staircase, then whether the individual steps are too large clearly depends on (is related to and indexed against) the size of the person being expected to climb it. Similarly, in terms of risk and budgets, a poker player finds that the risk to them of losing a particular hand of cards is largely determined by the size of their whole stack of chips relative to what is at stake in that specific hand. The 'chip leader' may see no risk in the same bet that is highly risky for someone who has a small set of chips left, even if they had the same hand and odds of success. They are like differently sized people faced with a large step on a staircase. It is the same for software projects: those with small budgets can only afford steps of a certain size, which means they can only afford to risk a certain cost in time and person hours for any one step.
Step risk control strategy
One key trade-off in the analysis of step size and risk is between the outer control budget and the inner control budget, an idea discussed in Part I. Let's see how this trade-off works. As an example, let's imagine you hire agents to work on your team that carry out steps very efficiently and reliably, thereby keeping the chance of longer unnecessary sequences of sub-steps occurring pretty small.
Let's say that this is because there are fewer rounds of testing and re-testing necessary, i.e. fewer bugs. This means they have a high degree of inner control of the tasks, as discussed in Part I, but note that these experienced team members also cost more, so the outer control budget is now smaller when you begin the project. This dent in your control budget means you start the project with a smaller 'stack of chips', and you are therefore more sensitive to any larger steps than if you had saved some money and hired less experienced staff. We can see, therefore, how the frugal cybernetic model gives us the means of being holistic about risk.
Summary so far
Spending budget on inner control also limits the budget for step sizes in the project. So you had better hope that the more expensive, more efficient agents you hired reward you by using their greater inner control to reduce step sizes efficiently enough to make up for their cost, and so enable you to retain the outer control over the budget that you want.
[Photo: James Robert White]
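The 'smaller stack of chips' effect can be shown with a little arithmetic. The sketch below is only an illustration: the function name `relative_step_size` and every figure in it are assumptions made up for the example, not values taken from the LSM. It expresses the same unexpected step as a fraction of whatever control budget is left after paying for staff.

```python
def relative_step_size(total_budget, staff_cost, step_cost):
    """Return the cost of a step as a fraction of the budget left after staffing."""
    remaining = total_budget - staff_cost
    if remaining <= 0:
        raise ValueError("staff cost leaves no control budget for the work itself")
    return step_cost / remaining


# Hypothetical figures, in arbitrary budget units.
TOTAL_BUDGET = 100.0
UNEXPECTED_STEP = 20.0

# The experienced team costs more, so less budget is left to absorb the step.
experienced = relative_step_size(TOTAL_BUDGET, staff_cost=60.0, step_cost=UNEXPECTED_STEP)
inexperienced = relative_step_size(TOTAL_BUDGET, staff_cost=30.0, step_cost=UNEXPECTED_STEP)

print(f"experienced team:   {experienced:.0%} of remaining budget")
print(f"inexperienced team: {inexperienced:.0%} of remaining budget")
```

With these made-up numbers the same step takes a larger share of the remaining budget for the more expensive team. Whether their greater inner control shrinks the steps enough to compensate is exactly the question the trade-off poses.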
Lazy Stopping Model name and the risks with inexperienced agents
One of the things I noticed early on in my research is that agents sometimes think they have completed a task when, in fact, one or more things in the task are incomplete. I call this 'lazy stopping', which means the agent is stopping (i.e. marking as complete) the task before she should have. One way this can happen is when you hire inexperienced agents to do the work. I use the generic term 'lazy' to capture all the reasons (which actually might not be the fault of the agent at all!) by which a step is completed 'lazily', i.e. left incomplete. The result is a sort of 'broken step': because the task was thought to be complete, we carry on working in the next phases and 'pause' further progress on the outstanding task until we eventually realise that a step was not done and begin working on it again. Only then will we be able to complete the step. This is, therefore, an important way that steps can become much larger: they become 'hidden tasks' that we thought, incorrectly, were already done, until we find out otherwise.
Broken steps involve 'unknown unknowns'
The LSM therefore models these types of events in particular. They can sometimes be thought of as 'unknown unknowns', since the agents working on a project think the previous steps were all complete and don't know that some are incomplete, as opposed to tasks that they know they haven't done and can work towards in a more organised way. So, the name 'Lazy Stopping Model' refers to the idea that if an agent, whether 'lazy' or through no fault of its own, misses some necessary steps, these steps get 'paused' until they are re-discovered and then completed much later than necessary. For example, some requirements might be missed in the requirements gathering phase, and then only added towards the end of the project, when it is more expensive to do so.
In this way, larger steps occurring through 'lazy stopping' are a key way that a project can exceed its budget and become a liability or at risk of failing. The risk of this happening, along with all the other factors that affect the distribution of step size, is essentially what the Lazy Stopping Model aims to predict. The LSM uses this 'lazy stopping' concept to help model risk due to uncertainty in particular, as well as modelling the control budget constraints. As you might be able to guess, the risk of agents paying more attention to the outer control budget and missing steps (lazy stopping) is one important way that the frugal cybernetic model of inner and outer control interacts with uncertainty. It is much easier to miss things if they are more like 'unknown unknowns'. Missing them often gives both an 'outer control dividend' (i.e. you seem to do the same work in less time) and an 'inner control liability' because, in actuality, there is more work for people downstream to complete, i.e. the 'broken steps' that you have paused or missed.
Technical details of the LSM
In the LSM we model different sequences of sub-steps, each occurring with some probability, and for each kind of sub-step sequence we then also, separately, consider the size of each sub-step. Combining the number of sub-steps with the cost in time and resources of each sub-step in that sequence (a simple product when every sub-step costs the same) gives the step size. The kinds of sequence are referred to in the model in these terms:
Types of Step
1. Go: where the step is successfully completed without additional unnecessary sub-steps.
2. NoGo: where the step is successfully completed only after some additional sub-steps.
3. NoisyGo: where the step is marked as successfully completed but in fact misses some sub-steps, which will be discovered later and done then ('lazy stopping').
4. Won't Do: where the step is not done, to avoid exceeding the control budget.
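The four step types and the step-size calculation can be written down compactly. This is a minimal sketch rather than the LSM's actual implementation: the names `StepType` and `step_size`, and the person-hour figures, are assumptions introduced for illustration. Step size is taken as the total cost of the sub-steps in a sequence, which reduces to the number of sub-steps times the cost per sub-step when every sub-step costs the same.

```python
from enum import Enum


class StepType(Enum):
    GO = "Go"              # completed with no extra sub-steps
    NOGO = "NoGo"          # completed only after extra sub-steps
    NOISY_GO = "NoisyGo"   # marked complete but actually incomplete
    WONT_DO = "Won't Do"   # not done, to protect the control budget


def step_size(sub_step_costs):
    """Return the size of a step as the total cost of its sub-steps.

    When every sub-step costs the same, this equals the number of
    sub-steps multiplied by the cost of one sub-step.
    """
    return sum(sub_step_costs)


# Hypothetical costs in person-hours for two three-sub-step sequences.
equal_costs = [4.0, 4.0, 4.0]
uneven_costs = [4.0, 10.0, 1.5]

print(step_size(equal_costs))   # three sub-steps of 4.0 hours each
print(step_size(uneven_costs))  # sub-steps of different sizes
```

Keeping the count of sub-steps and the cost of each sub-step as separate inputs mirrors the way the model treats sequence length and sub-step size as separately predicted quantities.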
So, each of these is involved in different types of sub-step sequence, and the LSM generates a sub-step sequence, according to probability and various inputs, for a given step in a task (a task consists of many steps and a step can consist of many sub-steps). A task might be 'Build Backend to Dashboard'. This task might consist of 20 steps. The LSM will predict, according to many factors and according to probability, how many sub-steps each of these 20 steps turns into, and then the size of each sub-step, to give the overall cost of that task to the control budget.
Sub-step Sequences
Confusingly, perhaps, sub-steps are named using the same terms as steps themselves. So a 'NoGo' step might consist of a sub-step sequence which starts with a 'NoGo' sub-step and ends with a 'Go' sub-step. For example, the sequence might be 3 sub-steps like this: 'NoGo>NoGo>Go'. It would signify that two additional sub-steps were needed before the overall step was completed. Because it has some 'NoGo' sub-steps, we term that sub-step sequence a 'NoGo' step overall, as it is a step that involved some 'NoGo' sub-steps before it was completed. On the other hand, a 'Go' step never contains any 'NoGo' sub-steps and consists of only 1 sub-step, which is the 'Go' itself. There were no additional sub-steps, so we call it a 'Go' step, and we know that a 'Go' step is more efficient than a 'NoGo' step. A sub-step sequence counting as one step can be a series of zero or more NoGos or NoisyGos, ending in either a Go or a Won't Do. For a 'Build Backend to Dashboard' task with 20 steps we therefore might get an output like this:
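As a rough illustration of how sequences of this shape could be generated and named, here is a minimal sketch. It is not the LSM itself: the probabilities, the cap on sequence length, the rule that a sequence containing any NoisyGo is named a NoisyGo step, and the function names `generate_sequence` and `classify_step` are all assumptions made for the example.

```python
import random

TERMINAL = ("Go", "Won't Do")


def generate_sequence(rng, p_nogo=0.30, p_noisy=0.10, p_wont=0.05, max_sub_steps=8):
    """Draw one sub-step sequence: zero or more NoGo/NoisyGo, then Go or Won't Do."""
    sequence = []
    while len(sequence) < max_sub_steps - 1:
        draw = rng.random()
        if draw < p_nogo:
            sequence.append("NoGo")
        elif draw < p_nogo + p_noisy:
            sequence.append("NoisyGo")
        elif draw < p_nogo + p_noisy + p_wont:
            return sequence + ["Won't Do"]
        else:
            return sequence + ["Go"]
    # Assumption: give up once the sequence would grow past max_sub_steps.
    return sequence + ["Won't Do"]


def classify_step(sequence):
    """Name the overall step after the sub-steps found in its sequence."""
    if sequence[-1] == "Won't Do":
        return "Won't Do"
    if "NoisyGo" in sequence:
        return "NoisyGo"   # a 'broken step'
    if "NoGo" in sequence:
        return "NoGo"
    return "Go"


rng = random.Random(2020)  # fixed seed so the run is repeatable
for number in range(1, 21):  # the 20 steps of the example task
    sequence = generate_sequence(rng)
    print(f"Step {number:2d}: {'>'.join(sequence)}  ({classify_step(sequence)})")
```

Each printed line is one step written in the 'NoGo>NoGo>Go' notation, followed by the name given to the step as a whole.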
*The step size of each step will then depend on the separate calculation of the cost of each sub-step in each step's sequence, which can vary depending on when the step is done, the technology and organisation used, etc.
**The sequence that begins with a NoisyGo is a 'broken step'. The NoisyGo denotes that the task is wrongly marked as 'complete' by the agent and is then 'paused' until later phases or tasks, when an agent discovers that it still needs to be done.
To make these calculations we would estimate how many steps fall into which 'sectors'. This is another way of classifying types of step. For example, for the 'Backend Development' task, we might have several sectors: one sector, for example, might contain all the steps that involve connecting to various databases for a given task. We would then use estimates of the risks common to a given sector to determine the sequences that are most likely, given the contributions from the systems and the agent's own contribution to that likelihood. For example, in some sectors that are highly dependent on other teams, we might find that even simple steps are likely to require coordination between teams that are likely to be bottlenecks. This will lead to longer step sequences. Each 'NoGo' in these sequences would represent, in this case, a dependency on another team that must first complete some work after the agent has informed them. We would also represent the size of these 'NoGo' sub-steps as being larger if that team is likely to be busy with other work and so not prioritise this work.
Modelling Risk Strategies
In the final section I will move on to consider the four different kinds of risk that need to be managed, which can be captured using this detailed LSM type of analysis. There is tactical risk, goal elasticity risk and two further kinds of strategic risk to consider.
Tactics for managing the control budget
The first type of risk management to consider is the tactics that individual agents can adopt for managing risk to the control budget (which is outer control) and balancing that against the risk to inner control, for the task and phase that they are in control of. Each agent has to decide whether it is better to try to give an outer control dividend (save on cost or time) or an inner control dividend (improve the quality of some work) to agents downstream in the coordination chain.
An example of this can be taken from zoology: we can consider the lioness and her goal of catching a wildebeest. In the first phase of her pursuit, the experienced lioness's tactics involve being careful to close the distance to the wildebeest before beginning the actual chase. This is known as the 'stalk phase'. To give more of an 'inner control dividend' to the downstream 'chase phase', she must try to subtract a metre at a time from the initial chase distance. This is the rough metric of adding a unit of control, which is a 'step'. The time it takes to add a step, however, varies. She can't move when the wildebeest is looking at her. In the LSM, the sub-step where she waits results in a 'NoGo' sub-step in the sequence for that step. So, for the stalk phase, some steps involve a sequence of 'NoGos' before the 'Go' of the lioness moving a metre closer to the wildebeest. That means that in the stalk phase each unit of control gained by the lioness varies in the time taken. However, an experienced lioness knows that in this phase the goal is inelastic to time. This is just an economist's way of saying she can be patient, because more time spent in a given step doesn't really cost her anything. So, we say that in this phase, where time is 'cheap', she is far better off giving an inner control dividend rather than looking to give an outer control dividend.
However, in the chase phase the tactics change completely: In this new phase time is expensive because the chase is metabolically costly for the lioness. So, she is looking to end the chase as soon as possible and each step should be as fast as possible. In this phase the outer control budget giving a dividend is the same thing as giving her the reward that she is after. This analogy also works to consider tactics in any workflow or project that change according to whether we assess time is ‘cheap’ in a given phase, or not. In the early pre-planning stages of a mooted project that is not yet certain to go ahead, time is cheap as resources are spread quite thinly; agents perhaps working only part-time on the project. This is like the stalk phase of the lioness pursuit. As time is cheap there is a lot of value in adding extra sub-steps here as one can then close the ‘starting distance’ when the project is actually given the ‘green light’ (just like the lioness’s chase phase). At the other extreme, where time is expensive, there are time-critical deliveries, e.g. launching new software at a trade fair. Here, getting the outer control dividend is almost the same thing as the value we are after, just as it is for the pursuit phase for the lioness. If we get the tactics wrong and assess the outer control budget as being primary when it isn't, we are more at risk of failing to do something valuable due to loss of inner control. For example, an inexperienced lioness will try to complete a new step in the stalk phase when the wildebeest is still looking in her direction. For the LSM that is a sequence that begins with a 'NoisyGo'. In the lioness’s case she finds out immediately she has made a mistake as the wildebeest starts to run. 
The effect is the same, however, in software and workflows, even if we don't know about the problem straightaway: due to a NoisyGo step, just as for the lioness, there will be a significant control liability in the next phase, meaning our distance to our goal is now larger than we would like.
Calibration of value
For software projects and workflows generally, we often consider what metrics we are trying to measure in terms of getting closer to our goals. For the lioness it is just the remaining distance to the wildebeest. For software it may be usage of the software product that we are trying to increase. By calibrating what a unit of control (a step forwards) gains us, we can start to give estimates of the value of our control budget, and decide if we are getting value for money. But in the LSM we can only calibrate what one unit of control is worth in a given scenario based on the average outcomes of many runs of the simulation for a certain set of parameters. Only then can we see what value was returned for a given budget, according to the simulation. Yet, despite this limitation of the theory, in the sense that it relies on simulations to complete the equation of budget versus value, this type of LSM analysis still allows us to say some useful things 'up front', i.e. before doing any simulations, about the best tactics for moving towards our goals. I now want to say a few things about what the LSM theory of frugal cybernetics says about the best strategy options that we have to manage risk.
Goal elasticity
Firstly, notice that we can consider some goals to be 'elastic', which means that the value changes dramatically based on the inputs of work to achieve it. The closing of the distance to the wildebeest is an example of that, because if the lioness doesn't pay close attention in the stalk phase a 'NoisyGo' results.
Other kinds of goal are more inelastic, in that they are less sensitive to the inputs into the task: they still get done. Anything we can do to make the goal less sensitive to inputs is better for us. This is where we want to get to. Goal elasticity is bad, and goal inelasticity is good. Examples would be reducing 'hand-offs' in work, automating processes that were manual, raising the odds of shorter step sequences and reducing step sizes. Note that hiring more experienced and expensive staff is also a way of controlling the inputs and can make outputs less sensitive to other inputs, but it also implies that the goal is intrinsically still sensitive to that difference of greater experience. Recall that spending more on staff also leaves you with less control budget, which makes you more sensitive to their inputs, too. There are no hard and fast rules here, but the discussion gives a flavour of the trade-offs that we need to balance to manage risk.
[Image courtesy jamesrobertwhiteart.com]
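The calibration point made earlier, that the worth of a unit of control can only be read off the average of many simulation runs, can be sketched as follows. This is a toy stand-in and not the LSM: it assumes step costs are drawn from an exponential distribution, that an agent stops as soon as the next step would exceed the budget, and the names `run_once` and `average_steps_completed` are invented for the example.

```python
import random
import statistics


def run_once(rng, budget, planned_steps, mean_cost):
    """Simulate one run and return how many steps fit inside the budget."""
    spent = 0.0
    completed = 0
    for _ in range(planned_steps):
        cost = rng.expovariate(1.0 / mean_cost)  # assumed cost distribution
        if spent + cost > budget:
            break  # stop rather than exceed the control budget
        spent += cost
        completed += 1
    return completed


def average_steps_completed(budget, planned_steps, mean_cost, runs=1000, seed=0):
    """Average units of control gained for a given budget over many runs."""
    rng = random.Random(seed)
    results = [run_once(rng, budget, planned_steps, mean_cost) for _ in range(runs)]
    return statistics.mean(results)


# Hypothetical scenario: 20 planned steps with a mean cost of 5 budget units each.
for budget in (50.0, 100.0, 150.0):
    average = average_steps_completed(budget, planned_steps=20, mean_cost=5.0)
    print(f"budget {budget:6.1f}: about {average:.1f} steps completed on average")
```

Dividing each budget by the corresponding average gives a rough cost per unit of control for that set of parameters, which is the kind of figure that can only be obtained after running the simulation.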
Dealing with uncertainty
The second type of risk management tactic is specifically to deal with uncertainty. A notable tactic for this is to work defensively: to expect inner control liabilities to come from upstream and to try to detect and fix them. These liabilities are a risk when it is likely that upstream agents were partly 'lazy stopping', and this is often due to 'unknown unknowns' on their part. In the LSM these are the step sequences with a NoisyGo in the sequence, so the step is paused until someone downstream discovers that the work wasn't actually done. Like defensive driving, the tactic of defensive work involves making the appraisal that people are likely to make mistakes. Detecting work left undone implies that downstream agents can recheck the work of upstream agents, which implies that they are not entirely specialised but can do a decent job of partially 'doubling up' to cover the work of other agents. Of course, this can also imply that they are more experienced generally, which is not good for the control budget, as that is expensive. Another way to provide coverage is to work in ways that lead to more overlap and communication, creating a higher chance of double-checking people's work through the 'over-spill' of information and conversations. This can be achieved by having teams work closely across different tasks, but often that too is more expensive. Still, we can model the risk and the difference compared with teams that are more disparate, geographically or even within the office building.
Technical note (can be skipped)
In the LSM, the tactics chosen and adopted influence when these 'broken steps' are detected downstream in the simulation. Individual steps come from specific sectors, and the LSM models different probabilities of detection based on the sector that the 'NoisyGo' occurred in.
For example, if an agent doubles as a business analyst, then that agent has a higher chance of detecting missed requirements downstream, as the 'requirements' step sector overlaps with their business analysis skill set.
Moving from open to closed systems
Another key way to manage risk due to uncertainty is to ensure that one gets enough reliable feedback. That means going from an open system, or a system that is open in some areas, to a 'closed' system. I explained these terms in Part I. The terms 'open' and 'closed' are from the theory of cybernetics and refer to systems that respond to feedback and continue to work until they reach a goal (closed), versus systems that follow an instruction more or less regardless of the value of the result (open). For risk management, the LSM approach shows the importance of moving to a closed system, because the risk being managed includes both meeting the control budget (outer control) and meeting the desired quality of work (inner control). Knowing the balance between the two requires moving towards a closed system so that you can measure and deliver on outcomes as you build. Note that moving to a closed system also often implies using longer step sequences, as we continue to work on things until they actually provide the target value. Or it can mean reducing the value delivered in response to feedback that too much of the control budget is being used. So, in fact, in a frugal cybernetic system, managing both inner control and outer control can only be achieved in a closed system where the true costs and value of action choices are understood and fed back into the system. This means that the main strategy to enable agents to balance inner and outer control is to reduce the sensitivity of one to the other. This is called increasing goal 'inelasticity', and it means adopting strategies to ensure the outer control budget is less sensitive to required increases in inner control and vice versa.
Less goal elasticity means limited impact on outer control from gains in inner control
By using appropriate technology, design and organisation we can see that in the LSM this can translate into lower sub-step sizes even for longer step sequence lengths as we seek to ensure we gain that unit of control. I discussed in Part I the difference between control assets like a typewriter and a word processor. This is because the choice of technology can often give differences in step size between a sequence of corrections such as when using a typewriter versus that same sequence of corrections using a word processor. This highlights the value of technology (which I call 'control assets') and 'design and organisational choices' (which is an agent contribution) in delivering projects by making outer control less sensitive to inner control dividends.
Less goal elasticity also means smooth decay of inner control as we gain or maintain outer control
At the same time, adopting goals like continuous delivery, we can adopt goals which smoothly reduce the value delivered when we come across extra steps we can’t deliver. We seek to ensure we don't exceed the outer control budget by not doing all of those steps but we still achieve some of our goal. An example would be having value measured by targeting the amount of usage of a system, rather than targeting a specific release of functionality to be delivered on a certain date. If we don’t deliver everything planned we will still get some additional usage from what we deliver. Getting additional usage isn’t an all or nothing goal. To return to the lioness pursuit example, bringing down a wildebeest is an all or nothing goal, and so the output value is highly elastic to inputs.
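The contrast between an all-or-nothing goal and one whose value decays smoothly can be made concrete with two toy value functions. This is a sketch under stated assumptions: the linear relationship between steps delivered and usage gained is a simplification chosen for the example, and the function names and figures are hypothetical.

```python
def all_or_nothing_value(steps_done, steps_planned, full_value):
    """Value of a goal that only pays out if every planned step is delivered."""
    return full_value if steps_done >= steps_planned else 0.0


def smooth_value(steps_done, steps_planned, full_value):
    """Value of a goal that pays out in proportion to what is delivered."""
    delivered = min(steps_done, steps_planned)
    return full_value * delivered / steps_planned


# Hypothetical project: 20 planned steps worth 100 units of value in total.
PLANNED = 20
FULL_VALUE = 100.0

for done in (20, 18, 10):
    fixed = all_or_nothing_value(done, PLANNED, FULL_VALUE)
    usage = smooth_value(done, PLANNED, FULL_VALUE)
    print(f"{done:2d} of {PLANNED} steps: all-or-nothing = {fixed:5.1f}, smooth = {usage:5.1f}")
```

Dropping two steps to stay within the outer control budget costs the whole reward under the first function but only a tenth of it under the second, which is the sense in which the smoother goal is less elastic to inputs.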
Fortunately, we are rarely involved in truly goal-elastic systems. Unfortunately, managers too often behave as if the goal is far more elastic to inputs than it really is, setting artificial deadlines, which lead to unnecessary loss of inner control and to problems like technical debt, which is a particularly expensive kind of inner control liability.
'Closing' systems
In this section I also want to discuss an important class of system, which is 'closing' systems. These are systems that only increase in feedback reliability and volume as the agents reach the later stages of the workflow or project. This is a situation familiar from software design and engineering projects generally, because it is only as you execute your planned design that you see the actual effects in enough detail. Only then does the reality of the choices you made become clearer in terms of issues, side effects, bugs caused, etc. The effect of 'closing' occurring over the later phases is that problems (LSM 'NoisyGos') are only discovered when more things are delivered, tested and released. The LSM offers the same sort of analysis for managing the risk in these types of system as it does for closed systems, which is to make the outer control budget less sensitive to gains in inner control, but to concentrate on doing so specifically in these later phases. Therefore, as step sequences get longer in the later stages, we seek to adopt design, technology and organisational choices that lower the step size of these longer sequences. We seek to do this especially in the later phases of the project, such as release phases.
To consider what the implications of 'closing' systems are, we also need to contrast a closing system with a system that is already closed. A closed system gives reliable feedback all the way through the process. When unit testing, treating the system as closed would involve assuming that nothing is missed and that all bugs and issues are captured.
This sort of assumption is often implicit in 'Waterfall' methodologies where the risk of very late discoveries of bugs tends to be downplayed in estimation. This assumption of a closed system may be true in certain circumstances, but when uncertainty is higher, we know that the system is not closed but closing and the more reliable feedback will be available only in the latter stages of deployment. This implies that there are inner control liabilities in the form of 'NoisyGos' which are inelastic to detection attempts in the early phases. The implications if you are involved in a closing system are that, beyond a certain point, it doesn't pay to keep extending the step sequences in the early phases in order to try to give an early inner control dividend. This is because you will not see any more reliable feedback in that phase, and you will not avoid any more 'NoisyGos'. Instead, in these scenarios it pays more to get to the later phases earlier by paying an outer control dividend in the early phases, so that you have a higher control budget to deal with the inevitable feedback and longer step sequences in the later phases. This is a key strategy that is effective in a closing system. I call this strategy which is often adopted emergently rather than with clear-sighted decision-making, 'purple-shifting' because it involves shortening the 'wavelengths' of earlier tasks and accepting there will be 'NoisyGos', while lengthening the 'wavelengths' of the later tasks, which are not really 'lengthened' but are unavoidably long in a closing system. By reducing the step sizes using technology, design and organisation, we can limit the actual step sizes in these latter phases and achieve the control we are after.
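The reasoning behind moving budget from early to late phases in a closing system can be illustrated with a simple expected-value calculation. The model here is an assumption made purely for illustration, not part of the LSM as described: each check is taken to find each hidden 'NoisyGo' independently with a fixed probability, and that probability is set low for early phases and higher for late phases. The function names and numbers are hypothetical.

```python
def expected_detected(hidden, p_detect, checks):
    """Expected number of hidden 'NoisyGo' steps found after a number of checks.

    Assumes each check independently finds each hidden step with
    probability p_detect, which is a simplification for illustration.
    """
    return hidden * (1.0 - (1.0 - p_detect) ** checks)


def gain_from_one_more_check(hidden, p_detect, checks_so_far):
    """Extra detections expected from adding one more check."""
    return (expected_detected(hidden, p_detect, checks_so_far + 1)
            - expected_detected(hidden, p_detect, checks_so_far))


HIDDEN = 10      # hypothetical number of broken steps
P_EARLY = 0.05   # hypothetical: feedback is unreliable early on
P_LATE = 0.50    # hypothetical: feedback is far more reliable late on

# Compare one extra sub-step of checking in an early phase with one in a late phase.
print(gain_from_one_more_check(HIDDEN, P_EARLY, checks_so_far=3))
print(gain_from_one_more_check(HIDDEN, P_LATE, checks_so_far=0))
```

Under these assumed numbers an extra check buys far fewer detections early than late, so budget saved in the early phases does more work when it is spent where the feedback is reliable.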
Permanently 'open' system dimensions
In this final section I want to address the scenario where the system remains 'open' in certain dimensions. Rather than being closed or eventually closing in all dimensions, a system can remain permanently open in certain dimensions, so that it never obtains or responds to feedback in those specific dimensions at any point. This means that the agent always gets feedback that is incomplete or missing some dimensions of information. This is an important scenario for frugal cybernetic systems, where we assume the budget for control is limited, because we should expect that agents on a tight control budget are forced to pick and choose which dimensions of feedback they can detect and explicitly manage. That limited control budget might mean some dimensions of control are missing altogether. The management of this type of risk has an interesting solution. One type of analysis that offers a solution to this type of decision-making problem is called 'Robust Decision Theory', which is a theory of how to calculate the minimum losses we would incur given that at least some of the information we have is likely to turn out to be wrong or incomplete. This can be thought of as the agent bet-hedging against its own decision-making. I believe, however, that there is another important way to answer this type of control problem which is less complex than Robust Decision Theory. The alternative is simply to introduce inefficiency into the system that randomly 'frustrates' an agent's ability to get its own decisions reliably converted into action. This sort of bet-hedging happens naturally in team or project work. Managers rely on agents carrying out their instructions for their decisions to be executed, but agents respond to those instructions imperfectly, perhaps because they have their own point of view and their own information and ideas.
When agents sometimes don't follow a manager's instructions, we effectively get the bet-hedging of that manager's decision-making process for free. This type of theory is important in economics because, like the 'invisible hand' of the market, this type of strategy emerges freely without planning. So, by recognising where this emergent phenomenon has value, we can take a more enlightened approach to our risk management and planning, merely fostering and protecting the naturally existing ‘contours’ of a team and how it works, rather than trying to explicitly engineer risk management from the ground up.
[Photo: James Robert White]
Examples of natural bet-hedging in software teams and workflows
I have thought of at least three examples of how this natural bet-hedging can occur to insure agents against their own biases in permanently open system dimensions.
Natural bet-hedging against imperfect prioritisation
The first example is where we have a system that prioritises tickets in a continuous workflow scenario, but according to inaccurate or incomplete information. So, let's say we know that in this system there are lower priority or no priority tickets that are in fact of higher value than at least some of the high priority tickets. In such a system, natural bet-hedging occurs when there is significant inefficiency in the system, such as bottlenecks. This frees agents to work on the tickets that they can actually complete while waiting for the bottleneck to clear. Some of these completable tickets may be the low or no priority tickets that are actually of high value.
The converse of this natural bet-hedging example is that if we raise the apparent efficiency of the workflow so that only the high priority tickets are worked on, we risk introducing systematic biases into what we work on, and their effect can accumulate over time. For example, we might find that management systematically prioritise management reports to be worked on over operational reports. Hence, this gain in efficiency actually raises the risk to the system of significant inefficiency and cumulative control liabilities emerging in the types of tickets that the managers are biased against.
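The prioritisation example can be made concrete with a small calculation. This is a minimal sketch under stated assumptions: the `Ticket` class, the `blocked` flag standing in for a bottleneck, the idea of a fixed `capacity` per work period, and all the priorities and values are hypothetical, chosen only to show the mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    name: str
    priority: int          # assigned by management; higher is worked on first
    true_value: int        # hypothetical real value, unknown to the prioritiser
    blocked: bool = False  # True if the ticket is stuck behind a bottleneck


def work_order(tickets, capacity):
    """Pick up to `capacity` unblocked tickets, highest priority first."""
    available = [t for t in tickets if not t.blocked]
    return sorted(available, key=lambda t: -t.priority)[:capacity]


def delivered_value(tickets, capacity):
    """True value of the tickets actually worked on in this period."""
    return sum(t.true_value for t in work_order(tickets, capacity))


# 'Efficient' workflow: nothing is blocked, so only high priority work is done.
tickets_flowing = [
    Ticket("management report A", priority=3, true_value=2),
    Ticket("management report B", priority=3, true_value=3),
    Ticket("operational report", priority=1, true_value=8),
    Ticket("unprioritised fix", priority=0, true_value=1),
]

# Same tickets, but a bottleneck blocks one of the high priority tickets.
tickets_bottleneck = [
    Ticket("management report A", priority=3, true_value=2),
    Ticket("management report B", priority=3, true_value=3, blocked=True),
    Ticket("operational report", priority=1, true_value=8),
    Ticket("unprioritised fix", priority=0, true_value=1),
]

print(delivered_value(tickets_flowing, capacity=2))     # 5
print(delivered_value(tickets_bottleneck, capacity=2))  # 10
```

In this contrived case the bottleneck pushes the agent onto the undervalued operational report, so more true value is delivered in the period. The result depends entirely on the assumed mismatch between priority and value; where priorities are accurate, the same bottleneck would simply cost value.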
Natural bet-hedging against lazy stopping of individual tasks
The second example is where an agent is working on a task and believes that the task is complete. In reality, the task is a 'NoisyGo', because the agent has incomplete information on whether she has completed the task. However, let's say that rather than being able to transfer the ticket to the next agent for the next phase, there is a bottleneck in the system, which means the ticket is 'blocked' and effectively remains with the original agent during that time. In this extra time that the agent didn't ask for, she has time to discover the 'NoisyGo' and so complete the work, saving time in terms of the reduced step sequence length and step size of that task. This type of scenario is, I suspect, quite common, and indicates that bottlenecks form a natural bet-hedge that emerges against the agent's own imperfect decision-making, which again can be quite biased due to her limited control budget.
Natural bet-hedging against imperfect/over-specialised task details
The third example is where an agent, by not working exclusively on its official specialised role, is able to pick up on work that has been missed. Agents that do not keep a narrow focus on their official work can, through their own ‘inefficiency’, be more prone to straying, in terms of their interest, into other areas of work. As a result of this inefficiency, they are capable of picking up work that is missed, or opportunities to add value. Anecdotal examples abound of water cooler or coffee machine discussions, or chats with the other smokers outside, that lead to valuable insights into some business issue that has until then been overlooked. This sort of unofficial route by which useful business information reaches a team, workflow or project performs natural bet-hedging against the official channels of communication.
Natural bet-hedging of this type can be fostered and encouraged by social event organisation, and by the provision of shared spaces. It can also be encouraged by increasing the odds of contact with other teams, team members, and so on, using seating arrangements to encourage this sort of communication over-spill between different teams and functions. It can also be encouraged by hiring agents who naturally are curious, gregarious, and have overlapping interests and skills.
So, in summary, over the two parts of this introduction, I have introduced the concept of 'frugal cybernetics', which is the idea that we are trying to achieve control on a budget. This is applied to the idea that we can better manage risk in software teams, workflows and projects when we consider them as examples of coordination chains in which control passes from phase to phase, sometimes with a control dividend and sometimes with a control liability. In Part II, I applied the theory to the Lazy Stopping Model to simulate risk in this way. I showed how we model the effect on inner and outer control, and the options available to agents, in terms of steps and sub-steps which form variable length sequences to achieve a unit of control. In the second half of Part II, I also showed that there are basic tactics determined by which phase you are in and how time-sensitive the goal is in that phase. I also explained that there can be defensive tactics to insure against receiving control liabilities from upstream choices by agents. Then I discussed how we work to create closed systems, as these are the systems where the agents are aware of the level of control that they obtain for their control budget. I also discussed in some detail the concept of closing systems, and systems with in-built biases because they are permanently open in some dimensions. I finished by introducing the idea of ‘natural bet-hedging’ as a freely emergent property of ‘inefficient’ teams that makes decisions more robust to uncertainty and imperfect information. In the rest of this blog I will be exploring how these ideas work in more detail and sharing ideas about how they apply to other examples of coordination chains in the case of biological systems.