In my previous blog I speculated about the practical application of complexity sciences to assist in business management. In this blog I demonstrate how we use complexity sciences to create additional quantitative insight into business systems, in support of our mission at IAM: “To see better. To understand better. To do better”. As an entrepreneur, I have learned over the years that investments go hand-in-hand with risk: understand the risk in order to mitigate potential issues and subsequently gain on system performance.

The Anscombe quartet consists of four data sets which share nearly identical statistical properties but show very different patterns when visually inspected. I provide the scatter plot at the end of the blog to support the use of complexity science in discovering better insight into system behavior.

Complexity is hard to define, but put simply, complexity is the result of a combination of uncertainty, volatility and relationships in a system. This also underpins the key requirement for any complexity science approach: the ability to measure entropy and chaos in an unsupervised, non-linear manner. The Anscombe quartet has only 11 observations per data set, which is not enough to create meaningful insight into characteristics such as robustness, self-organised criticality and small-world behavior, but is still enough to demonstrate the concept.

**The Anscombe quartet**

The Anscombe quartet is a set of 4 data sets which share common statistical attributes. In this example the four X and Y pairs can represent 4 business units, departments, products or processes, each with 11 observations.

Descriptive statistics show
similarity between the X and Y pairs, with similar correlations between the
variables. At face value, these 4 units operate and perform in a similar fashion.
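To make the similarity concrete, here is a minimal sketch that computes the usual descriptive statistics for the four pairs. The values are the published Anscombe (1973) data; the labels `BU1` to `BU4` and the helper names are my own, chosen for illustration only.

```python
from statistics import mean, variance

# Anscombe's quartet; x1, x2 and x3 share the same X values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "BU1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "BU2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "BU3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "BU4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def describe(xs, ys):
    """Mean, sample variance and correlation for one X/Y pair."""
    return {
        "mean_x": mean(xs), "var_x": variance(xs),
        "mean_y": mean(ys), "var_y": variance(ys),
        "r": pearson(xs, ys),
    }


SUMMARY = {name: describe(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, s in SUMMARY.items():
    print(f"{name}: mean_x={s['mean_x']:.2f} var_x={s['var_x']:.2f} "
          f"mean_y={s['mean_y']:.2f} var_y={s['var_y']:.2f} r={s['r']:.3f}")
```

All four units come out with practically the same means, variances and correlation, which is exactly why descriptive statistics alone cannot tell them apart.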

**COMPLEXITY SCIENCE INSIGHT**

What else can we see or understand from this system? How would we go about optimizing this system? What is the overall risk of this system? If we optimise this system, where should we start? What risks do we face if we do this?

With complexity science we can get quantitative answers to these questions. To summarize the sections that follow, the quantitative analysis lets us state the following about the system:

"This system is fairly integrated and shielded from either random failures or structural failures. This is caused by the high interaction of system components between all business units. However, performance optimization of this system does pose a greater challenge. From the quantitative insight, the strategy should be to address and correct the high uncertainty of x1, x2 and x3 performance. The cause-effect model of the system should be used to study the impact of changing x1, x2 and x3 to understand what implications will affect the overall system. Improving x1, x2 and x3 will have a significant impact on performance indicators by reducing customer lead times, work-in-progress and waiting times. It is important to protect the stability of BU 4 (x4 and y4) as they have a significant impact on 60% of all observations."

**Analysis: Volatility, Uncertainty & Ideal Capacity**

__Volatility__ = measure of deviation from the indicator mean. A value between 0 and 50% indicates a relatively stable indicator, between 51 and 100% a less stable indicator, and above 100% a highly volatile one.

__Uncertainty__ = the level of uncertainty in the indicator measurement, with 0% meaning no uncertainty and 100% meaning total uncertainty in the measurement.

__Ideal Capacity__ = ideal capacity required to process indicator values according to the level of uncertainty in the indicator values at the selected frequency of observations.
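The exact formulas behind these indicators are not spelled out in this blog, so the sketch below uses two common stand-ins that are purely my assumption: the coefficient of variation (standard deviation over mean, in percent) as a proxy for volatility, and the normalised Shannon entropy of an equal-width histogram as a proxy for uncertainty. The function names and the choice of 4 bins are hypothetical.

```python
import math
from statistics import mean, pstdev


def volatility_pct(values):
    """Coefficient of variation in percent (assumed proxy for volatility)."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * pstdev(values) / abs(m)


def volatility_band(pct):
    """Map a volatility percentage onto the three bands described above."""
    if pct <= 50:
        return "relatively stable"
    if pct <= 100:
        return "less stable"
    return "highly volatile"


def uncertainty_pct(values, bins=4):
    """Normalised Shannon entropy of an equal-width histogram, in percent.

    0% means every value falls in one bin, 100% means the values are spread
    evenly over all bins. This is an assumed proxy, not the author's measure.
    """
    lo, hi = min(values), max(values)
    if hi == lo or bins < 2:
        return 0.0
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c)
    return 100.0 * entropy / math.log2(bins)


x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
for name, series in (("x1", x1), ("x4", x4)):
    vol = volatility_pct(series)
    print(f"{name}: volatility={vol:.1f}% ({volatility_band(vol)}), "
          f"uncertainty={uncertainty_pct(series):.1f}%")
```

Under these assumed proxies the two series have the same volatility but very different uncertainty, which is the kind of difference the standard statistics hide.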

**Analysis: Systemic Risk Score**

__Complexity Score__ = relative value calculated from the current measured complexity within the range of the minimum and maximum complexity of the system.

__Systemic Risk Score__ = value between 0 and 100%. Complexity is a result of the volatility in the behavior of objects within a system and the level of uncertainty in the relationships between these objects. For a given system, 100% represents the maximum risk due to complexity; this is an absolute value and can be used to compare different systems against each other.
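One plausible reading of a "relative value within the range of the minimum and maximum" is a simple min-max rescaling. The sketch below assumes that reading; the function name and the clipping behaviour are my own choices and not a description of the actual scoring.

```python
def relative_score(current, minimum, maximum):
    """Position of `current` between `minimum` and `maximum`, in percent.

    Assumed min-max rescaling; values outside the range are clipped to 0 or 100.
    """
    if maximum <= minimum:
        raise ValueError("maximum must be greater than minimum")
    clipped = min(max(current, minimum), maximum)
    return 100.0 * (clipped - minimum) / (maximum - minimum)


# Hypothetical numbers, purely to show the arithmetic.
print(relative_score(current=3.4, minimum=1.0, maximum=5.0))
```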

**Analysis: Small World Evaluation**

Small-world networks are typical of scale-free systems in which the average path length in the network (mean geodesic) is short and transitivity is high. This means that relatively few nodes act as hubs with many relationships, with weak links between these hubs. In this case the average path is shorter than in a similar random network, but there is no transitivity due to the small number of nodes.
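For readers who want to see what such an evaluation involves, here is a minimal sketch of the two measures and of a random benchmark with the same number of nodes and links. The input is a hypothetical toy network (a ring lattice), not the network built from the quartet, and all function names are mine.

```python
import random
from collections import deque
from itertools import combinations


def mean_geodesic(adj):
    """Average shortest-path length over all reachable node pairs (BFS)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


def transitivity(adj):
    """Fraction of connected triples (a - node - b) that close into a triangle."""
    closed = triples = 0
    for node, nbs in adj.items():
        for a, b in combinations(sorted(nbs), 2):
            triples += 1
            closed += b in adj[a]
    return closed / triples if triples else 0.0


def ring_lattice(n, k):
    """Hypothetical toy network: each node linked to its k nearest ring neighbours."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def random_like(adj, seed=0):
    """Random benchmark with the same nodes and the same number of links."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    n_edges = sum(len(nbs) for nbs in adj.values()) // 2
    benchmark = {n: set() for n in nodes}
    for a, b in rng.sample(list(combinations(nodes, 2)), n_edges):
        benchmark[a].add(b)
        benchmark[b].add(a)
    return benchmark


network = ring_lattice(10, 4)
benchmark = random_like(network, seed=0)
for label, g in (("toy network", network), ("random benchmark", benchmark)):
    print(f"{label}: mean geodesic={mean_geodesic(g):.2f}, "
          f"transitivity={transitivity(g):.2f}")
```

The comparison against the random benchmark is what matters: a small-world network combines a path length close to the random one with a clearly higher transitivity.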

**Analysis: Additional Network Insight**

Because of the small number of observations in the Anscombe quartet, the following analysis only indicates the potential of complexity sciences to uncover more hidden facts about the quartet. By extending the network model with Bayesian probabilities, potential directional cause-effect relationships in the data can be identified. Understanding these adds greatly to creating a predictive model using dependent and independent variables.

The scale-free test gives insight into whether the system is in a state of self-organised criticality (in this case it is not).

To test the robustness of the system, “random attacks” and “structured attacks” are made on the network nodes (vertices). This means the nodes are removed in a random order, as well as in a structured order, to observe the effect on the mean distance of the network. In this case, removing 25% of the vertices increased the mean distance by only 1.5%, which indicates a fairly robust network. This is consistent with the network being fairly dense at 85%, meaning many relationships between the different indicators and no significant hubs.
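The attack test can be sketched in a few lines. I assume here that a "structured" attack means removing the best-connected nodes first, which is a common convention but not something this blog specifies. The network is a hypothetical dense toy graph (8 nodes, about 86% density), not the quartet network, so the percentages it prints are not the results quoted above.

```python
import random
from collections import deque


def mean_distance(adj):
    """Average shortest-path length over all reachable node pairs (BFS)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


def density(adj):
    """Share of all possible links that are actually present."""
    n = len(adj)
    edges = sum(len(nbs) for nbs in adj.values()) / 2
    return edges / (n * (n - 1) / 2)


def without(adj, removed):
    """Copy of the network with the given nodes and their links removed."""
    return {n: nbs - removed for n, nbs in adj.items() if n not in removed}


def distance_change_pct(adj, fraction, targeted, seed=0):
    """Percentage change in mean distance after removing a fraction of nodes."""
    k = round(fraction * len(adj))
    if targeted:
        # Assumed meaning of "structured": highest-degree nodes go first.
        victims = sorted(adj, key=lambda n: (-len(adj[n]), n))[:k]
    else:
        victims = random.Random(seed).sample(sorted(adj), k)
    before = mean_distance(adj)
    after = mean_distance(without(adj, set(victims)))
    return 100.0 * (after - before) / before


def dense_toy_network(n=8):
    """Hypothetical network: complete graph minus the links 0-1, 2-3, ..."""
    adj = {i: set(range(n)) - {i} for i in range(n)}
    for i in range(0, n - 1, 2):
        adj[i].discard(i + 1)
        adj[i + 1].discard(i)
    return adj


toy = dense_toy_network()
print(f"density: {density(toy):.0%}")
print(f"random attack:     {distance_change_pct(toy, 0.25, targeted=False):+.1f}%")
print(f"structured attack: {distance_change_pct(toy, 0.25, targeted=True):+.1f}%")
```

In a dense network without hubs, neither kind of attack moves the mean distance much, which is the pattern described in the paragraph above.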

**Analysis: Self-organised clustering**

The time-based observations (events) are not sufficient for temporal investigations, hence self-organised clustering is used to investigate potential correlations between events. In this case, six of the 11 observations cluster around significant Business Unit 4 inputs and outputs.

**INSIGHT DERIVED FROM USING COMPLEXITY SCIENCES**

To summarize the insight derived from the complexity analysis, we can see and understand the Anscombe quartet better by adding the following observations:

a) The __volatility indicators__ show similar volatility in the X and Y groups. Volatility explains the deviations from a mean, and in this case the outliers were left alone for demonstration purposes.

b) The __uncertainty measures__ show the different levels of uncertainty on each indicator – it becomes clear that these indicators are not similar as measured by standard statistical measurements.

c) The __ideal capacity__ calculation uses the combination of volatility and uncertainty to indicate the different levels of capacity required by each business unit.

d) The __systemic risk score__ is relatively high at 68%. The level of uncertainty in the indicators, as well as the high system density measurement (85%), supports this level of systemic risk, which indicates that the 4 business units do not operate in isolation but have cross-relationships which increase the complexity between them.

e) The system’s relationships are not randomly driven, as the benchmark against a similar random network shows significant differences in network characteristics. The strongest relationships are between x1, x2 and x3 at 86%.

f) The system doesn’t measure as a scale-free system and does not support a self-organised critical system.

g) The system is fairly robust against random and direct attacks. A simulated 25% removal of vertices only resulted in a 1.5% increase in average network distance.

h) From a temporal view, 6 out of the 11 events can be clustered around Business Unit 4 events, mainly because the level of uncertainty around Unit 4 is low and Unit 4 is not significantly impacted by the other units.

This can be summarised in the following risk approach and improvement strategy:

"*This system is fairly integrated and shielded from either random failures or structural failures. This is caused by the high interaction of system components between all business units. However, performance optimization of this system does pose a greater challenge. From the quantitative insight, the strategy should be to address and correct the high uncertainty of x1, x2 and x3 performance. The cause-effect model of the system should be used to study the impact of changing x1, x2 and x3 to understand what implications will affect the overall system. Improving x1, x2 and x3 will have a significant impact on performance indicators such as customer lead times, work-in-progress and waiting times. It is important to protect the stability of BU 4 (x4 and y4) as they have a significant impact on 60% of all observations made in the system.*"

**Anscombe Quartet Scatterplot**

**Conclusion**

In summary, the above analysis shows that the Anscombe quartet is a fairly complex system, with relationships between most variables and a high degree of uncertainty in the data sets. The business units, although appearing very similar in standard statistical measurements, are quite different in operation and cannot be viewed or treated as individual units.

In conclusion, this approach should still be applied as preached in any data science methodology: use your common sense to analyze, construct and predict!