In my previous blog I speculated about the practical
application of complexity sciences to business management. In this
blog I demonstrate how we use complexity sciences to create additional
quantitative insight into business systems, in support of our mission at IAM: “To see
better. To understand better. To do better.” As an entrepreneur, I have learned
over the years that investment goes hand in hand with risk. Understanding risk allows you to mitigate potential issues and,
in turn, improve system performance.
The Anscombe quartet consists of four data sets that share similar statistical properties but show different patterns when
inspected visually. I provide the
scatter plot at the end of the blog to support the use of complexity science in gaining better insight into system behavior.
The Anscombe quartet
The quartet is made up of four X,Y data sets that share common statistical attributes. In this example the
four pairs can represent four business units, departments, products or processes,
each with 11 observations.
Descriptive statistics show
similarity between the X and Y pairs, with similar correlations between the
variables. At face value these four units operate and perform in a similar fashion.
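To make the "similar descriptive statistics" concrete, the sketch below uses Anscombe's published values for the quartet. For each X,Y pair it computes the mean and sample variance of X and Y and their Pearson correlation. The "BU 1" to "BU 4" labels are hypothetical and simply follow the business-unit framing above. Only the Python standard library is used.

```python
"""Descriptive statistics for the four X,Y pairs of the Anscombe quartet."""
from math import sqrt
from statistics import mean, variance

# Anscombe's published values; x1, x2 and x3 are identical in the original data.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
# The "BU" labels are hypothetical, following the business-unit framing above.
QUARTET = {
    "BU 1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "BU 2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "BU 3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "BU 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
             [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def describe(xs, ys):
    """Mean and sample variance of X and Y, plus their correlation."""
    return {
        "mean_x": mean(xs),
        "var_x": variance(xs),
        "mean_y": mean(ys),
        "var_y": variance(ys),
        "r": pearson(xs, ys),
    }


if __name__ == "__main__":
    for unit, (xs, ys) in QUARTET.items():
        stats = describe(xs, ys)
        print(unit, {k: round(v, 3) for k, v in stats.items()})
```

For all four pairs, the X mean is 9 and the Y mean is about 7.50. The X variance is 11 and the Y variance is roughly 4.12 to 4.13. The correlation is about 0.816. This is why the units look alike on paper.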
What else can we see or understand from this system? How would we go about optimising it? What is the overall risk of this system? If we optimise it, where should we start? What risks do we face if we do this?
With complexity science we can answer these questions quantitatively. Summarising the sections that follow, the quantitative analysis lets us state the following about the system:
"This system is fairly integrated and shielded from both random and structural failures. This is caused by the high interaction of system components across all business units. However, optimising the performance of this system poses a greater challenge. Based on the quantitative insight, the strategy should be to address and correct the high uncertainty in x1, x2 and x3 performance. The cause-effect model of the system should be used to study the impact of changing x1, x2 and x3, and to understand how such changes will affect the overall system. Improving x1, x2 and x3 will have a significant impact on performance indicators by reducing customer lead times, work-in-progress and waiting times. It is important to protect the stability of BU 4 (x4 and y4), as it has a significant impact on 60% of all observations."
Analysis: Volatility, Uncertainty & Ideal Capacity
Volatility = measure of deviation from the indicator mean. A value of 0-50% indicates a relatively
stable indicator, 51-100% a less stable one, and above 100% a highly volatile one.
Uncertainty = the level of uncertainty in the indicator measurement, from 0%
(no uncertainty) to 100% (total uncertainty).
Ideal Capacity = the capacity required to process indicator values, given the level of uncertainty in those values at the selected frequency of observations.
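The text does not spell out the formulas behind these indicators. The sketch below shows how the measures and the stability bands above could be computed under two assumptions. Volatility is taken as the coefficient of variation (sample standard deviation over the mean, in percent). Uncertainty is taken as the normalised Shannon entropy of a histogram of the indicator values. Both definitions, the function names and the default of five histogram bins are assumptions for illustration, not IAM's method.

```python
"""Hypothetical volatility and uncertainty indicators (assumed definitions)."""
from math import log
from statistics import mean, stdev


def volatility_pct(values):
    """ASSUMPTION: volatility as the coefficient of variation, in percent."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * stdev(values) / abs(m)


def volatility_band(pct):
    """Map a volatility percentage onto the bands described in the text."""
    if pct <= 50:
        return "relatively stable"
    if pct <= 100:
        return "less stable"
    return "highly volatile"


def uncertainty_pct(values, bins=5):
    """ASSUMPTION: uncertainty as normalised Shannon entropy of a histogram.

    0% means every observation falls in one bin; 100% means the
    observations are spread evenly over all bins.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant indicator carries no uncertainty
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    entropy = -sum((c / n) * log(c / n) for c in counts if c)
    return 100.0 * entropy / log(bins)


if __name__ == "__main__":
    x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
    x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
    for name, xs in (("x1", x1), ("x4", x4)):
        vol = volatility_pct(xs)
        print(name, round(vol, 1), volatility_band(vol), round(uncertainty_pct(xs), 1))
```

Under these assumed definitions, the Anscombe indicators x1 and x4 have identical volatility (about 37%, "relatively stable"). Their uncertainty differs sharply: about 99% for x1 versus about 19% for x4, because ten of x4's eleven values equal 8.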
Analysis: Systemic Risk Score
Complexity Score = relative value of the currently measured complexity, positioned within the range between the minimum and maximum complexity of the system.
Systemic Risk Score = value between 0 and 100%.
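One plain reading of the complexity score is a min-max normalisation: score = 100 × (C - C_min) / (C_max - C_min), where C is the currently measured complexity. The sketch below assumes exactly that. The function name is hypothetical, and the sketch deliberately leaves open how C itself is measured.

```python
def complexity_score(current, c_min, c_max):
    """Position of the current complexity within [c_min, c_max], in percent.

    ASSUMPTION: a plain min-max normalisation; how the complexity values
    themselves are measured is not specified here.
    """
    if c_max <= c_min:
        raise ValueError("c_max must be greater than c_min")
    ratio = (current - c_min) / (c_max - c_min)
    # Clamp so that readings outside the observed range stay within 0-100%.
    return 100.0 * min(max(ratio, 0.0), 1.0)


if __name__ == "__main__":
    print(complexity_score(6.8, 0.0, 10.0))
```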
Analysis: Small World Evaluation
Small-world networks combine a short average path length (mean geodesic distance) with high transitivity (clustering). Many scale-free systems show this property: relatively few nodes act as hubs with many relationships, and weak links connect these hubs. In this case the mean path length is shorter than that of a comparable random network, but there is no transitivity because of the small number of nodes.
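The small-world comparison can be sketched with the standard library alone. The sketch computes the mean geodesic distance (via breadth-first search) and the global transitivity of the indicator network. It then averages the same two measures over random graphs with the same number of nodes and edges (the Erdős-Rényi G(n, m) model). A small-world network has a mean path length close to the random baseline but a much higher transitivity. The graph representation (a dict of neighbour sets), the function names and the example edges are hypothetical.

```python
"""Compare mean path length and transitivity with random graphs (sketch)."""
import random
from collections import deque
from itertools import combinations


def mean_geodesic(adj):
    """Average shortest-path length over all connected node pairs (BFS)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def transitivity(adj):
    """Global clustering: closed connected triplets / all connected triplets."""
    closed = triples = 0
    for nbs in adj.values():
        for a, b in combinations(nbs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


def random_graph(nodes, m, rng):
    """Erdos-Renyi G(n, m): m edges drawn uniformly without replacement."""
    adj = {v: set() for v in nodes}
    for a, b in rng.sample(list(combinations(nodes, 2)), m):
        adj[a].add(b)
        adj[b].add(a)
    return adj


def small_world_report(adj, trials=100, seed=0):
    """Return (L, C) for the graph plus averaged random baselines (L_r, C_r)."""
    rng = random.Random(seed)
    nodes = list(adj)
    m = sum(len(nbs) for nbs in adj.values()) // 2
    rand = [random_graph(nodes, m, rng) for _ in range(trials)]
    l_rand = sum(mean_geodesic(g) for g in rand) / trials
    c_rand = sum(transitivity(g) for g in rand) / trials
    return mean_geodesic(adj), transitivity(adj), l_rand, c_rand


if __name__ == "__main__":
    # Hypothetical indicator network; these edges are illustrative only.
    edges = [("x1", "x2"), ("x2", "x3"), ("x1", "x3"), ("x3", "y3"),
             ("y3", "y4"), ("x4", "y4"), ("y4", "y1")]
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    L, C, L_rand, C_rand = small_world_report(graph)
    print(f"L={L:.2f} (random {L_rand:.2f}), C={C:.2f} (random {C_rand:.2f})")
```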
Analysis: Additional Network Insight
Given the small number of
observations in the Anscombe quartet, the following analysis only indicates the
potential of complexity sciences to uncover hidden facts about the
quartet. By extending the network model with Bayesian probabilities, potential
directional cause-effect relationships in the data can be identified.
Understanding these relationships helps greatly when building a predictive
model with dependent and independent variables.
The
scale-free test gives insight into whether the system is in a state of
self-organised criticality (in this case it is not).
To test the robustness of the system, we apply “random attacks” and “structured
attacks” to the network nodes (vertices). Nodes are removed in random order, and separately in a structured order, while the mean distance of the network is observed. In this
case, removing 25% of the vertices increased the mean distance by only
1.5%, which indicates a fairly robust network. This is consistent with the
network’s high density of 85%: there are many relationships between the
different indicators and no significant hubs.
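The robustness test can be sketched in the same spirit. The code below measures network density, removes 25% of the nodes in either a seeded random order or a structured order, and reports the percentage change in mean distance. It assumes that a structured attack removes the highest-degree nodes first, which the text does not specify. All names and the example network are hypothetical. Because the mean is taken over reachable pairs only, a network that fragments can show a falling mean distance, so connectivity should be tracked alongside it in practice.

```python
"""Random vs structured node-removal attacks (illustrative sketch)."""
import random
from collections import deque


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    m = sum(len(nbs) for nbs in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def mean_distance(adj):
    """Mean shortest-path length over reachable node pairs (BFS)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def remove_nodes(adj, victims):
    """Copy of the graph with the given nodes and their edges removed."""
    gone = set(victims)
    return {v: nbs - gone for v, nbs in adj.items() if v not in gone}


def attack(adj, fraction=0.25, structured=False, seed=0):
    """Percent change in mean distance after removing a fraction of nodes.

    ASSUMPTION: a structured attack removes the highest-degree nodes first
    (degrees computed once, up front); a random attack removes nodes in a
    seeded random order.
    """
    k = round(fraction * len(adj))
    if structured:
        order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    else:
        order = list(adj)
        random.Random(seed).shuffle(order)
    before = mean_distance(adj)
    after = mean_distance(remove_nodes(adj, order[:k]))
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    # Hypothetical 8-node indicator network: complete graph minus four edges.
    names = ["x1", "x2", "x3", "x4", "y1", "y2", "y3", "y4"]
    missing = {("x1", "y4"), ("x2", "y4"), ("x3", "y4"), ("x4", "y1")}
    graph = {v: {u for u in names if u != v
                 and (u, v) not in missing and (v, u) not in missing}
             for v in names}
    print(f"density = {density(graph):.0%}")
    print(f"random attack: {attack(graph):+.1f}% mean distance")
    print(f"structured attack: {attack(graph, structured=True):+.1f}% mean distance")
```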
Analysis: Self-organised Clustering
The time-based
observations (events) are not sufficient for temporal investigations, so
self-organised clustering is used to investigate potential correlations
between events. In this case, six of the 11 observations cluster around
the significant inputs and outputs of Business Unit 4.
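The text does not name the clustering algorithm. A common self-organising technique is Kohonen's self-organising map (SOM), and the minimal one-dimensional version below is offered only as an assumed stand-in. Each event is one observation row (x1, y1 through x4, y4), and each event is assigned to its best-matching map node. The number of nodes, epochs, learning rate and neighbourhood schedule are illustrative choices, and the features are not normalised.

```python
"""Minimal 1-D self-organising map for clustering events (sketch)."""
import math
import random


def best_matching_unit(weights, sample):
    """Index of the node whose weights are closest (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(weights[j], sample)))


def train_som(samples, n_nodes=4, epochs=200, lr0=0.5, seed=0):
    """Train a 1-D Kohonen map; returns the list of node weight vectors."""
    rng = random.Random(seed)
    # Initialise each node with a copy of a randomly chosen sample.
    weights = [list(rng.choice(samples)) for _ in range(n_nodes)]
    radius0 = max(n_nodes / 2, 1)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                # learning rate decays linearly
        radius = radius0 * (1 - frac) + 0.5  # neighbourhood shrinks towards 0.5
        for s in rng.sample(samples, len(samples)):  # shuffled pass over events
            bmu = best_matching_unit(weights, s)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood on the 1-D node grid.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (s[d] - w[d])
    return weights


def cluster(samples, **kwargs):
    """Assign each sample (event) to its best-matching SOM node."""
    weights = train_som(samples, **kwargs)
    return [best_matching_unit(weights, s) for s in samples]


if __name__ == "__main__":
    x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
    y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
    y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
    y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
    x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
    y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]
    # One event per observation: (x1, y1, x2, y2, x3, y3, x4, y4).
    events = list(zip(x123, y1, x123, y2, x123, y3, x4, y4))
    print(cluster(events))
```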
Insight Derived from Using Complexity Sciences
To summarize the insight derived from the complexity analysis, we can see and understand the Anscombe quartet better by adding the following observations:
a) The volatility indicators show similar volatility in the X and Y groups. Volatility describes the deviations from a mean; in this case the outliers were left in place for demonstration purposes.
b) The uncertainty measures show a different level of uncertainty for each indicator. It becomes clear that these indicators are not as similar as standard statistical measurements suggest.
c) The ideal capacity calculation combines volatility and uncertainty to indicate the different levels of capacity required by each business unit.
d) The systemic risk score is relatively high at 68%. The level of uncertainty in the indicators, together with the high system density (85%), supports this level of systemic risk. It indicates that the four business units do not operate in isolation but have cross-relationships that increase the complexity between them.
e) The system’s relationships are not randomly driven: the benchmark against a comparable random network shows significant differences in network characteristics. The strongest relationships, at 86%, are between x1, x2 and x3.
f) The system does not measure as scale-free and shows no evidence of self-organised criticality.
g) The system is fairly robust against random and structured attacks. A simulated removal of 25% of the vertices resulted in only a 1.5% increase in average network distance.
h) From a temporal view, 6 of the 11 events cluster around Business Unit 4 events. This is mainly because the level of uncertainty around Unit 4 is low and Unit 4 is not significantly affected by the other units.
This can be summarised in the following risk approach and improvement strategy:
"This system is fairly integrated and shielded from both random and structural failures. This is caused by the high interaction of system components across all business units. However, optimising the performance of this system poses a greater challenge. Based on the quantitative insight, the strategy should be to address and correct the high uncertainty in x1, x2 and x3 performance. The cause-effect model of the system should be used to study the impact of changing x1, x2 and x3, and to understand how such changes will affect the overall system. Improving x1, x2 and x3 will have a significant impact on performance indicators such as customer lead times, work-in-progress and waiting times. It is important to protect the stability of BU 4 (x4 and y4), as it has a significant impact on 60% of all observations made in the system."
Anscombe Quartet Scatterplot
Conclusion
In summary, the above analysis
shows that the Anscombe quartet is a fairly complex system, with
relationships between most variables and a high degree of uncertainty in the
data sets. Although the business units appear very similar in standard statistical
measurements, each operates quite differently, and they cannot be viewed or treated
as isolated units.
In conclusion, this approach should still be applied as any data science methodology recommends: use your common sense to analyze, construct and predict!