Saturday, February 26, 2022

W.E.B. Du Bois - data scientist

Changing the world through charts

Two of the key skills of a data scientist are informing and persuading others through data. I'm going to show you how one man, and his team, used novel visualizations to illustrate the lives of African-Americans in the United States at the start of the 20th century. Even though they created their visualizations by hand, these visualizations still have something to teach us over 100 years later. The team's lack of computers freed them to try different forms of data visualizations; sometimes their experimentation was successful, sometimes less so, but they all have something to say and there's a lesson here about communication for today's data scientists.

I'm going to talk about W.E.B. Du Bois and the astounding charts his team created for the 1900 Paris exhibition.

(W.E.B. Du Bois in 1904 and one of his 1900 data visualizations.)

Who was W.E.B. Du Bois?

My summary isn't going to do his amazing life justice so I urge you to read any of these short descriptions of who he was and what he did:

To set the scene here's just a very brief list of some of the things he did. Frankly, summarizing his life in a few lines is ridiculous.

Born 1868, Great Barrington, Massachusetts.
Graduate of Fisk University and Harvard - the first African-American to gain a Ph.D. from Harvard.
Conducted ground-breaking sociological work in Philadelphia, Virginia, Alabama, and Georgia.
His son died in 1899 because no white doctor would treat him and black doctors were unavailable.
Was the primary organizer of "The Exhibit of American Negroes" at the Exposition Universelle held in Paris between April and November 1900.
NAACP director and editor of the NAACP magazine The Crisis.
Debated Lothrop Stoddard, a "scientific racist" in 1929 and thoroughly bested him.
Opposed US involvement in World War I and II.
Life-long peace activist and campaigner, which led to the FBI investigating him in the 1950s as a suspected communist. They withheld his passport for 8 years.
Died in Ghana in 1963.

Visualizing Black America at the start of the twentieth century

In 1897, Du Bois was a history professor at Atlanta University. His former classmate and friend, Thomas Junius Calloway, asked him to produce a study of African-Americans for the 1900 Paris world fair, the "Exposition Universelle". With the help of a large team of Atlanta University students and alumni, Du Bois gathered statistics on African-American life over the years and produced a series of infographics to bring the data to life. Most of the names of the people who worked on the project are unknown, and it's a mystery who originated the form of the plots, but the driving force behind the project was undoubtedly Du Bois. Here are some of my favorite infographics from the Paris exhibition.

The chart below shows where African-Americans lived in Georgia in 1890. There are four categories:

Red - country and villages
Yellow - cities 2,500-5,000
Blue - cities 5,000-10,000
Green - cities over 10,000

the length of the lines is proportional to the population and obviously, the chart shows the huge majority of the population lived in the country and villages. I find the chart striking for three reasons: it doesn't follow any of the modern charting conventions, it clearly represents the data, and it's visually very striking. My criticism is that the design makes it hard to visually quantify the differences, for example, how many more people live in the country and villages compared to cities 5,000-10,000? If I were drawing a chart with the same data today, I might use an area chart to represent the same data; it would quantify things better, but it would be far less visually interesting.

The next infographic is two choropleth charts that show the African-American population of Georgia counties in 1870 and 1880. Remember that the US civil war ended in 1865, and with the Union victory came freedom for the slaves. As you might expect, there was a significant movement of the now-free people. Looking at the charts in detail raises several questions, for example, why did some areas see a growth in the African-American population while other areas did not? Why did the highest populated areas remain the highest populated? The role of any good visualization is to prompt meaningful questions.

This infographic shows the income and expenditure of 150 African-American families in Georgia. The income bands are on the left-hand side, and the bar chart breaks down the families' expenses by category:

Black - rent
Purple - food
Pink - clothes
Dark blue - direct taxes
Light blue - other expenses and savings

There are several notable observations from this chart: the disappearance of rent above a certain income level, the rise in other expenses and savings with rising income, and the declining fraction spent on clothing. There's a lot on this chart and it's worthy of greater study; Du Bois' team crammed a great deal of meaning into a single page. For me, the way the key is configured at the top of the chart doesn't quite work, but I'm willing to give the team a pass on this because it was created in the 19th century. A chart like this wouldn't look out of place in a 2022 report - which of itself is startling.

My final example is a comparison of the occupations of African-Americans and the white population in Georgia. It's a sort-of pie chart, with the upper quadrant showing African Americans and the bottom quadrant showing the white population. Occupations are color-coded:

Red - agriculture, fishery, and mining
Yellow - domestic and personal service
Blue - manufacturing and mechanical industries
Grey - trade and transportation
Brown - professions

The fraction of the population in these employment categories is written on the slices, though it's hard to read because the contrast isn't great. Notably, the order of the occupations is reversed from the top to the bottom quadrant, which has the effect of making the sizes of the slices easier to compare - this can't be an accident. I'm not a fan of pie charts, but I do like this presentation.

Influences on later European movements - or not?

Two things struck me about Du Bois' charts: how modern they looked and how similar they were to later art movements like the Italian Futurists and Bauhaus.

At first glance, his charts look to me like they'd been made in the 1960s. The typography and coloring were obviously pre-computerization, but everything else about them suggests modernity, from the typography to the choice of colors to the layout. The experimentation with form is striking and is another reason why this looks very 1960s to me; perhaps the use of computers to visualize data has constrained us too much. Remember, Du Bois's mission was to explain and convince and he chose his charts and their layout to do so, hence the experimentation with form. It's quite astonishing how far ahead of his time he was.

Italian Futurism started in 1909 and largely fizzled out at the end of the second world war due to its close association with fascism. The movement emphasized the abstract representation of dynamism and technology among other things. Many futurist paintings used a restricted color palette and have obvious similarities with Du Bois' charts, here are just a few examples (below). I couldn't find any reliable articles that examined the links between Du Bois' work and futurism.


Numbers In Love - Giacomo Balla Image from WikiArt	Music - Luigi Russolo Image from WikiArt

The Bauhaus design school (1919-1933) sought to bring modernity and artistry into mass production and had a profound and lasting effect on the design of everyday things, even into the present day. Bauhaus designs tend to be minimal ("less is more") and focus on functionality ("form follows function") but can look a little stark. I searched, but I couldn't find any scholarly study of the links between Du Bois and Bauhaus, however, the fact the Paris exposition charts and the Bauhaus work use a common visual language is striking. Here's just one example, a poster for the Bauhaus school from 1923.

(Joost Schmidt, Public domain, via Wikimedia Commons)

Du Bois' place in data visualization

I've read a number of books on data visualization. Most of them include Nightingale's coxcomb plots and Playfair's bar and pie charts, but none of them included Du Bois charts. Du Bois didn't originate any new chart types, which is maybe why the books ignore him, but his charts are worth studying because of their experimentation with form, their use of color, and most important of all, their ability to communicate meaning clearly. Ultimately, of course, this is the only purpose of data visualization.

Reading more

W. E. B. Du Bois's Data Portraits: Visualizing Black America, Whitney Battle-Baptiste, Britt Rusert. This is the book that brought these superb visualizations to a broader audience. It includes a number of full-color plates showing the infographics in their full glory.

The Library of Congress has many more infographics from the Paris exhibition, it also has photos too. Take a look at it for yourself here https://www.loc.gov/collections/african-american-photographs-1900-paris-exposition/?c=150&sp=1&st=list - but note the charts are towards the end of the list. I took all my charts in this article from the Library of Congress site.

"W.E.B. Du Bois’ Visionary Infographics Come Together for the First Time in Full Color" article in the Smithsonian magazine that reviews the Battle-Baptiste book (above).

"W. E. B. Du Bois' Hand-Drawn Infographics of African-American Life (1900)" article in Public Domain Review that reviews the Battle-Baptiste book (above).

Friday, February 18, 2022

RCT bingo!

A vocabulary of causal inference testing

I was having a clear-out and I came across a printout of some notes I made a while back. It was a list of terms used in causal inference testing. At the time, I used it as a checklist or dictionary to ensure I knew what I was talking about - a kind of RCT bingo if you like.

(Myriam Thomas, CC BY-SA 4.0, via Wikimedia Commons)

I thought I would post it here in case anyone wants to play the same game. Do you know what all these terms mean? Are there key terms I've missed off my list?

ATE - Average Treatment Effect
CATE - Conditional Average Treatment Effect
Counterfactual
DAG - Directed Acyclic Graph
Dynamic Treatment Effect
Epsilon greedy
Estimands
External and internal validity
Heterogeneity (treatment effect heterogeneity)
Homophily
Instrumental Variable (IV)
LATE - Local Average Treatment Effect
Logit model
RCT - Randomized Control Trial
Regret
Salience
Spillover
Stationary effect (and it's opposite non-stationary effect)
Surrogate
SUTVA - Stable Unit Treatment Value Assumption
Thompson sampling
Treatment effect heterogeneity
Wald estimator

Monday, January 17, 2022

Cultural add or fit?

What does cultural fit mean?

At a superficial level, company culture can be presented as free food and drink, table tennis and foosball, and of course company parties. More realistically, it means how people interact with each other, what behavior is encouraged, and crucially what behavior isn't tolerated. At the most fundamental level, it means who gets hired, fired, or promoted.

Cultural fit means how well someone can function within a company or team. At best, it means their personality and the company's way of operating are aligned so the person thrives within the company, performs well, and stays a long time. In this case, everyone's a winner.

For a long time, managers have hired for cultural fit because of the benefits of getting it right.

The unintended consequences

Although cultural fit seems like a good thing to hire for, it has several downsides.

Hiring for cultural fit over the long term means that you can get groupthink. In some situations that's fine, for example, mature or slowly moving industries benefit from a consistent approach over time. But during periods of rapid change, it can be bad because the team doesn't have the diversity of thought to effectively respond to threats; the old ways don't work anymore but the team still fights yesterday's battles.

For poorly performing teams, hiring for cultural fit can mean more of the same, which can be disastrous on two levels: it cements the existing problems and blocks new ways of working.

(Monks in a monastery are a great example of cultural fit. But not everyone wants to join a monastery. Abraham Sobkowski OFM, CC BY-SA 3.0, via Wikimedia Commons)

Cultural add

In contrast to cultural fit that focuses on conformity, cultural add focuses on what new and different things an employee can bring to the team.

Cultural add is not (entirely) about racial diversity; in fact, I would argue it's a serious error to view cultural add solely in racial terms. I've worked with teams composed of individuals from different races, but they all went to the same universities and all had the same middle-class backgrounds. The team looked and sounded diverse but their thinking was strikingly uniform.

Here are some areas of cultural add you might think about:

Someone who followed a non-traditional path to get to where they got. This can mean:

Military experience
Non-university education
They transitioned from one discipline to another (e.g. someone who initially majored in law now working in machine learning).

Single parents. Many young tech teams are full of young single people. A single parent has a radically different set of experiences. They may well bring a much-needed focus on work-life balance.
Older candidates. Their experience in different markets and different companies may be just what you need.
Working-class backgrounds. Most people in tech come from middle-class backgrounds (regardless of country of origin). Someone whose parents were very blue-collar may well offer quite a different perspective.

I'm not saying anything new when I say a good hiring process considers the strengths and weaknesses of a team before the hiring process starts. For example, if a team is weak on communication with others, a desirable feature of a new hire is good communications skills. Cultural add takes this one stage further and actively looks for candidates who bring something new to the table, even when that new thing isn't well-defined.

Square pegs in round holes

The cultural add risk is the same as any hiring risk: you get someone who can't work with the team or who can't perform. Even with cultural add, you still need to recruit someone the team can work with. Cultural add can't be the number one hiring criteria, but it should be a key one.

What all this means in practice

We can boil this down to some don'ts and dos.

Don'ts

Hire people who went to the same small group of universities.
Assume racial diversity = cultural add.
Add people who are exactly the same as the current team.
Rely on employee referrals (people tend to know people who are like them).

Do:

Look for people with non-traditional backgrounds.
Be aware of the hiring channels you use and try and reach out beyond the usual channels.
Look for what new thing or characteristic the candidate brings. This means thinking about the interview questions you ask to find the new thing.
Think about your hiring process and how the process itself filters candidates. If you have a ten-stage process, or a long take-home test, or you do multiple group interviews, this can cause candidates to drop out - maybe even the candidates you most want to attract.

Cultural add goes beyond the hiring process, you have to think about how a person is welcomed. I've seen teams unintentionally (and intentionally) freeze people out because they were a bit different. If you really want to make cultural add work, management has to commit to making it work post-hire.

An old joke

Two men become monks and join a monastery. One of the men is admitted because he's a cultural fit, the other because he's a cultural add.

After dinner one evening, the monks are silent for a while, then one monk says "23" and the other monks smile. After a few minutes, another monk very loudly says "82", and the monks laugh. This goes on for a while to the confusion of the two newcomers. The abbot whispers to them: "We've been together so long, we know each other's jokes, so we've numbered them to save time". The new monks decide to join in.

The cultural fit monk says "82" and there's polite laughter - they've just heard the same joke. The cultural add monk thinks for a second and says "189". There's a pause for a second as the monks look at one another in wonder, then they burst out in side-splitting laughing. Some of the monks are crying with laughter and one looks like he might need oxygen. The laughter goes on for a good ten minutes. The abbot turns to the cultural add monk and says: "they've never heard that one before!".

If you want more of the same, go for cultural fit, if you want something new, go for cultural add.

Friday, January 7, 2022

Prediction, distinction, and interpretation: the three parts of data science

What does data science boil down to?

Data science is a relatively new discipline that means different things to different people (most notably, to different employers). Some organizations focus solely on machine learning, while other lean on interpretation, and yet others get close to data engineering. In my view, all of these are part of the data science role.

I would argue data science generally is about three distinct areas:

Prediction. The ability to accurately extrapolate from existing data sets to make forecasts about future behavior. This is the famous machine learning aspect and includes solutions like recommender systems.
Distinction. The key question here is: "are these numbers different?". This includes the use of statistical techniques to decide if there's a difference or not, for example, specifying an A/B test and explaining its results.
Interpretation. What are the factors that are driving the system? This is obviously related to prediction but has similarities to distinction too.

(A similar view of data science to mine: Calvin.Andrus, CC BY-SA 3.0, via Wikimedia Commons)

I'm going to talk through these areas and list the skills I think a data scientist needs. In my view, to be effective, you need all three areas. The real skill is to understand what type of problem you face and to use the correct approach.

Distinction - are these numbers different?

This is perhaps the oldest area and the one you might disagree with me on. Distinction is firmly in the realm of statistics. It's not just about A/B tests or quasi-experimental tests, it's also about evaluating models too.

Here's what you need to know:

Confidence intervals.
Sample size calculations. This is crucial and often overlooked by experienced data scientists. If your data set is too small, you're going to get junk results so you need to know what too small is. In the real world. increasing the sample size is often not an option and you need to know why.
Hypothesis testing. You should know the difference between a t-test and a z-test and when a z-test is appropriate (hint: sample size).
α, β, and power. Many data scientists have no idea what statistical power is. If you're doing any kind of statistical testing, you need to have a firm grasp of power.
The requirements for running a randomized control trial (RCT). Some experienced data scientists have told me they were analyzing results from an RCT, but their test just wasn't an RCT - they didn't really understand what an RCT was.
Quasi-experimental methods. Sometimes, you just can't run an RCT, but there are other methods you can use including difference-in-difference, instrumental variables, and regression discontinuity. You need to know which method is appropriate and when.
Regression to the mean. This is why you almost always need a control group. I've seen experienced data scientists present results that could almost entirely be explained by regression to the mean. Don't be caught out by one of the fundamentals of statistics.

Prediction - what will happen next?

This is the piece of data science that gets all the attention, so I won't go into too much detail.

Here's what you need to know:

The basics of machine learning models, including:

Generalized linear modeling
Random forests (including knowing why they are often frowned upon)
k-nearest neighbors/k-means clustering
Support Vector Machines
Gradient boosting.

Cross-validation, regularization, and their limitations.
Variable importance and principal component analysis.
Loss functions, including RMSE.
The confusion matrix, accuracy, sensitivity, specificity, precision-recall and ROC curves.

There's one topic that's not on any machine learning course or in any machine learning book that I've ever read, but it's crucially important: knowing when machine learning fails and when to stop a project. Machine learning doesn't work all the time.

Interpretation - what's going on?

The main techniques here are often data visualization. Statistical summaries are great, but they can often mislead. Charts give a fuller picture.

Here are some techniques all data scientists should know:

Heatmaps
Violin plots
Scatter plots and curve fitting
Bar charts
Regression and curve fitting.

They should also know why pie charts in all their forms are bad.

A good knowledge of how charts work is very helpful too (the psychology of visualization).

What about SQL and R and Python...?

You need to be able to manipulate data to do data science, which means SQL, Python, or R. But plenty of people use these languages without being data scientists. In my view, despite their importance, they're table stakes.

Book knowledge vs. street knowledge

People new to data science tend to focus almost exclusively on machine learning (prediction in my terminology) which leaves them very weak on data analysis and data exploration; even worse, their lack of statistical knowledge sometimes leads them to make blunders on sample size and loss functions. No amount of cross-validation, regularization, or computing power will save you from poor modeling choices. Even worse, not knowing statistics can lead people to produce excellent models of regression to the mean.

Practical experience is hugely important; way more important than courses. Obviously, a combination of both is best, which is why PhDs are highly sought after; they've learned from experience and have the theoretical firepower to back up their practical knowledge.

Friday, December 31, 2021

COVID and the base rate fallacy

Should we be concerned that vaccinated people are getting COVID?

I’ve spoken to people who’re worried that the COVID vaccines aren’t effective because some vaccinated people catch COVID and are hospitalized. Let’s look at the claim and see if it stands up to analysis.

Let's start with some facts:

Vaccines aren’t 100% effective for 100% of people. For example, the smallpox vaccine is 95% effective at preventing infection (https://www.health.ny.gov/publications/7022/).

Even if vaccines don’t prevent infection, they can reduce the severity of symptoms (e.g. https://www.cdc.gov/flu/vaccines-work/vaccineeffect.htm).
Vaccines slow or prevent the spread of disease by providing fewer hosts for infection – in other words, herd immunity (https://www.mayoclinic.org/diseases-conditions/coronavirus/in-depth/herd-immunity-and-coronavirus/art-20486808)

Medical tests aren’t 100% effective either; there are false positives and false negatives.

False positives mean the test says positive when really you’re negative.
False negatives mean the test says you’re negative when really you’re positive.

Marc Rummy’s diagram

Marc Rummy created this diagram to explain what’s going on with COVID hospitalizations. He’s made it free to share, which is fantastic.

In this diagram, the majority of the population is vaccinated (91%). The hospitalization rate for the unvaccinated is 50% but for the vaccinated, it’s 10%. If the total population is 110, this leads to 5 unvaccinated people hospitalized and 10 vaccinated people hospitalized - in other words, 2/3 of those in hospital with COVID have been vaccinated.

Explaining the result

Let’s imagine we just looked at hospitalizations: 5 unvaccinated and 10 vaccinated. This makes it look like vaccinations aren’t working – after all, the majority of people in hospital are vaccinated. You can almost hear ignorant journalists writing their headlines now (“Questions were raised about vaccine effectiveness when the health minister revealed the majority of patients hospitalized had been vaccinated.”). But you can also see anti-vaxxers seizing on these numbers to try and make a point about not getting vaccinated.

The reason why the numbers are the way they are is because the great majority of people are vaccinated.

Let’s look at three different scenarios with the same population of 110 people and the same hospitalization rates for vaccinated and unvaccinated:

0% vaccinated – 55 people hospitalized
91% vaccinated – 15 people hospitalized
100% vaccinated – 11 people hospitalized

Clearly, vaccinations reduce the number of hospitalizations. The anti-vaccine argument seems to be, if it doesn't reduce the risk to zero, it doesn't work - which is a strikingly weak and ignorant argument.

In this example, vaccination doesn’t reduce the risk of infection to zero, it reduces it by a factor of 5. In the real world, vaccination reduces the risk of infection by 5x and the risk of death due to COVID by 13x (https://www.nytimes.com/interactive/2021/us/covid-cases.html). The majority of people hospitalized now appear to be unvaccinated even though vaccination rates are only just above 60% in most countries (https://www.nytimes.com/interactive/2021/world/covid-cases.html, https://www.masslive.com/coronavirus/2021/09/breakthrough-covid-cases-in-massachusetts-up-to-about-40-while-unvaccinated-people-dominate-hospitalizations.html).

The bottom line is very simple: if you want to reduce your risk of hospitalization and protect your family and community, get vaccinated.

The base rate fallacy

The mistake the anti-vaxxers and some journalists are making is a very common one, it’s called the base rate fallacy (https://thedecisionlab.com/biases/base-rate-fallacy/). There are lots of definitions online, so I’ll just attempt a summary here: “the base rate fallacy is where someone draws an incorrect conclusion because they didn’t take into account the base rate in the general population. It’s especially a problem for conditional probability problems.”

Let’s use another example from a previous blog post:

“Imagine there's a town of 10,000 people. 1% of the town's population has a disease. Fortunately, there's a very good test for the disease:

If you have the disease, the test will give a positive result 99% of the time (sensitivity).
If you don't have the disease, the test will give a negative result 99% of the time (specificity).

You go into the clinic one day and take the test. You get a positive result. What's the probability you have the disease?”

The answer is 50%.

The reason why the answer is 50% and not 99% is because 99% of the town’s population does not have the disease (the base rate), which means half of the positives will be false positives.

What’s to be done?

Conditional probability (for example, the COVID hospitalization data) is screwy and can sometimes seem counter to common sense. The general level of statistical (and probability) knowledge in the population is poor. This leaves people trying to make sense of the data around them but without the tools to do it, so no wonder they’re confused.

It’s probably time that all schoolchildren are taught some basic statistics. This should include some counter-intuitive results (for example, the disease example above). Even if very few schoolchildren grow up to analyze data, it would be beneficial for society if more people understood that interpreting data can be hard and that sometimes surprising results occur – but that doesn’t make them suspicious or wrong.

More importantly, journalists need to do a much better job of telling the truth and explaining the data instead of chasing cheap clicks.

Monday, November 29, 2021

What's a violin plot and how to make one

What's a violin plot?

Over the last few years, violin plots and their derivatives have become a lot more common; they're a 'sort of' smoothed histogram. In this blog post, I'm going to explain what they are and why you might use them.

To give you an idea of what they look like, here's a violin plot of attendance at English Premier League (EPL) matches during the 2018-2019 season. The width of the plot indicates the relative proportion of matches with that attendance; we can see attendance peaks around 27,000, 55,000, and 75,000, but no matches had zero attendance.

Violin plots get their name because they sort of resemble a violin (you'll have to allow some creative license here).

As we'll see, violin plots avoid the problems of box and whisker plots and the problems of histograms. The cost is greatly increased computation time, but for a modern computer system, violin plots are calculated and plotted in the blink of an eye. Despite their advantages, the computational cost is why these plots have only recently become popular.

Summary statistics - the mean etc.

We can use summary statistics to give a numerical summary of the data. The most obvious statistics are the mean and standard deviation, which are 38,181 and 16,709 respectively for the EPL 2018 attendance data. But the mean can be distorted by outliers and the standard deviation implies a symmetrical distribution. These statistics don't give us any insight into how attendance was distributed.

The median is a better measure of central tendency in the presence of outliers, and quartiles work fine for asymmetric distributions. For this data, the median is 31,948 and the upper and lower quartiles are 25,034 and 53,283. Note the median is a good deal lower than the mean and the upper and lower quartiles are not evenly spaced, suggesting a skewed distribution. The quartiles give us an indication of how skewed the data is.

So we should be just fine with median and quartiles - or are there problems with these numbers too?

Box and whisker plots

Box and whisker plots were introduced by Tukey in the early 1970s and evolved since then, currently, there are several slight variations. Here's a box and whisker plot for the EPL data for four seasons in the most common format; the median, upper and lower quartiles, and the lowest and highest values are all indicated. We have a nice visual representation of our distribution.

The box and whisker plot encodes a lot of data and strongly indicates if the distribution is skewed. Surely this is good enough?

The problem with boxplots

Unfortunately, box and whisker plots can hide important features of the data. This article from Autodesk Research gives a great example of the problems. I've copied one of their key figures here.

("Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing", Justin Matejka, George Fitzmaurice, ACM SIGCHI Conference on Human Factors in Computing Systems, 2017)

The animation shows the problem nicely; different distributions can give the same box and whisker plots. This doesn't matter most of the time, but when it does matter, it can matter a lot, and of course, the problem is hidden.

Histograms and distributions

If boxplots don't work, what about histograms? Surely they won't suffer from the same problem? It's true they don't, but they suffer from another problem; bin count, meaning the number of bins.

Let's look at the EPL data again, this time as a histogram.

Here's the histogram with a bin count of 9.

Here it is with a bin count of 37.

And finally, with a bin count of 80.

Which of these bin counts is better? The answer is, it depends on what you want to show. In some cases, you want a large number of bins, in other cases, a small number of bins. As you can appreciate, this isn't helpful, especially if you're at the start of your career and you don't have a lot of experience to call on. Even later in your career, it's not helpful when you need to move quickly.

An evolution of the standard histogram is using unequal bin sizes. Under some circumstances, this gives a better representation of the distribution, but it adds another layer of complexity; what should the bin sizes be? The answer again is, it depends on what you want to do and your experience.

Can we do better?

Enter the violin plot

The violin plot does away with bin counts by using probability density instead. It's a plot of the probability density function (pdf) that could have given rise to the measured data.

In a handful of cases, we could estimate the pdf using parametric means when the underlying data follows a well-known distribution. Unfortunately, in most real-world cases, we don't know what distribution the data follows, so we have to use a non-parametric method, the most popular being kernel density estimation (kde). This is almost always done using a Gaussian estimator, though other estimators are possible (in my experience, the Gaussian calculation gives the best result anyway, but see this discussion on StackOverflow). The key parameter is the bandwidth, though most kde algorithms attempt to size their bandwidth automatically. From the kde, the probability density is calculated (a trivial calculation).

Here's the violin plot for the EPL 2018 data.

It turns out, violin plots don't have the problems of box plots as you can see from this animation from Autodesk Research. The raw data changes, the box plot doesn't, but the violin plot changes dramatically. This is because all of the data is represented in the violin plot.

Variations on a theme, types of violin plot, ridgeline plots, and mirror density plots

The original form of the violin plot included median and quartile data and you often see violin plots presented like this - a kind of hybrid of violin and box and whisker. This is how the Seaborn plotting package draws Violin plots (though this chart isn't a Seaborn visualization).

Now, let's say we want to compare the EPL attendance data over several years. One way of doing it is to show violin plots next to each other, like this.

Another way is to show two plots back-to-back, like this. This presentation is often called a mirror density plot.

We can show the results from multiple seasons with another form of violin plot called the ridgeline plot. In this form of plot, the charts are deliberately overlapped, letting you compare the shape of the distributions better. They're called ridgeline plots because they sort of look like a mountain ridgeline.

This plot shows the most obvious feature of the data, the 2019-2020 season was very different from the seasons that went before; a substantial number of matches were held with zero attendees.

When should you use violin plots?

If you need to summarize a distribution and represent all of the data, a violin plot is a good choice. Depending on what you're trying to do, it may be a better choice than a histogram and it's always a better choice than box and whisker plots.

Aesthetically, a violin plot or one of its variations is very appealing. You shouldn't underestimate the importance of clear communication and visual appeal. If you think your results are important, surely you should present them in the best way possible - and that may well be a violin plot.

Reading more

"Statistical computing and graphics, violin plots: a box plot-density trace synergism", Jerry Hintze, Ray Nelson, American Statistical Association, 1998. This paper from Hintze and Nelson is one of the earlier papers to describe violin plots and how they're defined, it goes into some detail on the process for creating them.

"Violin plots explained" is a great blog post that provides a nice explanation of how violin plots work.

"Violin Plots 101: Visualizing Distribution and Probability Density" is another good blog post proving some explanation on how to build violin plots.

Monday, November 22, 2021

Mouse jiggling: the good, the bad, and the ugly

What's a mouse jiggler?

Recently, I was reading Hacker News and one of the contributors mentioned a mouse jiggler. I'd never heard of one before, so I searched around. I was both horrified and fascinated by what I discovered.

A mouse jiggler is a device that randomly 'jiggles' your mouse so it appears that you're at your computer. It prevents the screen saver from kicking in and it keeps your status as active in Slack and Zoom. Here's a picture of one type of mouse jiggler.

(Jamsim1975, CC BY-SA 4.0, via Wikimedia Commons)

The good

As it turns out, mouse jigglers are used by law enforcement during raids. One of the agents is tasked with getting to the suspect's computer quickly and setting up the mouse jiggler. The goal is to stop the computer from locking up; if that happens the suspect has to provide their password or a court has to order them to do so. Far better to stop the computer from locking up in the first place.

In the old days, the FBI and other agencies used software mouse jigglers; the mouse motion was set by software installed on a USB stick. Mechanical mouse jigglers are better because they don't rely on the availability of USB ports and they don't rely on security settings on the suspect's computer (not all systems will allow software to be installed via USB).

This blog post has some interesting things to say about mouse jigglers and other software/hardware used during raids.

The bad

There's a reason why security teams have computers lock themselves after a few minutes of user inactivity and the reason is security. Leaving a computer unattended and unlocked is bad, leaving a computer unattended and unlocked with a mouse jiggler over extended periods is even worse. If I were a CISO, I would ban mouse jigglers - or better still, make sure that no one feels the need to use one.

The ugly

For everyone who's not law enforcement, why would you want a jiggler? The sad answer seems to be fooling employee surveillance software. Instead of trusting their employees or measuring by results, some companies have installed surveillance software that tracks mouse usage (mouse use = work). Jigglers are an attempt to circumvent these kinds of trackers.

Jigglers have been around for a while and now there's software to detect them; you too can detect if your employees are jiggling. In response, some of the newer jigglers offer random and subtle jiggles that are harder to detect. I can see a jiggling arms race coming.

The reviews for this jiggler on Amazon are enlightening; there are 2,612 of them, an astonishing number, and the product has a 5-star rating overall. Many of the reviews mention fooling IT surveillance software. If you don't like this one, there are plenty of other models to choose from, many with over 1,000 reviews.

Think about what this says. There are enough people who're concerned about surveillance to spawn a mini-industry for $30 devices. These devices add no value - it's not like a mouse or a keyboard or a camera. As one of the reviewers said, the jiggler lets them go to the bathroom without feeling like it's being noted. It's all about trust, or the lack of it.

If people are using mouse jigglers at your company, it's an indication that something has gone quite wrong.

Wednesday, November 17, 2021

Ice ice baby

Is this the coolest/most Canadian data set ever?

Late last year, I was studying for a machine learning course and I was looking for a data set I could use for the course capstone project. I settled on English Premier League data, but not before I had a serious look at what's probably the coolest (and most Canadian) dataset I've ever seen.

(cogdogblog, CC BY 2.0, via Wikimedia Commons)

The icicle atlas

The University of Toronto Physics Department has a large dataset on icicles created as part of a research project to understand the factors important for icicle formation, size, and shape. Under laboratory conditions, they grew 237 icicles, capturing movies and images of their growth.

As part of the growth process, they varied speed, salinity, and water source, obviously having a little fun with their choice of water sources as you'll see in the atlas. As it turns out, the ridges that sometimes form in icicles are due to impurities in the water (maybe all of us need impurities to be interesting😉).

All the images, movies, and data are covered by a creative commons license. You can see all their icicle data on their icicle atlas page - the related links at the bottom of the page are fantastic.

Icicle pictures

You can view the rogues' gallery of icicle pictures here. Here are some pictures of my favorite icicles from the gallery.

(University of Toronto Physics Department, Icicle Atlas, Creative Commons License.)

Icicle movies

You can see videos of how most of the icicles formed. It's kind of interesting and restful to watch - the opposite of watching a fire burning, but still hypnotic and engaging. Here's a video that I found restful.

(University of Toronto Phyics Department, Icicle Atlas, Creative Commons License.)

If you want more, there's a YouTube channel of the movies from the atlas.

Ice Ice baby

This winter, I'm going to see icicles form on my gutters. I've seen some weird formations over the years, and now I know that temperature, wind, and water purity govern the size and shape of icicles. It's nice to know that we can still take time to understand the processes that form the world around us.

Even though I better understand how icicles form, I know they can damage my gutters and house. Pretty as they are, I'm still going to smash them away.

Monday, November 8, 2021

Football crazy: predicting English Premier League football match results

I can get a qualification and be rich!

A long time ago, I was part of a gambling syndicate. A friend of mine had some software that predicted the results of English football (soccer) matches and at the time, betting companies offered fixed-price odds for certain types of bets. My friend noticed his software predicted 3-2 away wins more often than the betting company's odds would suggest. Over the course of a season, we had a 20% return on our gambling investment.

(Ronnie Macdonald from Chelmsford, United Kingdom, CC BY 2.0, via Wikimedia Commons)

During the COVID lockdown, I took the opportunity to learn R and did a long course that included a capstone project. I decided to see if I could forecast English Premier League (EPL) matches. If I succeeded, I could get a qualification and get rich too! What's not to like? Here's the story of what I did and what happened.

Premier League data

There's an eighteenth-century recipe for a hare dish that supposedly includes the instructions "First, catch your hare." The first step in any project like this is getting your data.

(Robert Scoble from Half Moon Bay, USA, CC BY 2.0, via Wikimedia Commons)

I got match results going back to the start of the league (1993) from football-data. The early data is only match results, but later data includes red cards and some other measurements.

TransferMarkt has data on transfer fees, foreign-born players, and team age, but the data's only available from 2011.

At the time of the project, I couldn't find any other free data sources. There were and are paid-for sources, but they were way beyond what I was willing to pay.

I knew going into the next phase of the project that this wasn't a very big data set with not that many fields. As it turned out, data was a severely limiting factor.

What factors are important?

I had a set of initial hypotheses for factors that might be important for final match scores, here are most of them:

team cost - more expensive teams should win more games
team age - younger teams perform better
prior points - teams with more points win against teams with fewer points
foreign-born players - the more non-English players on the team, the more the team will win
previous match results - successful (winning) teams win more matches
home-field advantage
disciplinary record - red and yellow card history might be an indicator of risk-taking
season effects - as the season wears on, teams take more risks to win matches

I found evidence that most of these did in fact have an impact.

Here's strong evidence of home-field advantage. Note how it goes away during the 2020-2021 season when matches were played without fans.

Here's goal difference vs. team cost difference. The more expensive team tends to win.

Here's goal difference vs. mean prior goal difference. Teams that scored more goals before tend to score more goals in their current match.

I found more relationships you can read about if you're interested.

Machine learning

Thinking back to my gambling syndicate, I decided to forecast the score of each match rather than just win/lose/draw. My loss function was the RMSE of the goal difference between the predicted score and the actual score. To avoid COVID oddities, I removed the 2020-2021 season (the price being a smaller data set). Of course, I used a training and holdout dataset and cross-validation.

The obvious question is, which model machine learning models work? I decided to try a whole bunch of them:

Naive mean score model. A simple model that’s just the mean scores of the (training) data set.
Generalized Linear Model. A form of ordinary linear regression.
Glmnet. Fits lasso and elastic-net regularized generalized linear models.
SVM. Support Vector Machines - boundary-based regression. After some experimentation, I selected the svmRadial form of SVM, which uses a non-linear kernel function.
KNN. K-nearest neighbors. Given that EPL scores are all in close proximity to one another, we might expect this model to return good results.
Neural nets.
XGB Linear. This is linear modeling with extreme gradient boosting. Extreme gradient boosting has gathered a lot of attention over the last few years and may be one of the most used machine learning models today.
XGB Tree. This is a decision tree model with extreme gradient boosting.
Random Forest.

The model results weren't great. For the KNN model, here's how the RMSE for full-time away goals varied with n.

Note the RMSE scale - the lowest it goes to is 1.1 goals and it's plain that adding more n will only take us a little closer to 1.1. Bear in mind, football is a low-scoring game, and being off by 1 goal is a big miss.

It was the same story for random forest.

In fact, it was the same story for all of the models. Here are my final results. My model forecast home goals and away goals.

The naive means model is the simplest and all my sophisticated models could do is give me a few percentage points improvement.

Improving the results

Perhaps the most obvious way forward is combining models to improve RMSE. I'm reluctant to do that until I can get better individual model results. There's a philosophical issue at play; for me, the ensemble approach feels a bit "spray and pray".

In my view, data shortage is the main problem:

My data set was only in the low thousands of matches.
Some teams join the Premier League for just a season and then get relegated - I don't model their history prior to joining the league.
I removed the COVID season of 2020-2021.
I only had team value and disciplinary data for ten or so seasons.
Of course, I only modeled the Premier League.

Football is a low-scoring game, famous for its upsets. It may well be that it's just too random underneath to make useful predictions at the individual match level.

What's next?

I wasn't able to predict EPL results with any great accuracy, but I submitted my report and got my grade. If you want to read my report, you can read it here.

At the end of the 2021 season, I saw some papers published on the COVID effect on match results. I had similar results months before. Perhaps I should have submitted a paper myself.

At some point, I might revive this project if I can get new data. I still occasionally hunt for new data sources, but sadly, I haven't found any. My dreams of retiring to a yacht on the Mediterranean will have to wait.

Monday, November 1, 2021

Why conditional probability screwiness matters for business

Things are not what they seem

Many business decisions come down to common sense or relatively simple math. But applying common sense to conditional probability problems can lead to very wrong results as we'll see. As data science becomes more and more important for business, decisions involving conditional probability will arise more often. In this blog post, I'm going to talk through some counter-intuitive conditional probability examples and where I can, I'll tell you how they arise in a business context.

(These two pieces of track are the same size. Ag2gaeh, CC BY-SA 4.0, via Wikimedia Commons.)

Testing for diseases

This is the problem with the clearest links to business. I'll explain the classical form of the problem and show you how it can come up in a business context.

Imagine there's some disease affecting a small fraction of the population, say 1%. A university develops a great test for the disease:

If you have the disease, the test will give you a positive result 99% of the time.
If you don't have the disease, the test will give you a negative result 99% of the time.

You take the test and it comes back positive. What's the probability you have the disease?

(COVID test kit. Centers for Disease Control and Prevention, Public domain, via Wikimedia Commons)

The answer is 50%.

If you want an explanation of the 50% number, read the section "The math", if you want to know how it comes up in business, skip to the section "How it comes up in business".

The math

What's driving the result is the low prevalence of the disease (1%). 99% of the people who take the test will be uninfected and it's this that pushes down the probability of having the disease if you test positive.

There are at least two ways of analyzing this problem, one is using a tree diagram and one is using Bayes' Theorem. In a previous blog post, I went through the math in detail, so I'll just summarize the simpler explanation using a tree diagram. To make it easier to understand, I'll assume a population of 10,000.

Of the 10,000 people, 100 have the disease, and 9,900 do not. Of the 100, 99 will test positive for the disease. Of the 9,900, 99 will test positive for the disease. In total 99 + 99 will test positive, of which only 99 will have the disease. So 50% of those who test positive will have the disease.

How it comes up in business

Instead of disease tests, let's think of websites and algorithms. Imagine you're the CEO of a web-based business. 1% of the visitors to your website become customers. You want to identify who'll become a customer, so you task your data science team with developing an algorithm based on users' web behavior. You tell them the test is to distinguish customers from non-customers.

They come back with a complex test for customers that's 99% true for existing customers and 99% false for non-customers. Do you have a test that can predict who will become a customer and who won't?

This is the same problem as before, if the test is positive for a user, there's only a 50% chance they'll become a customer.

How many daughters?

This is a classic problem and shows the importance of describing a problem exactly. Exactly, in this case, means using very very precise English.

Here's the problem in its original form from Martin Gardner:

Mr. Jones has two children. The older child is a girl. What is the probability that both children are girls?
Mr. Smith has two children. At least one of them is a boy. What is the probability that both children are boys?

(What's the probability of two girls? Circle of Robert Peake the elder, Public domain, via Wikimedia Commons)

The solution to the first problem is simple. Assuming boys or girls are equally likely, then it's 50%.

The second problem isn't simple and has generated a great deal of debate, even 60 years after Martin Gardner published the puzzle. Depending on how you read the question, the answer is either 50% or 33%. Here's Khovanova's explanation:

"(i) Pick all the families with two children, one of which is a boy. If Mr. Smith is chosen randomly from this list, then the answer is 1/3.

(ii) Pick a random family with two children; suppose the father is Mr. Smith. Then if the family has two boys, Mr. Smith says, “At least one of them is a boy.” If he has two girls, he says, “At least one of them is a girl.” If he has a boy and a girl he flips a coin to say one or another of those two sentences. In this case, the probability that both children are the same sex is 1/2."

In fact, there are several other possible interpretations.

What does this mean for business? Some things that sound simple aren't and differences in the precise way a problem is formulated can give wildly different answers.

Airline seating

Here's the problem stated from an MIT handout:

"There are 100 passengers about to board a plane with 100 seats. Each passenger is assigned a distinct seat on the plane. The first passenger who boards has forgotten his seat number and sits in a randomly selected seat on the plane. Each passenger who boards after him either sits in his or her assigned seat if it is empty or sits in a randomly selected seat from the unoccupied seats. What is the probability that the last passenger to board the plane sits in her assigned seat?"

You can imagine a lot of seat confusion, so it seems natural to assume that the probability of the final passenger sitting in her assigned seat is tiny.

(Ken Iwelumo (GFDL 1.2, GFDL 1.2 or GFDL 1.2), via Wikimedia Commons)

Actually, the probability of her sitting in her assigned seat is 50%.

StackOverflow has a long discussion on the solution to the problem that I won't repeat here.

What does this mean for business? It's yet another example of our intuition letting us down.

The Monty Hall problem

This is the most famous of all conditional probability problems and I've written about it before. Here's the problem as posed by Vos Savant:

"A quiz show host shows a contestant three doors. Behind two of them is a goat and behind one of them is a car. The goal is to win the car.

The host asked the contestant to choose a door, but not open it.

Once the contestant has chosen a door, the host opens one of the other doors and shows the contestant a goat. The contestant now knows that there’s a goat behind that door, but he or she doesn’t know which of the other two doors the car’s behind.

Here’s the key question: the host asks the contestant "do you want to change doors?".

Once the contestant decided whether to switch or not, the host opens the contestant's chosen door and the contestant wins the car or a goat.

Should the contestant change doors when asked by the host? Why?"

(The original uploader was Kuxu at French Wikipedia., CC BY-SA 3.0, via Wikimedia Commons)

Here are the results.

If the contestant sticks with their initial choice, they have a ⅓ chance of winning.
If the contestant changes doors, they have a ⅔ chance of winning.

I go through the math in these two previous blog posts "The Monty Hall Problem" and "Am I diseased? An introduction to Bayes theorem".

Once again, this shows how counter-intuitive probability questions can be.

What should your takeaway be, what can you do?

Probability is a complex area and common sense can lead you wildly astray. Even problems that sound simple can be very hard. Things are made worse by ambiguity; what seems a reasonable problem description in English might actually be open to several possible interpretations which give very different answers.

(Sound judgment is needed when dealing with probability. You need to think like a judge, but you don't have to dress like one. InfoGibraltar, CC BY 2.0, via Wikimedia Commons)

If you do have a background in probability theory, it doesn't hurt to remind yourself occasionally of its weirder aspects. Recreational puzzles like the daughters' problem are a good refresher.

If you don't have a background in probability theory, you need to realize you're liable to make errors of judgment with potentially serious business consequences. It's important to listen to technical advice. If you don't understand the advice, you have three choices: get other advisors, get someone who can translate, or hand the decision to someone who does understand.

Sunday, October 3, 2021

Battle leadership: some lessons for managers from World War I

Battle leadership

Books on military leadership and management have been popular in the business world for a long time. "The art of war" is a best-seller 2,500 or so years after it was written and books authored by US military leaders have consistently sold well. To state the obvious, business is not war and companies are not armies, but given this, there are still lessons military leaders have to offer; the art is picking out what applies and what doesn't.

I recently stumbled across an old military leadership book dating back to World War I. Although the world has changed greatly since then, I found some of the ideas and discussions still relevant to today's business environments. Read on to find out more.

(The book and a German soldier in World War I. Internet Archive Book Images, No restrictions, via Wikimedia Commons.)

The book and its history

The book is "Battlefield Leadership" and was first published in 1933 in the US. The author was Adolph von Schell (1893-1967), an officer in the German army in the First World War. He led troops in the European theater during the war, winning a number of medals and commendations [https://de.wikipedia.org/wiki/Adolf_von_Schell_(General,_1893)]. After the war, he trained at the US Army's Fort Benning where he was asked to speak about his experiences leading men in combat. His talks became the book, "Battlefield Leadership" which was published in English (strangely, the German translation was published much later).

The book is an odd mix of psychology, management, and battlefield stories, not all of which are relevant to the world of business. Plainly, warfare has moved on a lot since the First World War; tactics and strategy have changed greatly, but what hasn't changed are some of the core ideas of people management, as we shall see.

Core ideas

Battlefield psychology

Schell states that in "modern" warfare, people fight in small groups, often as individuals, against an enemy they can't see. Therefore, commanders need to know how individuals are likely to react and how they can be influenced. A unit may well have soldiers from diverse backgrounds, so the commander has to have an appreciation of their culture. Similar lessons apply further up the command hierarchy; a general must know his subordinates and how to motivate them: "Furthermore, each one reacts differently at different times, and must be handled each time according to his particular reaction", Schell gives a crude example in the book, but despite the crudity, the underlying lesson is clear.

He has some valid points to make about the need for individuals to exert some measure of control over their situation. Soldiers that wait under hostile fire have time to think and become stressed because they can't change their situation; they lie waiting for bullets to hit them. Soldiers on patrol are more at risk, but their destiny is in their hands so they're willing to go out and take control of their situation. He talks about soldiers under fire moving their position to be more secure: "it makes no difference whether or not the security is real; it is simply a question of feeling that it is".

Men under fire need some measure of security. Schell gives examples of a commander who ordered his men to have haircuts while their position was being shelled. The point isn't the haircuts, the point is the sense of normality the haircutting process gave. Even though men died during the shelling, morale stayed high because the team had a sense of security.

Experience matters in many ways

In several places throughout the book, Schell gives examples of how experienced troops behave compared to troops that had not seen combat. He gives an example from the early days of the war when his company of inexperienced troops first crossed into Belgium; despite meeting no resistance, they shot at shadows in the forest and spent a restless first night afraid of an attack that never came. By contrast, later in the war, he led battle-hardened troops in Russia. Despite similar circumstances, they didn't shoot at shadows in the forest, instead, they posted two sentries and the rest of the company slept soundly, even though they were in enemy territory.

As a practical matter, he recommends mixing inexperienced and battle-hardened troops. He comments that even on a day-to-day basis, and away from battle, seasoned troops coach the inexperienced troops on what to do and how to behave. He similarly cautions against changing commanders, a commander has to get to know his troops, and wartime is not an ideal time to do it:
"If we give these inexperienced troops a backbone of experienced soldiers and experienced commanders their efficiency will be tremendously increased and they will be spared heavy losses."

False data and preparation

I'm going to quote two of his lessons verbatim:
"(1) At the commencement of war, soldiers of all grades are subject to a terrific nervous strain. Dangers are seen on every hand. Imagination runs riot. Therefore, teach your soldiers in peace what they may expect in war, for an event foreseen and prepared for will have little if any harmful effect.
(2) As leaders be careful both in sending and in receiving reports. At the commencement of a war, ninety percent of all reports are false or exaggerated."

Change the word "war" for "competitive situation" and you get something obviously relevant for business.

Orders based on incomplete information

Quite correctly, Schell points out that leaders have to make decisions based on partial information, and on information which is doubtful at best:
"In open warfare a leader will have to give his orders without having complete information. At times only his own will is clear. If he waits for complete information before acting he will never make a decision."

Orders, improvisation, and maps

This is my favorite part of the book. It recalls an action where the Germans and Russians faced off against one another in Russian territory. The German commander received continual updates on the Russians' position and he changed his tactics in response to the new information. His commands to his men were clear, simple, and to the point. Here's his summary:
"This example shows clearly that difficult situations can be solved only by simple decisions and simple orders. The more difficult the situation the less time there will be to issue a long order, and the less time your men will have to understand it. Moreover, the men will be high-strung and tense. Only the simplest order can be executed under such conditions."

In the same action, the Germans were facing a larger Russian opponent. They needed to watch a Russian position but didn't have the troops to. A corporal solved the problem. There was a large herd of cows in a nearby village, so the Germans moved the cows to a field between them and the Russians. Whenever the Russians tried to advance across the field, they disturbed the cows, so alerting the Germans.

(Cows are free watchdogs. Jonas Eppler, CC BY-SA 4.0, via Wikimedia Commons)

Eventually, of course, superior numbers prevailed and the Germans had to retreat. The commander knew where to retreat to, but didn't have a map. Waiting for a map would have meant defeat, so based only on a rough knowledge of geography, they retreated, eventually joining up with other companies. The commander had to take a risk by retreating into unknown territory, but not retreating would have been more dangerous.

Mission orders

Notably, Schell discusses 'mission tactics' which are better known as 'mission orders' today. The idea is simple, commanders will achieve more if they can exert some control, so orders should focus on the mission and not on how it's to be executed. There's a sound operational reason too: "This is done because the commander on the ground is the only one who can correctly judge existing conditions and take the proper action if a change occurs in the situation". The relations to modern business are obvious: give senior people goals to achieve and give them the freedom to do it in whatever way they can.

Some miscellaneous quotes

I found little nuggets of wisdom throughout the book, here are some I want to share:

"To leave the bulk of the artillery behind may strike the reader as dangerous but I believe the decision to do so was correct. The Germans were pursuing and almost anything can be dared where opposed to a beaten opponent. Everything had to be sacrificed to speed if the Russians were to be overtaken. In this situation legs were the important thing, not cannon."
"The importance of surprise in war cannot be overestimated. As it becomes increasingly difficult to obtain so does it become increasingly effective when it is obtained. No effort should be spared to make the decisive element of surprise work for us in war."
"There is only one opportunity to issue detailed orders and that is before battle. When the action has actually begun, orders must be short and simple."
"Every fight develops differently than is expected. Officers and troops must realize this in peace, in order that they will not lose courage when the unexpected occurs in war."

There's very little new in management

This book was first published in 1933 based on Schell's experience in war over the period 1915-1918. There's more in this short book than I've seen in some much longer and more recent management books, and frankly, there's more of substance in this book than some highly-paid consultants I've worked with. Is this the only management book you should read? Absolutely not. Does it contain some interesting insights? Yes it does.

The book was published in 1933 and the author died in 1967. It's not clear to me what the copyright situation is. You can buy a copy cheaply on Amazon, but you can also find free PDFs available online from legitimate sources.