Wednesday, May 14, 2025

You need to use Manus

What is Manus - agentic AI

Manus is an AI agent capable of performing a number of high-level tasks that previously could only be done by humans. For example, it can research an area (e.g. a machine learning method) and produce an intelligible report; it can even turn a report into an interactive website. You can get started on it for free.

It created a huge fuss on its release, and rightly so; the capabilities it offers are ground-breaking. A few months on, it's got even better.

In this blog post, I'm going to provide you with some definitions, show you what Manus can do, give you some warnings, and provide you with some next steps.

If you want to get an invitation to Manus, contact me.

How it works 

We need some definitions here. 

An LLM (Large Language Model) is a huge computer model that's been trained on large bodies of text. That could be human language (e.g. English, Chinese) or it could be computer code (e.g. Python, JavaScript). An LLM can do things like:

  • extract meaning from text e.g. given a news article on a football match, it can tell you the score, who won, who lost, and other details from the text
  • predict the next word in a sentence or the next sentence in a paragraph
  • produce entire "works", for example, you can ask an LLM to write a play on a given theme.

An agent is an LLM that controls other LLMs without human intervention. For example, you might set it the task of building a user interface using react.js. The agent will interpret your task and break it down into several subtasks. It will then ask LLMs to build code for each subtask and stitch the code together. More importantly for this blog post, you can use an agent to build a report for you on a topic. The agent will break down your request into chunks, assign those chunks to LLMs, and build an answer for you. An example topic might be "build me a report on what to do during a 10 day vacation in Brazil".

Manus is an agentic AI. It will split your request into chunks, assign those chunks to LLMs (it could be the same LLM or it could be different ones depending on the task), and combine the results into a report.
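To make this concrete, here's a minimal sketch of the decompose-dispatch-combine loop in Python. The call_llm function is a stand-in for a real model API, not Manus's actual interface; it's only there to illustrate the pattern.

# Illustrative sketch of an agent's decompose-dispatch-combine loop.
def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call (in practice, an API request to a model provider).
    return f"[LLM response to: {prompt}]"

def run_agent(task: str) -> str:
    # 1. Ask an LLM to break the task into subtasks.
    plan = call_llm(f"Break this task into subtasks: {task}")
    subtasks = [line for line in plan.splitlines() if line.strip()]

    # 2. Dispatch each subtask to an LLM (possibly different models for different subtasks).
    partial_results = [call_llm(f"Complete this subtask: {s}") for s in subtasks]

    # 3. Combine the partial results into one answer.
    return call_llm("Combine these results into one report:\n" + "\n".join(partial_results))

print(run_agent("Build me a report on what to do during a 10 day vacation in Brazil"))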

An example

I gave the following instructions to Manus:

You are an experienced technical professional. You will write a report explaining how logistic regression works for your colleagues. Your report will be a Word document. Your report will include the following sections:

* Why logistic regression is important.

* The theory and math behind it.

* A worked example. This will include code in Python using the appropriate libraries.

You will include the various math formula using the correct notation. You will provide references where appropriate.

Here's how it got started:


After it started, I realized I needed to modify my instructions; here's the dialog:

It incorporated my request and did add more sections.

Here's an example of how it kept me updated:

After 20 minutes, it produced a report in Word format. After reading the report, I realized I wanted to turn it into a blog post, so I asked Manus to give me the report as an HTML document, which it did.

I've posted the report as a blog post and you can read it here: https://blog.engora.com/2025/05/the-importance-of-logistic-regression.html

A critique of the Manus report

I'm familiar with logistic regression, so I can critique what Manus returned. I'd give it a B+. This may sound a bit harsh, but it's a very credible result for 20 minutes of effort. It's enough to get going with, but it's not enough on its own. Here's my assessment.

  • Writing style and use of English. Great. Better than most native English speakers.
  • Report organization. Great. Very clear and concise. Nicely formatted.
  • Technical correctness. I couldn't spot anything wrong with what it produced. It did leave out some important material, though, and had some oddities:
    • No mention of logistic regression with more than two target classes (multinomial logistic regression).
    • Odds ratios can vary from 0 to +\(\infty\), but it didn't mention this. That's curious, because it pointed out in the prior paragraphs that linear regression output can vary from -\(\infty\) to +\(\infty\).
    • Too terse a description of the sigmoid function. It should have included a chart and a deeper discussion of some of the function's relevant properties.
    • No meaningful discussion of decision boundaries (one mention, and not in enough detail).
  • Formulas. A curious mixed bag. In some cases, it gave very good formulas using the standard symbols, and in other cases it gave code-like formulas. This might be because I told it I wanted a Word report. By default, it uses Markdown, and it may be better to keep the report in Markdown. It might be worth experimenting with telling it to use LaTeX for formulas.
  • Code. Great.
  • References. Not great. No links back to the several online books that discuss logistic regression in some detail, and no links to academic papers. The references it did provide were kind of OK, but there weren't enough of them and they weren't of high enough quality.

To fix some of these issues, I could have tweaked my prompt, for example by telling it to use academic references or giving it instructions to expand certain areas. This would cost more tokens. I could also have told it to use high-effort reasoning, which would have cost me more tokens too.

Tokens in AI

Computation isn't free and that's especially true of AI. Manus, in common with many other AI services, uses a "token" model. This report cost me 511 tokens. Manus gives you a certain number of tokens for free, which is enough for experimentation but not enough for commercial use.

What's been written about it

Other people have written about Manus too. Here are some reviews:

Who owns Manus

Manus is owned by a Chinese company called Monica (also known as Butterfly Effect AI) based in Wuhan.

Some cautions

As with any LLM or agentic AI, I suggest that you do not share company confidential information or PII. This includes data, but also includes text. Some LLMs/agents will use any data (including text) you supply to help train their models. This might be OK, but it also might not be OK - proceed with caution.

Before you use any agentic AI or an LLM for "production" use, I suggest a legal and risk review.

  • What does their system do with the data you send it? Does it retain the data? Does it use the data to train the model? Is it resold?
  • What does their system do with the output (e.g. final report, generated code)? 
  • Can you ask for your data to be removed from their model or system?

What this means - next steps

These types of agentic AI are game-changers. They will get you the information you need far faster and more cheaply than a human could. The information isn't perfect, and perhaps you wouldn't give it an A, but it's more than good enough to get started, and frankly, most humans don't produce A work.

If you're involved in any kind of knowledge work, you should be experimenting with Manus and its competitors. This technology has obvious implications for employment and if you think you might be affected, it behoves you to understand what's going on.

If you want to get started, reach out to me to get an invitation to Manus and get extra free tokens.

The Importance of Logistic Regression

Note

With the exception of this note, everything else in this blog post was automatically created by Manus. I'm providing it as an example of what you can create.

In this separate blog post, I explain how I created this report and I provide an evaluation of it.

If you want to get started with Manus, contact me and I'll share an invitation with you.

Mike

======================================

The Importance of Logistic Regression

Logistic regression stands as a cornerstone in the field of machine learning and statistics, primarily recognized for its efficacy in tackling binary classification problems. Its importance stems from a combination of its interpretability, efficiency, and the foundational understanding it provides for more complex algorithms. Unlike linear regression, which predicts continuous outcomes, logistic regression is specifically designed to predict the probability of an instance belonging to a particular class, typically one of two (e.g., yes/no, true/false, 0/1). This probabilistic output is crucial in many real-world scenarios where a clear-cut decision boundary is needed, but an understanding of the likelihood of each outcome is also valuable.

One of the key reasons for logistic regression’s widespread adoption is its relative simplicity and ease of implementation. It serves as an excellent starting point for individuals venturing into predictive modeling and classification tasks. The mathematical underpinnings, while involving concepts like the sigmoid function and log-odds, are generally more accessible than those of more sophisticated models like neural networks or support vector machines. This accessibility does not, however, detract from its power. Logistic regression can provide robust and accurate predictions, especially when the relationship between the independent variables and the log-odds of the dependent variable is approximately linear.

Furthermore, the interpretability of logistic regression models is a significant advantage. The coefficients derived from a trained logistic regression model can be directly interpreted in terms of the odds ratio. This allows practitioners to understand the influence of each independent variable on the likelihood of the outcome. For instance, in a medical diagnosis scenario, a logistic regression model can not only predict the probability of a patient having a certain disease but also quantify how factors like age, weight, or specific test results contribute to that probability. This level of insight is invaluable in fields where understanding the ‘why’ behind a prediction is as important as the prediction itself.

Logistic regression is also computationally efficient, making it suitable for large datasets and real-time applications. Training a logistic regression model is generally faster compared to more complex algorithms, and making predictions is also quick. This efficiency, combined with its good performance on many binary classification tasks, makes it a go-to algorithm for a wide range of applications. These applications span various domains, including medical diagnosis (e.g., predicting disease presence), finance (e.g., credit scoring, fraud detection), marketing (e.g., predicting customer churn or purchase likelihood), and social sciences (e.g., predicting voting behavior).

Moreover, logistic regression serves as a fundamental building block for understanding more advanced classification techniques. Many concepts introduced in logistic regression, such as the use of a link function (the sigmoid function), maximum likelihood estimation for parameter fitting, and the evaluation of model performance using metrics like accuracy, precision, recall, and AUC-ROC, are transferable to other machine learning algorithms. Therefore, a solid grasp of logistic regression provides a strong foundation for learning and applying more complex models.

In summary, the importance of logistic regression is multifaceted. It is a powerful yet relatively simple and interpretable classification algorithm that provides probabilistic outputs. Its computational efficiency, wide range of applications, and its role as a foundational concept in machine learning solidify its place as an essential tool in the data scientist’s and statistician’s toolkit. Whether used as a standalone model or as a baseline for comparison with more complex methods, logistic regression continues to be a highly relevant and valuable technique in the world of data analysis and predictive modeling.

The Theory and Math Behind Logistic Regression

Logistic regression, despite its name, is a statistical model used for binary classification tasks, meaning it predicts the probability of an instance belonging to one of two classes. The core idea is to model the probability that a given input point belongs to a certain class. To understand its mechanics, we need to delve into concepts like the odds, the logit function, the sigmoid (or logistic) function, and the method of maximum likelihood estimation for fitting the model.

From Linear Regression to Probabilities

Linear regression predicts a continuous output, y, based on a linear combination of input features, X. The equation for a simple linear regression with one feature is y = β₀ + β₁x. For multiple features, this becomes y = β₀ + β₁x₁ + β₂x₂ + … + βₚxₚ. However, the output of linear regression can range from -∞ to +∞, which is not suitable for probabilities that must lie between 0 and 1.

To address this, logistic regression transforms the linear combination of inputs using a function that maps any real-valued number into the (0, 1) interval. This function is the sigmoid function, also known as the logistic function.

The Sigmoid (Logistic) Function

The sigmoid function is defined as:

σ(z) = 1 / (1 + e^(-z))

Here, ‘z’ represents the linear combination of input features and their corresponding coefficients (weights): z = β₀ + β₁x₁ + β₂x₂ + … + βₚxₚ. The output of the sigmoid function, σ(z), is the estimated probability P(Y=1|X), i.e., the probability that the dependent variable Y is 1 (e.g., ‘pass’, ‘yes’, ‘disease present’) given the input features X. As z approaches +∞, e^(-z) approaches 0, and σ(z) approaches 1. Conversely, as z approaches -∞, e^(-z) approaches +∞, and σ(z) approaches 0. This S-shaped curve is ideal for modeling probabilities.
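A few lines of NumPy are enough to check this behaviour numerically; the following is a minimal sketch, separate from the worked example later in this report:

import numpy as np

def sigmoid(z):
    # The logistic function: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# The output always lies strictly between 0 and 1, with sigmoid(0) = 0.5.
for z in [-10, -2, 0, 2, 10]:
    print(f"z = {z:>3}: sigmoid(z) = {sigmoid(z):.6f}")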

Odds and Log-Odds (Logit)

To understand the derivation of the logistic regression model, it’s helpful to consider the concept of odds. The odds of an event occurring is the ratio of the probability of the event occurring to the probability of it not occurring:

Odds = P(Y=1|X) / P(Y=0|X)

Since P(Y=0|X) = 1 - P(Y=1|X), we can write:

Odds = P(Y=1|X) / (1 - P(Y=1|X))

If we let p(X) = P(Y=1|X) = σ(z) = 1 / (1 + e^(-z)), then:

1 - p(X) = 1 - [1 / (1 + e^(-z))] = (1 + e^(-z) - 1) / (1 + e^(-z)) = e^(-z) / (1 + e^(-z))

So, the odds become:

Odds = [1 / (1 + e^(-z))] / [e^(-z) / (1 + e^(-z))] = 1 / e^(-z) = e^z

Now, taking the natural logarithm of the odds gives us the log-odds, also known as the logit function:

logit(p(X)) = ln(Odds) = ln(e^z) = z

Thus, we have:

ln(p(X) / (1 - p(X))) = β₀ + β₁x₁ + β₂x₂ + … + βₚxₚ

This equation shows that the log-odds of the outcome is a linear function of the input features. This is the fundamental relationship that logistic regression models. The coefficients (β) can be interpreted in terms of the change in log-odds for a one-unit change in the corresponding feature, holding other features constant. Exponentiating a coefficient gives the odds ratio.
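This chain of identities can be verified numerically: starting from any value of z, computing the probability, then the odds, then the log of the odds should recover z exactly. A minimal sketch using NumPy:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 1.7                  # any value of the linear predictor beta_0 + beta_1*x_1 + ...
p = sigmoid(z)           # P(Y=1|X)
odds = p / (1.0 - p)     # should equal e^z
log_odds = np.log(odds)  # should recover z

print(f"p = {p:.4f}, odds = {odds:.4f}, e^z = {np.exp(z):.4f}, log-odds = {log_odds:.4f}")
# The printed odds matches e^z, and the log-odds matches the original z.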

Model Fitting: Maximum Likelihood Estimation (MLE)

Unlike linear regression, where coefficients are typically estimated using Ordinary Least Squares (OLS), logistic regression coefficients are estimated using Maximum Likelihood Estimation (MLE). MLE is a method for estimating the parameters of a statistical model by finding the parameter values that maximize the likelihood of observing the given data.

For a dataset with ‘n’ independent observations {(xᵢ, yᵢ)}, where xᵢ is the vector of features for the i-th observation and yᵢ is its binary outcome (0 or 1), the likelihood function L(β) is the product of the probabilities of observing each yᵢ given xᵢ and the parameters β:

L(β) = Πᵢ [p(xᵢ) ^ yᵢ] * [(1 - p(xᵢ)) ^ (1 - yᵢ)]

where p(xᵢ) = σ(β₀ + β₁x₁ᵢ + … + βₚxₚᵢ) is the predicted probability for the i-th observation.

It is often easier to work with the log-likelihood function, ll(β), because it converts the product into a sum:

ll(β) = ln(L(β)) = Σᵢ [yᵢ * ln(p(xᵢ)) + (1 - yᵢ) * ln(1 - p(xᵢ))]

Substituting p(xᵢ) = 1 / (1 + e^(-zᵢ)) and 1 - p(xᵢ) = e^(-zᵢ) / (1 + e^(-zᵢ)), where zᵢ = β₀ + β₁x₁ᵢ + … + βₚxₚᵢ, the log-likelihood becomes:

ll(β) = Σᵢ [yᵢ * zᵢ - ln(1 + e^(zᵢ))]

To find the values of β that maximize this log-likelihood function, we typically use iterative optimization algorithms like Gradient Ascent (since we are maximizing) or Newton-Raphson. These algorithms start with initial estimates for β and iteratively update them until the log-likelihood converges to a maximum. There is no closed-form solution for the β coefficients in logistic regression, unlike in linear regression.
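The following is a minimal gradient-ascent sketch on synthetic data, intended only to illustrate the idea; real libraries use more sophisticated and better-tuned solvers.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with one feature and known coefficients beta_0 = -1, beta_1 = 2.
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept column + one feature
true_beta = np.array([-1.0, 2.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

# Gradient ascent on the log-likelihood.
# The gradient of ll(beta) with respect to beta is X^T (y - p).
beta = np.zeros(2)
learning_rate = 0.1
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    beta += learning_rate * (X.T @ (y - p)) / n

print("Estimated coefficients:", beta)   # should be close to [-1, 2]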

Assumptions of Logistic Regression

While logistic regression is more flexible than linear regression, it still relies on a few key assumptions:

  1. Binary Dependent Variable: The dependent variable must be binary or dichotomous (e.g., 0/1, yes/no). For more than two categories, extensions like multinomial or ordinal logistic regression are used.
  2. Independence of Observations: The observations should be independent of each other. This is a common assumption for many statistical models.
  3. Linearity of Log-Odds: The relationship between the independent variables and the log-odds of the outcome is assumed to be linear. This can be checked using techniques like the Box-Tidwell test or by plotting residuals.
  4. Absence of Multicollinearity: There should be little or no multicollinearity among the independent variables. High multicollinearity can make it difficult to estimate the individual effects of the predictors (a quick screen for this is sketched after this list).
  5. Large Sample Size: Logistic regression typically requires a reasonably large sample size to achieve stable and reliable estimates of the coefficients.
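As a quick screen for assumption 4, highly correlated feature pairs can be flagged from the correlation matrix; the following is a minimal sketch on made-up data (a fuller diagnosis would use variance inflation factors):

import numpy as np
import pandas as pd

def highly_correlated_pairs(features: pd.DataFrame, threshold: float = 0.9):
    # Flag pairs of features whose absolute pairwise correlation exceeds the threshold.
    corr = features.corr().abs()
    cols = corr.columns
    return [(cols[i], cols[j], corr.iloc[i, j])
            for i in range(len(cols))
            for j in range(i + 1, len(cols))
            if corr.iloc[i, j] > threshold]

# Made-up example: x2 is almost a copy of x1, so that pair gets flagged.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df_check = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.05, size=200),   # nearly collinear with x1
    "x3": rng.normal(size=200),                    # unrelated feature
})
print(highly_correlated_pairs(df_check))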

Understanding these theoretical and mathematical underpinnings is crucial for effectively applying logistic regression, interpreting its results, and diagnosing potential issues.

Worked Example: Logistic Regression in Python

This section provides a practical, step-by-step demonstration of how to implement logistic regression using Python. We will leverage popular libraries such as pandas for data manipulation, scikit-learn for machine learning tasks including model building and evaluation, and numpy for numerical operations. For this example, we will use the well-known Breast Cancer Wisconsin (Diagnostic) dataset, which is conveniently available within scikit-learn. This dataset presents a binary classification problem: predicting whether a breast mass is malignant or benign based on several computed features from digitized images of fine needle aspirates (FNA).

1. Importing Necessary Libraries

The first step in any Python-based data science task is to import the required libraries. We will need pandas for creating and managing DataFrames, numpy for numerical computations (though its direct use might be minimal here, it underpins scikit-learn), and several modules from scikit-learn for data splitting, model implementation, preprocessing, and metrics.

# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_breast_cancer # Using a built-in dataset for simplicity

2. Loading and Exploring the Dataset

We load the breast cancer dataset using load_breast_cancer() from sklearn.datasets. The data and feature names are then used to create a pandas DataFrame for easier manipulation and inspection. The target variable, indicating whether a tumor is malignant (0) or benign (1), is added as a new column to this DataFrame.

# Load the dataset
cancer = load_breast_cancer()
df = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
df["target"] = cancer.target

Before proceeding with modeling, it is crucial to perform some initial exploratory data analysis (EDA). We display the first few rows of the DataFrame using df.head() to get a feel for the data, df.info() to understand the data types and check for missing values, and df["target"].value_counts() to see the distribution of the target classes.

print("--- Dataset Head ---")
print(df.head())
print("\n--- Dataset Info ---")
df.info()
print("\n--- Target Value Counts ---")
print(df["target"].value_counts())

This initial exploration helps confirm that the dataset is loaded correctly, identify the nature of the features (all appear to be numerical in this case), and understand the balance of the classes in the target variable, which is important for classification tasks.

3. Defining Features and Target Variable

Next, we separate the dataset into features (independent variables, denoted as X) and the target variable (dependent variable, denoted as y). X will contain all columns except the ‘target’ column, and y will consist solely of the ‘target’ column.

# Define features (X) and target (y)
X = df.drop("target", axis=1)
y = df["target"]

4. Splitting Data into Training and Testing Sets

To evaluate the performance of our logistic regression model on unseen data, we split the dataset into a training set and a testing set. The model will be trained on the training set, and its predictive performance will be assessed on the testing set. We use train_test_split from sklearn.model_selection for this purpose. A common split is 80% for training and 20% for testing. Setting random_state ensures that the split is the same every time the code is run, making the results reproducible. The stratify=y argument ensures that the proportion of the target classes is maintained in both the training and testing sets, which is particularly important for imbalanced datasets.

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print(f"\n--- Shape of Training Data ---")
print(f"X_train shape: {X_train.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"--- Shape of Testing Data ---")
print(f"X_test shape: {X_test.shape}")
print(f"y_test shape: {y_test.shape}")

5. Feature Scaling

Many machine learning algorithms, including logistic regression (especially when using certain solvers like ‘lbfgs’ or when regularization is applied), perform better when the input numerical features are on a similar scale. Feature scaling standardizes the range of independent variables. We use StandardScaler from sklearn.preprocessing, which standardizes features by removing the mean and scaling to unit variance. The scaler is fit only on the training data to prevent data leakage from the test set, and then used to transform both the training and testing data.

# Feature Scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

6. Initializing and Training the Logistic Regression Model

With the data prepared, we can now initialize and train our logistic regression model. We create an instance of the LogisticRegression class from sklearn.linear_model. For this example, we specify the solver="liblinear", which is a good choice for smaller datasets and binary classification, and set random_state for reproducibility. The max_iter parameter is increased to ensure the solver has enough iterations to converge. The model is then trained using the fit() method with the scaled training features (X_train_scaled) and the training target variable (y_train).

# Initialize and train the Logistic Regression model
log_reg_model = LogisticRegression(solver="liblinear", random_state=42, max_iter=1000)
log_reg_model.fit(X_train_scaled, y_train)

print("\n--- Model Training Complete ---")

7. Making Predictions

Once the model is trained, we can use it to make predictions on the test set (X_test_scaled). The predict() method returns the predicted class labels (0 or 1 in this case). We also use the predict_proba() method to obtain the predicted probabilities for each class. This provides the likelihood of an instance belonging to class 0 (benign) and class 1 (malignant).

# Make predictions on the test set
y_pred = log_reg_model.predict(X_test_scaled)
y_pred_proba = log_reg_model.predict_proba(X_test_scaled) # Get probabilities

print("\n--- Predictions Made ---")

8. Evaluating the Model

Model evaluation is crucial to understand how well our logistic regression model performs. We use several common metrics for classification tasks:

  • Accuracy: This is the proportion of correctly classified instances. It is calculated using accuracy_score.
  • Confusion Matrix: This table provides a detailed breakdown of correct and incorrect classifications for each class (True Positives, True Negatives, False Positives, False Negatives). It is generated using confusion_matrix.
  • Classification Report: This report, generated by classification_report, includes precision, recall, F1-score, and support for each class. These metrics provide a more nuanced view of performance, especially if the classes are imbalanced.
    • Precision measures the accuracy of positive predictions (TP / (TP + FP)).
    • Recall (or Sensitivity) measures the model’s ability to identify all actual positives (TP / (TP + FN)).
    • F1-score is the harmonic mean of precision and recall, providing a single score that balances both.
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"\n--- Model Evaluation ---")
print(f"Accuracy: {accuracy:.4f}")

conf_matrix = confusion_matrix(y_test, y_pred)
print(f"\nConfusion Matrix:\n{conf_matrix}")

class_report = classification_report(y_test, y_pred, target_names=cancer.target_names)
print(f"\nClassification Report:\n{class_report}")

The output of these evaluations will indicate the model’s effectiveness. For instance, a high accuracy and balanced precision/recall scores suggest good performance.

9. Interpreting Predicted Probabilities

To further understand the model’s output, we can look at the predicted probabilities for a few samples from the test set. This shows the model’s confidence in its predictions.

# Display some predicted probabilities for the first few test samples
print("\n--- Predicted Probabilities for first 5 test samples (Benign, Malignant) ---")
for i in range(5):
    print(f"Sample {i+1}: Actual={y_test.iloc[i]}, Predicted Proba={y_pred_proba[i]}, Predicted Class={y_pred[i]}")

Each row in y_pred_proba contains two probabilities: the first for class 0 (malignant) and the second for class 1 (benign). The predict() method typically assigns the class with the higher probability (usually based on a 0.5 threshold).
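The 0.5 cut-off is only a default. If one kind of misclassification is more costly than the other, the probabilities in y_pred_proba can be thresholded manually. A minimal sketch, reusing y_pred_proba and y_pred from the steps above (the 0.3 value is purely illustrative):

# Apply a custom decision threshold to the predicted probabilities.
# Column 1 of y_pred_proba holds P(class 1), i.e. the probability the tumor is benign.
custom_threshold = 0.3
y_pred_custom = (y_pred_proba[:, 1] >= custom_threshold).astype(int)

print(f"\n--- Predictions at threshold {custom_threshold} ---")
print(f"Default 0.5 threshold: {int(y_pred.sum())} test samples predicted as class 1 (benign)")
print(f"Threshold {custom_threshold}: {int(y_pred_custom.sum())} test samples predicted as class 1 (benign)")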

10. Interpreting Model Coefficients

Finally, we can examine the coefficients (weights) learned by the logistic regression model. These coefficients indicate the relationship between each feature and the log-odds of the outcome. A positive coefficient suggests that an increase in the feature’s value increases the log-odds of the outcome being class 1 (benign), while a negative coefficient suggests the opposite. We can also exponentiate these coefficients to get odds ratios, which are often easier to interpret. An odds ratio greater than 1 means the odds of the outcome (benign) increase with an increase in the feature, while an odds ratio less than 1 means the odds decrease.

# Interpreting Coefficients
coefficients = pd.DataFrame(log_reg_model.coef_[0], X.columns, columns=["Coefficient"])
print("\n--- Model Coefficients (Log-Odds) ---")
print(coefficients.sort_values(by="Coefficient", ascending=False))

odds_ratios = np.exp(log_reg_model.coef_[0])
odds_ratios_df = pd.DataFrame(odds_ratios, X.columns, columns=["Odds Ratio"])
print("\n--- Model Odds Ratios ---")
print(odds_ratios_df.sort_values(by="Odds Ratio", ascending=False))

This step provides insights into which features are most influential in the model’s predictions. It is important to remember that these interpretations are based on the scaled features if feature scaling was applied.
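If coefficients per original unit of measurement are wanted, they can be recovered from the standardized coefficients by dividing by each feature's standard deviation, which the fitted scaler stores in scaler.scale_. A minimal sketch, reusing log_reg_model, scaler, and X from the steps above:

# Convert coefficients from the standardized scale back to the original feature units.
# For standardized features, coefficient_original_j = coefficient_scaled_j / std_j.
coef_original_units = log_reg_model.coef_[0] / scaler.scale_
coef_original_df = pd.DataFrame(coef_original_units, X.columns, columns=["Coefficient (original units)"])
print("\n--- Coefficients per original unit of each feature ---")
print(coef_original_df.sort_values(by="Coefficient (original units)", ascending=False))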

This worked example covers the end-to-end process of applying logistic regression, from data loading and preprocessing to model training, evaluation, and basic interpretation. The specific results (accuracy, coefficients, etc.) will depend on the dataset and the chosen parameters, but the methodology remains consistent.

# Python Worked Example for Logistic Regression

# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_breast_cancer # Using a built-in dataset for simplicity

# Load the dataset
# The breast cancer dataset is a classic binary classification dataset.
# Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass.
# They describe characteristics of the cell nuclei present in the image.
# The target variable is whether the mass is malignant (0) or benign (1).
cancer = load_breast_cancer()
df = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
df["target"] = cancer.target

print("--- Dataset Head ---")
print(df.head())
print("\n--- Dataset Info ---")
df.info()
print("\n--- Target Value Counts ---")
print(df["target"].value_counts())

# Define features (X) and target (y)
X = df.drop("target", axis=1)
y = df["target"]

# Split the data into training and testing sets
# We use 80% of the data for training and 20% for testing.
# random_state is set for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print(f"\n--- Shape of Training Data ---")
print(f"X_train shape: {X_train.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"--- Shape of Testing Data ---")
print(f"X_test shape: {X_test.shape}")
print(f"y_test shape: {y_test.shape}")

# Feature Scaling
# Logistic regression can benefit from feature scaling, especially when using solvers that are sensitive to feature magnitudes.
# StandardScaler standardizes features by removing the mean and scaling to unit variance.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize and train the Logistic Regression model
# We use the liblinear solver, which is a good choice for small binary classification datasets.
# max_iter is increased to ensure convergence for some solvers.
log_reg_model = LogisticRegression(solver="liblinear", random_state=42, max_iter=1000)
log_reg_model.fit(X_train_scaled, y_train)

print("\n--- Model Training Complete ---")

# Make predictions on the test set
y_pred = log_reg_model.predict(X_test_scaled)
y_pred_proba = log_reg_model.predict_proba(X_test_scaled) # Get probabilities

print("\n--- Predictions Made ---")

# Evaluate the model
# Accuracy: The proportion of correctly classified instances.
accuracy = accuracy_score(y_test, y_pred)
print(f"\n--- Model Evaluation ---")
print(f"Accuracy: {accuracy:.4f}")

# Confusion Matrix: A table showing the performance of a classification model.
# Rows represent the actual classes, and columns represent the predicted classes.
# TN | FP
# FN | TP
conf_matrix = confusion_matrix(y_test, y_pred)
print(f"\nConfusion Matrix:\n{conf_matrix}")

# Classification Report: Provides precision, recall, F1-score, and support for each class.
# Precision: TP / (TP + FP) - Ability of the classifier not to label as positive a sample that is negative.
# Recall (Sensitivity): TP / (TP + FN) - Ability of the classifier to find all the positive samples.
# F1-score: 2 * (Precision * Recall) / (Precision + Recall) - Weighted average of Precision and Recall.
# Support: The number of actual occurrences of the class in the specified dataset.
class_report = classification_report(y_test, y_pred, target_names=cancer.target_names)
print(f"\nClassification Report:\n{class_report}")

# Display some predicted probabilities for the first few test samples
print("\n--- Predicted Probabilities for first 5 test samples (Benign, Malignant) ---")
for i in range(5):
    print(f"Sample {i+1}: Actual={y_test.iloc[i]}, Predicted Proba={y_pred_proba[i]}, Predicted Class={y_pred[i]}")

# Interpreting Coefficients (Optional, but good for understanding)
# The coefficients represent the change in the log-odds of the outcome for a one-unit increase in the predictor variable,
# holding other variables constant.
coefficients = pd.DataFrame(log_reg_model.coef_[0], X.columns, columns=["Coefficient"])
print("\n--- Model Coefficients (Log-Odds) ---")
print(coefficients.sort_values(by="Coefficient", ascending=False))

# To get odds ratios, we can exponentiate the coefficients
odds_ratios = np.exp(log_reg_model.coef_[0])
odds_ratios_df = pd.DataFrame(odds_ratios, X.columns, columns=["Odds Ratio"])
print("\n--- Model Odds Ratios ---")
print(odds_ratios_df.sort_values(by="Odds Ratio", ascending=False))

print("\n--- End of Worked Example ---")

References

  1. GeeksforGeeks. (2025, February 3). Logistic Regression in Machine Learning. GeeksforGeeks. Retrieved from https://www.geeksforgeeks.org/understanding-logistic-regression/
  2. Rai, K. (2020, June 14). The math behind Logistic Regression. Analytics Vidhya on Medium. Retrieved from https://medium.com/analytics-vidhya/the-math-behind-logistic-regression-c2f04ca27bca
  3. Wikipedia contributors. (2024, May 9). Logistic regression. Wikipedia. Retrieved from https://en.wikipedia.org/wiki/Logistic_regression
  4. Scikit-learn developers. (n.d.). sklearn.linear_model.LogisticRegression. Scikit-learn. Retrieved from https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
  5. Scikit-learn developers. (n.d.). sklearn.datasets.load_breast_cancer. Scikit-learn. Retrieved from https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html
  6. Scikit-learn developers. (n.d.). sklearn.model_selection.train_test_split. Scikit-learn. Retrieved from https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
  7. Scikit-learn developers. (n.d.). sklearn.preprocessing.StandardScaler. Scikit-learn. Retrieved from https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
  8. Scikit-learn developers. (n.d.). sklearn.metrics module. Scikit-learn. Retrieved from https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics
  9. Pandas development team. (n.d.). Pandas documentation. Pandas. Retrieved from https://pandas.pydata.org/pandas-docs/stable/
  10. NumPy developers. (n.d.). NumPy documentation. NumPy. Retrieved from https://numpy.org/doc/

Monday, May 5, 2025

I can’t believe they said that!

Comedians can tell truths others can’t

I heard something intriguing in a comedian’s podcast and it wasn’t what you might think. The host was interviewing a comedian and talking about her latest set. It was all about some very dark and harrowing things that had happened to her.  She’d managed to create a comedy set that enabled her to talk about those things and she explained how she’d structured her performance to do it. 

Although not as extreme, I’ve seen and heard comedians talk about some very difficult subjects. This isn’t new; famously, court jesters could speak truth to power and not be executed for it. The court jester appears in a modern form too. I’ve seen comedians at corporate events say some things that are very close to the bone and get away with it. 

(The Court Jester by John Watson Nicol, Public domain, via Wikimedia Commons)

This poses the question: how do comedians do it?

Trust and safety

On the podcast, the ‘harrowing’ comedian explained how she made the audience feel safe at the start of her act. The audience knew the subject matter would be difficult, but they had to trust her as their guide. She talked about how she did that: the jokes she told, her use of language, how she interacted with the audience, and so on. Only once the audience were in a position where they felt safe and they trusted her did she start on her more difficult journey.

The idea of safety also applies to the court jester and his or her modern counterparts. The jester will never be king, so they’re not a threat to the established order. In fact, the king is paying the jester, and of course, the payment could end at any time. Payments set limits on how far you can go, so the court jester knows to be concerned with audience safety too.

We can gain some insight into why audience safety is important through some of the theories of comedy.

Theories of comedy: benign violation

There’s very active research into what comedy is and why it appeals to us. Researchers have developed a multitude of theories that explain why we find different types of jokes funny, but there’s no accepted grand unified theory.

The comedy theory that’s most applicable to us here is Benign Violation theory. This theory says we find things funny that violate our expectations of reality in some way but only if they don’t feel threatening to us. Threatening can mean different things at different times, but it also gets to expectation. If I go to a stage play about a difficult subject, I might expect the play to make me cry for the characters. If I go to a comedy show, I want to laugh, not cry with empathy. In comedy, I have to feel safe with the comedian, meaning they’re not going to take me to bad emotional places.

Using this theory, we can understand how a comedian can structure an act about a time they were mugged. Let's say there were some absurdities about the robbery itself. If the comedian talks about how awful they felt during and after the robbery and how it affected them, it's all very serious and not funny; the audience will empathize but not laugh, so it isn't benign and the audience isn't safe emotionally. The comedian has to remove the sting somehow, which they could do by letting everyone know they were OK after the robbery. Once the audience knows it's safe, the comedian can proceed and focus on the absurdities (AKA violations).

If you want to see an extraordinary example of this for real, see Tig Notaro’s act about her breast cancer and double mastectomy. She places the audience in a position of safety and only then talks about what happened. She focuses on some absurdities of her experience and real life and not on the harrowing side of it, so again the audience feels safe (benign) while she talks about difficult things (violation). It’s OK to laugh, because she’s OK with it and she’s laughing with us. 

(Gage Skidmore from Peoria, AZ, United States of America, CC BY-SA 2.0 <https://creativecommons.org/licenses/by-sa/2.0>, via Wikimedia Commons)

Rule breaking: unsafe audiences 

This brings us to an interesting aspect of audience safety, audience interaction. 

I’ve seen comedians pick on people in the front rows and make fun of them. For example, make fun of their occupation or partner or where they’re from etc.  This goes to another theory of comedy, superiority theory, that says we laugh at the misfortune of others. If you’re not the person the comedian is picking on, it can be very funny, but if you are, it can be very threatening.

Think for a minute how the audience feels while the comedian is looking for a new target. There’s fear because some of the humor can cut deeply. Audiences know this and can be very wary. I’ve been to comedy acts where no one wants to sit near the stage and no one will volunteer anything to the comedian. The audience don’t feel safe doing so. 

Years ago, I went to see Eddie Izzard. He started his act by asking the audience questions. No one answered. At the time, comedians were known to pick on audience members, so the audience didn't feel safe. When someone finally did answer, he made fun of their home town. Later in his act, Eddie Izzard commented on the audience's English reserve and lack of interaction, but I think he was wrong; it was something else: they didn't feel safe engaging with him because they didn't want to be a target.

More recently, I was at a corporate event that had a stand-up comedian. She said some very funny things about one of the C-level execs; it was cutting because it was true. When she asked for audience interaction, she got none, because no one wanted to be her next target.

Presentations and audience safety

Years ago, I was on a presentation training course. A nurse was presenting on a technical topic about the welfare of child patients. At one point, she seemed to get very upset at a memory, and it was noticeable in her presentation. The class teacher called her out on it, saying that it didn't feel appropriate in the context of the presentation. By introducing strong emotion, she'd distracted the audience from her message. This seems harsh, but the class teacher was right.

Strong emotions are difficult for audiences to deal with, especially if they aren't expecting it. Strong emotions overwhelm everything else the presenter might say. This gets to audience emotional safety. My 'harrowing' comedian put a lot of effort into making her audience feel safe before discussing difficult subjects. Most presenters don't have anything like the skill level to do that, so they should stay away from expressing strong emotions.

Expectations and safety

There’s something that’s kind of obvious but hidden and that’s audience expectations and safety. If you go to see a late-night comedian after the pubs have shut, you might expect an expletive ridden show with all kinds of adult humor, and that’s OK because you know what it is. On the other hand, you have very different expectations for a comedian performing in front of 10-year-old children. Where the safety boundaries are varies depending on the audience.

In the case of my ‘harrowing’ comedian, she made it very clear in her show’s publicity material that her show contained very difficult material. On the Tig Notaro show I saw on TV, the channel made it clear it was an adult show covering difficult themes. In my view, this is responsible and also helps the audience to feel safe.

What all this means

As a presenter, if you want the audience to interact with you, they have to trust you. Don't demean people who volunteer; it discourages everyone else. I suggest positivity. Let's say an audience member tells you they come from a very run-down town. You could riff on crime in that town, or you could tell a benign story about the town, like losing your car in a huge parking lot there. Rewarding people for engaging with you encourages more engagement.

Audiences have to feel safe with you if you’re going to push any kind of boundary, and this is especially true if any of your material is difficult. You have to let your audience know that you’re OK and they’re OK, and they’ll be OK if they go on a journey with you; you’re going to make them laugh, not cry. 

Finally, you can speak truth to power through humor, but you need to know what you're doing and what the limits are.