The Confusion Matrix
A confusion matrix is a table that lays out how well a model performs, by comparing what it predicted against what the ground truth says is actually true. It gives you an indication of the quality of the model, and some hard statistics on how well you can trust its predictions.
The confusion matrix organizes every prediction into a simple grid:
- the rows represent the actual classes (what you know to be true on the ground), and
- the columns represent the predicted classes (what the model produced).
For a two-class problem, say, distinguishing "forest" from "not forest", this yields four cells:
- True Positives (the model predicted forest and it really was forest),
- True Negatives (predicted not-forest, and correctly so),
- False Positives (predicted forest where there was none), and
- False Negatives (missed actual forest by labelling it something else).
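As a minimal sketch of how the four cells come about, the Python snippet below tallies them from paired lists of actual and predicted labels. The function name `count_cells` and the sample labels are hypothetical and chosen only for illustration; this is not how Spheer computes the matrix internally.

```python
from collections import Counter

def count_cells(actual, predicted, positive="forest"):
    """Tally the four confusion-matrix cells for a two-class problem."""
    cells = Counter()
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            # The model said "forest": right (TP) or a false alarm (FP).
            cells["TP" if truth == positive else "FP"] += 1
        else:
            # The model said "not forest": a miss (FN) or correct (TN).
            cells["FN" if truth == positive else "TN"] += 1
    return cells

# Hypothetical labels for five validation locations.
actual    = ["forest", "forest", "not forest", "not forest", "forest"]
predicted = ["forest", "not forest", "not forest", "forest", "forest"]

cells = count_cells(actual, predicted)
print(cells["TP"], cells["TN"], cells["FP"], cells["FN"])  # 2 1 1 1
```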

To make this concrete, suppose you validate your map against 100 known pixel locations. The matrix might show 40 True Positives, 45 True Negatives, 5 False Positives, and 10 False Negatives. From these four numbers alone, Spheer derives the accuracy of the model, the overall hit rate, calculated as:
- (True Positives + True Negatives) / Total Predictions, here (40 + 45) / 100 = 85%.
Accuracy indicates how often the model is correct overall: the proportion of all predictions, across every class, that match the ground truth (what you know to be there on the ground, independent of any prediction). It answers the broadest possible question: out of everything the model predicted, how much did it get right? In the example above, the model agreed with reality at 85 of the 100 locations.
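The accuracy formula can be checked with a few lines of Python. This is only a sketch of the arithmetic using the example numbers above; the function name `accuracy` is an assumption made for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that match the ground truth."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to evaluate")
    return (tp + tn) / total

# Example from the text: 40 TP, 45 TN, 5 FP, 10 FN.
print(f"{accuracy(tp=40, tn=45, fp=5, fn=10):.0%}")  # 85%
```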
However, accuracy on its own can be misleading when classes are imbalanced. Consider mapping a rare land-cover type that occupies only a tiny fraction of your area: a model that simply predicts "everything else" everywhere could score very high accuracy while catching none of the class you actually care about. This is precisely why the confusion matrix is so valuable: it forces you to look beyond a single headline number.
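The imbalance problem is easy to reproduce with made-up numbers. In the sketch below, the 1000 locations, the 20 rare ones, and the class names "rare" and "other" are all invented for illustration.

```python
# Hypothetical validation set: 20 rare locations among 1000.
actual = ["rare"] * 20 + ["other"] * 980

# A "model" that predicts the common class everywhere.
predicted = ["other"] * 1000

correct = sum(a == p for a, p in zip(actual, predicted))
overall_accuracy = correct / len(actual)

# How many of the rare locations did the model actually find?
rare_found = sum(a == "rare" and p == "rare" for a, p in zip(actual, predicted))

print(f"accuracy: {overall_accuracy:.0%}")            # accuracy: 98%
print(f"rare locations found: {rare_found} of 20")    # rare locations found: 0 of 20
```

The headline number looks excellent, yet the class of interest is missed entirely, and only the per-cell counts make that visible.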
How you should use a confusion matrix depends entirely on what a mistake costs you in your specific context. If you are mapping rare vegetation zones, a False Negative (labelling a potential habitat as not possible) may carry far heavier consequences than a False Positive.
The confusion matrix is therefore not just a report card but a decision-making tool: it reveals where and how your model fails, lets you weigh the real-world consequences of each type of error, and guides you toward better placement of observations and the trade-offs that fit your problem. Reading it well is the difference between trusting a Spheer map blindly and understanding exactly what you are trusting it to do.
The Confusion Matrix inside the Spheer App
Inside the Spheer app, the confusion matrix is built around a clear separation between the data you use to train your model and the data you use to validate it. In the train layers you add observations that are used for training the model. To measure how good the model's predictions actually are, Spheer lets you create validation layers. The observations within these layers are called validation shapes, and they are how you hand Spheer your ground truth observations. Ground truth is simply the correct answer at a location, established independently of the model: data that you collected yourself or that was given to you for that area. Unlike train shapes, validation shapes never teach the model; they describe what the answer should be at a set of locations you have set aside for checking.
Spheer takes the prediction generated from your train layer's observations and compares it, location by location, against the validation shapes you have drawn. Each comparison falls into one of the cells of the confusion matrix, and from the full set of comparisons the accuracy is calculated. Because the validation shapes were never used in training, this gives you an honest, independent picture of how the model performs on data it has not seen. It is important to draw these validation shapes correctly: incorrectly drawn shapes give a false indication of the model's performance.
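The location-by-location comparison can be sketched in Python for more than two classes. This is an illustration under stated assumptions, not Spheer's implementation: it assumes the comparison has already been reduced to two equally long lists of labels, and the function names, class names, and sample data are hypothetical.

```python
def confusion_matrix(actual, predicted, classes):
    """Build a grid with rows = actual classes and columns = predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(actual, predicted):
        matrix[index[truth]][index[guess]] += 1
    return matrix

def accuracy_from_matrix(matrix):
    """Correct predictions sit on the diagonal; divide them by the total."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

# Hypothetical validation labels and model predictions at six locations.
classes   = ["forest", "water", "urban"]
actual    = ["forest", "forest", "water", "urban", "urban", "water"]
predicted = ["forest", "water", "water", "urban", "forest", "water"]

matrix = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, matrix):
    print(f"{name:>6}: {row}")
print(f"accuracy: {accuracy_from_matrix(matrix):.0%}")
```

Each off-diagonal cell names a specific confusion, for example actual "urban" predicted as "forest", which is the detail a single accuracy figure hides.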
Spheer produces a single confusion matrix per validation layer. By organizing your validation shapes into different layers, you can evaluate your model from different perspectives. One validation layer might cover a particular region or year, another might focus on a specific class or a harder set of edge cases. This lets you pinpoint not just whether the model is performing well overall, but where and under what conditions it succeeds or struggles.
For a classification model, each class appears as its own row and column in the confusion matrix. For a regression model, where the output is a continuous value rather than a discrete class, the same logic of comparing prediction against truth applies, but the values are grouped into bins. For example, a bin of 0%-10% holds all values that fall between 0% and 10%.
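Binning a continuous value can be sketched as follows. The bin width of 10 percentage points and the handling of boundary values are assumptions made for this example: here a value that sits exactly on an edge goes to the upper bin, except 100%, which stays in the top bin. Spheer's actual bin edges and boundary rules may differ.

```python
def bin_label(value, width=10):
    """Map a percentage in [0, 100] to a bin label such as '0%-10%'.

    Assumption: an edge value such as 10 falls in the upper bin ('10%-20%'),
    except 100, which is kept in the top bin. Assumes width divides 100.
    """
    if not 0 <= value <= 100:
        raise ValueError("value must be between 0 and 100")
    lower = min(int(value // width) * width, 100 - width)
    return f"{lower}%-{lower + width}%"

# Hypothetical true and predicted values at one validation location.
print(bin_label(7.5))   # 0%-10%
print(bin_label(12.0))  # 10%-20%
```

Once both the true and the predicted value are mapped to bin labels, the bins can be compared in the same way as classes.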
The practical workflow is, as always, iterative:
- draw your observations in a train layer,
- set aside ground truth examples in one or more validation layers,
- read the confusion matrix each validation layer produces, and
- use what it reveals to refine your observations, adding, moving, or adjusting train shapes, until the matrix tells you the map can be trusted.
Outside the App: Precision and Recall
For simplicity's sake, Spheer reports only accuracy. But the same four numbers in the matrix let you calculate two more focused metrics, should you ever want to reason about a single class in more detail:
- Precision: of everything predicted as a class, how much was correct: True Positives / (True Positives + False Positives), here 40 / (40 + 5) = 89%, and
- Recall: of all the real instances of that class, how many did the model catch: True Positives / (True Positives + False Negatives), here 40 / (40 + 10) = 80%.
Where accuracy judges the model across every class at once, precision and recall zoom in on one.
Precision asks how trustworthy a positive prediction is: when the model says "forest," how often is it right?
Recall asks how complete the model is: of all the real forest out there, how much did it actually find?
The two usually trade off against each other: pushing the model to catch every last instance (high recall) tends to let in more false alarms (lower precision), while tightening it to avoid mistakes (high precision) tends to let it miss more real cases (lower recall). Which one you favour comes back to what an error costs you: exactly the potential habitat trade-off described earlier.
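Precision and recall for the running example can be verified with a short Python sketch. The function names are assumptions, and returning 0.0 when a denominator is zero is a convention chosen for this example, not something Spheer defines.

```python
def precision(tp, fp):
    """Of everything predicted as the class, the share that was correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Of all real instances of the class, the share the model caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Example from the text: 40 TP, 5 FP, 10 FN.
print(f"precision: {precision(tp=40, fp=5):.0%}")  # precision: 89%
print(f"recall: {recall(tp=40, fn=10):.0%}")       # recall: 80%
```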
Help with Confusion Matrices
If you need help understanding confusion matrices, would like someone to guide you through setting one up, or have a question on any other topic in Spheer, please reach out to our support team at support@spheer.ai.


