Interpreting a Self Organizing Map
Introduction
A Self Organizing Map (SOM), introduced by Teuvo Kohonen in the 1980s, is an unsupervised learning algorithm used primarily for dimensionality reduction, data visualization, and clustering. SOMs are particularly useful for visualizing high-dimensional data on a two-dimensional grid that approximately preserves the topology of the original data: inputs that are close in the original space tend to map to nearby grid cells.
What is a Self Organizing Map?
A SOM consists of a number of units or neurons that are usually organized in a 2D grid. Each unit has a weight vector of the same dimensionality as the input data. The map "organizes" itself by learning the data structure through an iterative process. The primary goal is to project high-dimensional data onto a low-dimensional map while maintaining the topology.
The SOM Algorithm
- Initialization: Each neuron's weight vector is initialized, often randomly or with a linear initialization method in which the weights are spread over the plane spanned by the two largest principal components of the data.
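- Iteration Process:
  • Select Input: Randomly select an input vector from the training dataset.
  • Best Matching Unit (BMU) Identification: Calculate the distance (typically Euclidean) between the input vector and each neuron's weight vector. The neuron with the smallest distance is chosen as the BMU.
  • Updating Neurons: Update the weight vectors of the BMU and its neighboring neurons to be closer to the input vector. The update is performed with the formula:

    $$w_i(t+1) = w_i(t) + \alpha(t)\, h_{c,i}(t)\, \bigl(x(t) - w_i(t)\bigr)$$

    where $w_i(t)$ is the weight vector of neuron $i$ at iteration $t$, $x(t)$ is the selected input vector, $\alpha(t)$ is the learning rate, $c$ is the index of the BMU, and $h_{c,i}(t)$ is the neighborhood function, which shrinks over time and with grid distance from the BMU and determines the influence of the update.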
- Cooling: Reduce the learning rate and the neighborhood radius gradually over time to fine-tune the map.
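The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `train_som`, the Gaussian neighborhood, the exponential decay constant `3.0`, the random initialization within the data range, and the default grid size are all assumptions made for this example, and the data is synthetic.

```python
import numpy as np

def train_som(data, rows=5, cols=5, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]

    # Initialization: random weights inside the per-feature range of the data.
    lo, hi = data.min(axis=0), data.max(axis=0)
    weights = lo + (hi - lo) * rng.random((rows, cols, n_features))

    # Grid coordinates of every neuron, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )

    for t in range(n_iter):
        # Cooling: decay learning rate and neighborhood radius (assumed schedule).
        decay = np.exp(-3.0 * t / n_iter)
        lr = lr0 * decay
        sigma = sigma0 * decay

        # Select input: one random training vector.
        x = data[rng.integers(len(data))]

        # BMU identification: neuron whose weight vector is closest to x.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)

        # Gaussian neighborhood centered on the BMU, measured on the grid.
        grid_dist2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))

        # Update: move each weight vector toward x, scaled by lr and h.
        weights += lr * h[..., None] * (x - weights)

    return weights

# Synthetic two-cluster data, for illustration only.
demo_rng = np.random.default_rng(1)
data = np.vstack([
    demo_rng.normal(0.0, 0.3, size=(100, 2)),
    demo_rng.normal(3.0, 0.3, size=(100, 2)),
])
weights = train_som(data)
print(weights.shape)  # (5, 5, 2)
```

Because each update is a weighted average of the old weight and the input, the weights stay inside the range of the training data as long as the learning rate does not exceed 1.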
Interpreting a SOM
Interpreting a SOM involves analyzing the pattern of the weights across the neurons on the grid. The following views are commonly used together:
Visualizing Clusters
One of the key benefits of SOMs is their ability to cluster similar data points. These clusters can be visualized using a U-Matrix (Unified Distance Matrix), which displays the Euclidean distance between the weight vectors of neighboring nodes. Areas of high distance indicate boundaries between clusters, while areas of low distance indicate groups of similar neurons.
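One simple way to compute a U-Matrix is sketched below. It assumes the trained weights are stored as a NumPy array of shape `(rows, cols, n_features)` on a rectangular grid and averages the distance to the 4-connected neighbors; the function name `u_matrix` is hypothetical, and other variants (hexagonal grids, 8-connected neighbors) are equally common.

```python
import numpy as np

def u_matrix(weights):
    """Mean Euclidean distance from each neuron to its 4-connected grid neighbors."""
    rows, cols, _ = weights.shape
    total = np.zeros((rows, cols))
    count = np.zeros((rows, cols))

    # Distances between vertically adjacent neurons, shape (rows - 1, cols).
    d_vert = np.linalg.norm(weights[1:] - weights[:-1], axis=-1)
    total[1:] += d_vert
    count[1:] += 1
    total[:-1] += d_vert
    count[:-1] += 1

    # Distances between horizontally adjacent neurons, shape (rows, cols - 1).
    d_horiz = np.linalg.norm(weights[:, 1:] - weights[:, :-1], axis=-1)
    total[:, 1:] += d_horiz
    count[:, 1:] += 1
    total[:, :-1] += d_horiz
    count[:, :-1] += 1

    # Guard against a 1x1 grid, which has no neighbors at all.
    return total / np.maximum(count, 1)

# Hand-made 2x4 map with one feature: left half near 0, right half near 10.
weights = np.zeros((2, 4, 1))
weights[:, 2:, 0] = 10.0
print(np.round(u_matrix(weights), 2))
```

In this toy map the two middle columns get high values because they sit on the boundary between the two groups, while the outer columns stay at zero.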
Map Labels
If labeled data is available, the labels can be overlaid on the map to show which neurons respond to which classes. This helps in interpreting the map and validating it against known categories.
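A label overlay can be sketched as a majority vote per neuron: each labeled sample is assigned to its BMU, and the neuron takes the most frequent label among its samples. The helper names `find_bmu` and `label_map`, the weight layout `(rows, cols, n_features)`, and the toy data are assumptions for this example.

```python
import numpy as np
from collections import Counter, defaultdict

def find_bmu(weights, x):
    """Grid coordinates (row, col) of the neuron closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    r, c = np.unravel_index(np.argmin(dists), dists.shape)
    return int(r), int(c)

def label_map(weights, data, labels):
    """Map each neuron that wins at least one sample to its majority label."""
    votes = defaultdict(Counter)
    for x, y in zip(data, labels):
        votes[find_bmu(weights, x)][y] += 1
    return {cell: counts.most_common(1)[0][0] for cell, counts in votes.items()}

# Toy 1x2 map and three labeled samples, for illustration only.
weights = np.array([[[0.0, 0.0], [5.0, 5.0]]])
data = np.array([[0.1, 0.0], [0.2, -0.1], [4.9, 5.1]])
labels = ["dry", "dry", "sweet"]
print(label_map(weights, data, labels))
```

Neurons that never win a sample do not appear in the result, which is itself informative: they often lie on the boundaries between clusters.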
Component Planes
Component planes visualize the value of a particular dimension across different neurons. By analyzing component planes, you can study how different features contribute to the clustering outcome.
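With weights stored as a `(rows, cols, n_features)` array (an assumption of this sketch), a component plane is just one slice along the last axis. The hypothetical helpers below extract named planes and compute the correlation between them, which is one way to see which features vary together across the map.

```python
import numpy as np

def component_planes(weights, feature_names):
    """Return one (rows, cols) plane per feature, keyed by feature name."""
    return {name: weights[:, :, k] for k, name in enumerate(feature_names)}

def plane_correlation(weights):
    """Correlation matrix between features, computed across all neurons."""
    flat = weights.reshape(-1, weights.shape[-1])  # (n_neurons, n_features)
    return np.corrcoef(flat, rowvar=False)

# Toy 2x2 map with two features; the second is exactly twice the first.
weights = np.zeros((2, 2, 2))
weights[:, :, 0] = [[0.0, 1.0], [2.0, 3.0]]
weights[:, :, 1] = 2.0 * weights[:, :, 0]

planes = component_planes(weights, ["acidity", "alcohol"])
print(planes["acidity"])
print(plane_correlation(weights))
```

Each plane can be passed to any heatmap routine (for example `matplotlib.pyplot.imshow`) and compared side by side with the U-Matrix.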
Dense Versus Sparse Regions
Regions where many data points map to the same or neighboring neurons reflect the dominant characteristics of the data. Sparsely populated regions may indicate outliers or rare data points.
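Density is usually inspected with a hit map, which counts how many samples select each neuron as their BMU. The sketch below assumes weights of shape `(rows, cols, n_features)`; the function name `hit_map` and the toy values are hypothetical.

```python
import numpy as np

def hit_map(weights, data):
    """Count, for each neuron, how many samples have it as their BMU."""
    rows, cols, n_features = weights.shape
    flat = weights.reshape(-1, n_features)  # (n_neurons, n_features)

    # Pairwise distances between samples and neurons, shape (n_samples, n_neurons).
    dists = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    bmu_index = np.argmin(dists, axis=1)

    # Tally hits per neuron and restore the grid layout.
    return np.bincount(bmu_index, minlength=rows * cols).reshape(rows, cols)

# Toy 1x3 map with one feature and four samples, for illustration only.
weights = np.array([[[0.0], [5.0], [10.0]]])
data = np.array([[0.1], [0.2], [-0.3], [9.8]])
print(hit_map(weights, data))  # [[3 0 1]]
```

Here the first neuron is dense, the last one is sparse, and the middle one receives no samples at all.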
Example
Imagine a dataset containing information about different types of wines characterized by features such as acidity, sweetness, and alcohol content. A SOM may reveal that wines naturally group into clusters based on similarities in these characteristics (e.g., sweet versus dry wines), providing insights that might not be apparent in the original high-dimensional space.
Summary Table
| Key Concept | Explanation |
| --- | --- |
| Initialization | Random or linear distribution of weights. |
| BMU | The closest neuron to the input data point. |
| Updating Rule | Adjusts weights of the BMU and neighbors to reduce distance from the input. |
| Visualization Method | U-Matrix for cluster visualization; labels to map known classes. |
| Feature Interpretation | Component planes for individual feature impact. |
| High-Dimensional Data | Projects onto a 2D space while preserving topological properties. |
Conclusion
Self Organizing Maps are a powerful tool for visualizing and interpreting complex datasets. By using these maps, one can derive insights into natural clusters and relationships in high-dimensional data, thus enhancing understanding and informing subsequent data-driven decisions.

