In this article, we delve into the concept of hierarchical clustering, an essential technique in unsupervised learning for data analysis. We will illustrate its application using popular Python libraries such as numpy, matplotlib, scipy, and scikit-learn.
Hierarchical clustering is a method that seeks to build a hierarchy of clusters by successively merging or splitting them based on their similarity or dissimilarity. This process results in two primary types of hierarchical clustering:
Agglomerative (Bottom-Up): Each data point starts as its own cluster, and the algorithm iteratively merges the closest pairs until a single large cluster remains.
Divisive (Top-Down): All data points initially belong to one cluster, which is progressively split into smaller clusters.
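To make the bottom-up idea concrete, here is a minimal pure-Python sketch of the core step an agglomerative algorithm repeats: find the closest pair of clusters and merge them. The 1-D points and their labels below are invented for illustration, and single linkage on singletons is assumed.

```python
# Toy illustration of one agglomerative (bottom-up) step: locate the
# closest pair of clusters, which would be merged first.
points = {"A": 1.0, "B": 1.5, "C": 8.0, "D": 9.0}

def closest_pair(clusters):
    """Return the pair of cluster labels with the smallest distance."""
    labels = sorted(clusters)
    best, best_dist = None, float("inf")
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            d = abs(clusters[a] - clusters[b])
            if d < best_dist:
                best, best_dist = (a, b), d
    return best, best_dist

pair, dist = closest_pair(points)
print(pair, dist)  # -> ('A', 'B') 0.5, so A and B are merged first
```

A full implementation would repeat this step, updating inter-cluster distances after each merge, until one cluster remains; that is exactly what scipy's linkage routines do for us below.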
Let's load some sample data using numpy for demonstration purposes. For real-world applications, you might use pandas for loading datasets from various formats:
import numpy as np
# Sample dataset creation; replace with your actual data
data = np.array([[3, 2], [5, 4], [10, 20], [16, 30]])
We'll use the scipy library to calculate distances between points; for the hierarchical clustering itself, we also need to choose a linkage criterion:
from scipy.spatial.distance import pdist, squareform
# Calculate pairwise distances using the Euclidean metric
distances = pdist(data, 'euclidean')
Now, let's construct the hierarchical clustering tree using scipy.cluster.hierarchy:
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt
# Build the linkage matrix; Ward minimizes the variance of the clusters being merged
Z = linkage(distances, 'ward')
# Plot the dendrogram
plt.figure(figsize=(10, 7))
dn = dendrogram(Z)
plt.show()
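Before moving on, it is worth knowing that scipy itself can cut the linkage tree into flat clusters with `fcluster`. A minimal sketch, rebuilt here so the snippet is self-contained:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([[3, 2], [5, 4], [10, 20], [16, 30]])
Z = linkage(pdist(data, 'euclidean'), 'ward')

# Cut the tree so that at most two clusters remain
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)  # the two nearby points share one label, the two distant ones the other
```

`criterion='distance'` with a height threshold is a common alternative when you prefer to cut the tree at a dissimilarity level rather than fix the cluster count.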
To form actual clusters from this tree, we'll use scikit-learn:
from sklearn.cluster import AgglomerativeClustering
# Define a specific number of clusters (e.g., two) and fit the model.
# Note: scikit-learn versions before 1.2 use affinity='euclidean' instead of metric.
model = AgglomerativeClustering(n_clusters=2, metric='euclidean', linkage='ward')
clusters = model.fit_predict(data)
print("Cluster assignments:", clusters)
Hierarchical clustering offers flexibility in choosing when to stop merging or splitting based on a specific criterion. This method is particularly useful for exploratory data analysis where the natural groupings within data need to be discovered without predefined labels.
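Scikit-learn exposes that stopping flexibility through the `distance_threshold` parameter: instead of fixing `n_clusters`, you can merge only while clusters are closer than a chosen linkage distance, letting the data determine the cluster count. The threshold value below is arbitrary, picked for this toy dataset:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

data = np.array([[3, 2], [5, 4], [10, 20], [16, 30]])

# n_clusters=None with distance_threshold stops merging at the given height
model = AgglomerativeClustering(n_clusters=None, distance_threshold=15.0,
                                linkage='ward')
labels = model.fit_predict(data)
print(model.n_clusters_, labels)
```

For this data, the two close pairs merge below the threshold while the final merge exceeds it, so two clusters remain.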
By combining numpy, matplotlib, scipy, and scikit-learn, we can efficiently implement hierarchical clustering to uncover patterns in complex datasets. Whether you're aiming to visualize relationships between species in biology, grouping customers for marketing strategies, or detecting anomalies in various applications, this technique provides a robust foundation for your data analysis toolkit.