Naming clusters in R

Ask Time:2021-09-22T05:56:41         Author:Jason

I am using the iris dataset in R. I clustered the data using K-means; the output is the variable km.out. However, I cannot find an easy way to map the cluster numbers (1-3) to the species (versicolor, setosa, virginica). I wrote a mapping by hand, but it only holds if I set the seed, and it is tedious. There has to be a better way to do it. Any thoughts?

Here is what I did manually:

for (i in 1:length(km.out$cluster)) {
  if (km.out$cluster[i] == 1) {
    km.out$cluster[i] = "versicolor"
  }
}
for (i in 1:length(km.out$cluster)) {
  if (km.out$cluster[i] == 2) {
    km.out$cluster[i] = "setosa"
  }
}
for (i in 1:length(km.out$cluster)) {
  if (km.out$cluster[i] == 3) {
    km.out$cluster[i] = "virginica"
  }
}

Author:Jason, reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/69275964/naming-clusters-in-r
Rui Barradas :

R is a vectorized language; the following one-liner is equivalent to the code in the question.

km.out$cluster <- c("versicolor", "setosa", "virginica")[km.out$cluster]

dcarlson :

It is not clear what you are trying to accomplish. The clusters created by kmeans will not match the Species exactly and there is no guarantee that clusters 1, 2, 3 will match the order of the species in iris. Also as you noted, the results will vary depending on the value of the seed. For example,

set.seed(42)
iris.km <- kmeans(scale(iris[, -5]), 3)
table(iris.km$cluster, iris$Species)
#
#     setosa versicolor virginica
#   1     50          0         0
#   2      0         39        14
#   3      0         11        36

Cluster 1 is exactly associated with setosa, but cluster 2 combines versicolor and virginica as does cluster 3.

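
If the goal is a mapping that does not depend on the seed, one possible approach (not part of the answer above, just a sketch) is to read the labels off that cross-tabulation: give each cluster the species that occurs most often in it. The names tab, majority and cluster_label are made up for illustration. Note that two clusters can end up with the same species, so this is a majority-vote labelling, not a guaranteed one-to-one assignment.

```r
# Same clustering as in the example above
set.seed(42)
iris.km <- kmeans(scale(iris[, -5]), 3)

# Rows are clusters, columns are species
tab <- table(iris.km$cluster, iris$Species)

# For each cluster (row), take the species with the highest count
# (hypothetical helper objects, not from the original answer)
majority <- colnames(tab)[apply(tab, 1, which.max)]
names(majority) <- rownames(tab)

# Look up the majority species for every observation's cluster
cluster_label <- unname(majority[as.character(iris.km$cluster)])

# Share of observations whose majority label agrees with Species
mean(cluster_label == iris$Species)
```

Because the mapping is recomputed from the table each time, it follows the cluster numbering wherever kmeans happens to put it.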
huttoncp :

You can recode the cluster number and add it back to the original data with:

library(dplyr)
mutate(iris,
       cluster = case_when(km.out$cluster == 1 ~ "versicolor",
                           km.out$cluster == 2 ~ "setosa",
                           km.out$cluster == 3 ~ "virginica"))

Alternatively you can use a vector translation approach to recoding a vector with elucidate::translate()

remotes::install_github("bcgov/elucidate") # if elucidate isn't installed yet
library(dplyr)
library(elucidate)

mutate(iris,
       cluster = translate(km.out$cluster,
                           old = c(1:3),
                           new = c("versicolor",
                                   "setosa",
                                   "virginica")))
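
For completeness, the same recoding can probably be done in base R with factor(), without extra packages. This is only a sketch: the kmeans call is an assumption (the question does not show how km.out was created), iris_labelled is a made-up name, and the number-to-species mapping is copied from the question, so it is only meaningful for the seed the asker used.

```r
# Assumed setup: the question does not show the actual kmeans call
set.seed(1)
km.out <- kmeans(iris[, 1:4], centers = 3)

# Mapping copied from the question; only valid for the asker's seed
species_names <- c("versicolor", "setosa", "virginica")

# Recode cluster numbers 1-3 as labels and attach them to a copy of iris
iris_labelled <- iris
iris_labelled$cluster <- factor(km.out$cluster,
                                levels = 1:3,
                                labels = species_names)

# Compare the recoded clusters with the true species
table(iris_labelled$cluster, iris_labelled$Species)
```

Working on a copy leaves km.out$cluster as integers, so the clustering result can still be reused afterwards.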