kmeans clustering on the basis of fixed number of variables out of all variables

Question


Asked by amanized on 13 June 2015 at 11:18 (1.5k views)

I am a beginner in R and data analysis. I have a data set of around 2500 rows and 7 columns. I want to cluster it with 15 centers, but using only the first two columns, while keeping the other columns intact in the clustered data set.

I also need to display the clustered data-set sorted on the basis of a third column.

Can someone help me with the required syntax? Let my CSV file be named locdata.csv, with the first two columns "lat" and "lon" and the third column "date".


There is 1 answer

**MattV** · Answer 1 · 2015-06-13T12:51:12+00:00

This should help you get there.

First create the dataset (alternatively, import the csv file):

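set.seed(1)  # make the simulated data reproducible
# Simulate 2500 rows x 7 columns to match the size described in the question
df <- data.frame(matrix(rnorm(n = 2500 * 7, mean = 10, sd = 20), ncol = 7))
names(df)[1:3] <- c("lat", "lon", "date")
# Use df <- read.csv("locdata.csv") instead to load from the file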

library(dplyr)
cluster.df <- select(df, lat, lon) # Select only the columns to cluster on
km <- kmeans(cluster.df, centers = 15) # 15 clusters, random starting centers
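
Because kmeans() picks its starting centers at random, two runs can label the same rows differently. The following is a minimal sketch, using simulated coordinates rather than the real data, of fixing the seed and using the nstart argument so that the best of several random starts is kept. The names coords, km_single and km_multi are made up for this illustration.

```r
# Simulated stand-in for the real data: 2500 rows with "lat" and "lon"
set.seed(42)
coords <- data.frame(lat = rnorm(2500, mean = 10, sd = 20),
                     lon = rnorm(2500, mean = 10, sd = 20))

# One random start versus the best of 25 random starts
km_single <- kmeans(coords, centers = 15, nstart = 1,  iter.max = 50)
km_multi  <- kmeans(coords, centers = 15, nstart = 25, iter.max = 50)

# Total within-cluster sum of squares; lower means tighter clusters
km_single$tot.withinss
km_multi$tot.withinss

# One cluster label per input row, in the original row order
length(km_multi$cluster) == nrow(coords)
```

With nstart greater than 1, kmeans() returns the run with the lowest total within-cluster sum of squares, which usually makes the result less dependent on a lucky or unlucky start.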

Next, attach the cluster assignments to the original data. km$cluster holds one label per row of the input, in the same row order, so it can be added directly as a new column:

# Extract the clusters and add them to original data frame
df$cluster <- km$cluster

# Sort on whatever column you prefer
df %>%
  arrange(date, cluster)
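
For the CSV case in the question, the same steps can be sketched end to end in base R. This is only an illustration under stated assumptions: a temporary file stands in for locdata.csv, the four extra columns (v1 to v4) are invented, and the date column is assumed to be in YYYY-MM-DD format. Note also that kmeans() uses plain Euclidean distance, which only approximates geographic distance between lat/lon points.

```r
# Stand-in for locdata.csv: write a small synthetic file to a temporary path.
# Columns v1..v4 and the date format are assumptions for this sketch.
set.seed(1)
n <- 2500
sample_data <- data.frame(
  lat  = runif(n, 12, 13),
  lon  = runif(n, 77, 78),
  date = format(as.Date("2015-01-01") + sample(0:150, n, replace = TRUE)),
  v1 = rnorm(n), v2 = rnorm(n), v3 = rnorm(n), v4 = rnorm(n)
)
csv_path <- tempfile(fileext = ".csv")  # replace with "locdata.csv"
write.csv(sample_data, csv_path, row.names = FALSE)

# Load the file and parse the date column so sorting is chronological
locdata <- read.csv(csv_path, stringsAsFactors = FALSE)
locdata$date <- as.Date(locdata$date)

# Cluster on lat and lon only; the other columns are never passed to kmeans()
km <- kmeans(locdata[, c("lat", "lon")], centers = 15,
             nstart = 10, iter.max = 50)

# Attach the labels, then sort by date (ties broken by cluster)
locdata$cluster <- km$cluster
sorted <- locdata[order(locdata$date, locdata$cluster), ]
head(sorted)
```

All seven original columns survive untouched in sorted, with the cluster label added as an eighth column.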
