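import numpy as np
from pyclustering.cluster.kmeans import kmeans
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.utils.metric import distance_metric, type_metric

# Collect each user's ratings into a single list per user.
userMovieRatingsDF = df.groupby("user", sort=False).apply(lambda x: list(x["rating"])).reset_index(name="rating")
numarr = userMovieRatingsDF["rating"].to_numpy()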
def custom_distance(point1, point2):
    # Manhattan (L1) distance between two points.
    return np.sum(np.abs(point1 - point2))
metric = distance_metric(type_metric.USER_DEFINED, func=custom_distance)
initial_centers = kmeans_plusplus_initializer(numarr, 2).initialize()
kmeans_instance = kmeans(numarr, initial_centers, metric=metric)
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
Each entry in the DataFrame's rating column is a list of np.int64 items.
Every list has the same number of items.
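For reference, here is a minimal sketch of how the length claim could be checked and what to_numpy() returns for a column of lists. The frame toy_df and its values are hypothetical stand-ins for the real data, not the actual dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real grouped DataFrame.
toy_df = pd.DataFrame({
    "user": [1, 2, 3],
    "rating": [[5, 3, 4], [2, 2, 1], [4, 5, 5]],
})

# Length of each list in the "rating" column.
lengths = toy_df["rating"].map(len)
all_same_length = lengths.nunique() == 1

# to_numpy() on a column of lists yields a 1-D object array of lists...
as_object = toy_df["rating"].to_numpy()

# ...whereas stacking the lists gives a regular 2-D numeric array
# (only possible when all lists really have the same length).
as_matrix = np.array(toy_df["rating"].tolist())

print(all_same_length, as_object.shape, as_matrix.shape)
```

If lengths.nunique() is greater than 1 on the real data, the lists are not all the same length after all.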
This is the error I get when the line initial_centers = kmeans_plusplus_initializer(numarr, 2).initialize() runs:
ValueError: operands could not be broadcast together with shapes (397,) (22362,)
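For context, the same message can be reproduced with plain NumPy by combining two 1-D arrays of those lengths. This sketch only illustrates what the two shapes in the message mean. It is an assumption-free NumPy example, not a trace of what pyclustering does internally:

```python
import numpy as np

# Two 1-D arrays whose lengths match the shapes in the error message.
a = np.zeros(397)
b = np.zeros(22362)

try:
    a - b  # element-wise subtraction needs compatible shapes
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

In other words, the error arises when an element-wise operation meets one array of 397 items and another of 22362 items.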