Installing the Kmeans PostgreSQL extension on Amazon RDS

1.7k views Asked by At

I take part in some Django poroject and we use geo data (with GeoDjango). I have installed PostGis as it described on AWS docs.

We have a lot of some points (markers) on the map. And we need to cluster them.

I found one library anycluster. This library need the PostgreSQL extension named kmeans-postgresql be installed on the Postgre database.

But my database is located on Amazon RDS. And I can't connect to it by SSH in order to install an extension...

Anybody knows how can I install kmeans-postgresql extension on my Amazon RDS database?

Or maybe you can advise me other ways of clustering?

2

There are 2 answers

1
Thiago Mata On BEST ANSWER

The K-Means It is a really complex calculation that is useful to data mining and cluster analysis ( you can see more about it in the wikipedia page https://en.wikipedia.org/wiki/K-means_clustering ). It have a big complexity when have to deal with many points. The K-means extension to postgresql http://pgxn.org/dist/kmeans/doc/kmeans.html it is written in C and compiled in the database machine. This brings a better performance compared to an procedure in plpgsql. Unfortunately as @estevao_lucas answered, this extension it is not enabled into Amazon RDS.

If you really need the k-means result, I translated this implementation of it, created by Joni Salonen in http://jonisalonen.com/2012/k-means-clustering-in-mysql/ and changed to plpgsql https://gist.github.com/thiagomata/a9737c3455d6248bef9f. This function uses temporary table. It is possible change it to use only arrays of Pins, if you wanna to.

But, if you only need to show some pins in a map, you will probably be happy with a really faster and simpler function that groups the results into an [x,y] matrix. I have created such function because the kmeans function was taking too much time to process my database (with a lot more than 400K elements). So this implementation is really faster, but does not have all the features you would expect from the K-means module. Besides that, this grid function https://gist.github.com/thiagomata/18ea14853998468c1a1d returns very good results, when the goal it is to show a big number of pins in a map. Example of Grid Result

0
Estevão Lucas On

You can just install supported extensions on Amazon RDS and Kmeans isn't it.

ERROR: Extension "kmeans" is not supported by Amazon RDS DETAIL: Installing the extension "kmeans" failed, because it is not on the list of extensions supported by Amazon RDS. HINT: Amazon RDS allows users with rds_superuser role to install supported extensions. See: SHOW rds.extensions; alexandria_development=> SHOW rds.extensions

RDS extensions:

btree_gin, btree_gist, chkpass, citext, cube, dblink, dict_int, dict_xsyn, earthdistance, fuzzystrmatch, hstore, intagg, intarray, isn, ltree, pgcrypto, pgrowlocks, pg_prewarm, pg_stat_statements, pg_trgm, plcoffee, plls, plperl, plpgsql, pltcl, plv8, postgis, postgis_tiger_geocoder, postgis_topology, postgres_fdw, sslinfo, tablefunc, test_parser, tsearch2, unaccent, uuid-ossp