Assign integer values to strings when order matters for clustering


I have a network with many computer names. I would like to assign each computer name an int value so that I can cluster names whose int values are close. Names in the same cluster should be names that share the same prefix and differ only in their suffix, and such names should get relatively close values. Neither the prefix length nor the suffix length is constant.

For example, suppose I have three computer names: 1. 'wber1637' 2. 'wbcx9999' 3. 'abcx9999'. The first and second names share a prefix (here 'wb', of length 2), so I would like them to be assigned int values that are close to each other. In contrast, the third name has a different prefix from the other two (although it shares the suffix 'cx9999' with the second name), so it should be assigned an int value that is far from theirs.

There is 1 answer

Answer by phflack:

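Treat the characters of each name as the digits of a number: concatenate their ASCII codes, first character first, into one integer.

This way, names with the same beginning get values of similar magnitude, because the leading characters occupy the most significant bytes. If only the endings differ, the values change by a much smaller amount.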

For example:

The characters of wber1637 in ASCII are 0x77, 0x62, 0x65, 0x72, 0x31, 0x36, 0x33, 0x37

Concatenate them into one number to get 0x7762657231363337, which is 8602549779357381431 in decimal

The characters of wbcx9999 in ASCII are 0x77, 0x62, 0x63, 0x78, 0x39, 0x39, 0x39, 0x39

Concatenate them into one number to get 0x7762637839393939, which is 8602547606238345529 in decimal

These two values are fairly near each other (notice how both start with 860254), in comparison to abcx9999

The characters of abcx9999 in ASCII are 0x61, 0x62, 0x63, 0x78, 0x39, 0x39, 0x39, 0x39

Concatenate them into one number to get 0x6162637839393939, which is 7017280537403930937 in decimal


The difference between wbcx9999 and wber1637 is 2173119035902

The difference between wbcx9999 and abcx9999 is 1585267068834414592
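
The sizes of these two differences follow from the position of the first differing character: if two 8-character ASCII names share their first k characters, only the low 8 - k bytes of their values can differ, so the gap stays below 256^(8 - k). Here is a minimal sketch of that bound. The names PrefixGap, sharedPrefixLength and gapBound are made up for illustration, and the bound assumes names of exactly 8 ASCII characters.

```java
class PrefixGap {
    // Number of leading characters that a and b have in common.
    static int sharedPrefixLength(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Strict upper bound on the gap between the values of two 8-character
    // ASCII names sharing their first prefixLen characters (assumption:
    // exactly 8 characters, all below 0x80).
    static long gapBound(int prefixLen) {
        if (prefixLen <= 0) {
            return Long.MAX_VALUE; // 256^8 does not fit in a long
        }
        if (prefixLen >= 8) {
            return 1; // identical names: the gap is 0
        }
        return 1L << (8 * (8 - prefixLen)); // 256^(8 - prefixLen)
    }

    public static void main(String[] args) {
        int k1 = sharedPrefixLength("wber1637", "wbcx9999"); // 2 ("wb")
        int k2 = sharedPrefixLength("wbcx9999", "abcx9999"); // 0
        System.out.println(gapBound(k1)); // prints 281474976710656
        System.out.println(gapBound(k2)); // prints 9223372036854775807
    }
}
```

The gap of 2173119035902 between wber1637 and wbcx9999 is well under the first bound, while the pair with no shared prefix is only limited by the range of a long.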


In Java this is simple to compute:

String name = "wber1637";
long output = 0; // note that an 8-character ASCII string fits exactly into a 64-bit long
for (char c : name.toCharArray())
    output = (output << 8) + c; // shift the earlier characters up one byte, then append this one
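
The loop above only lines prefixes up correctly when every name is exactly 8 ASCII characters long. Here is a minimal sketch of one way to handle other lengths. The class NameKey and method encode are hypothetical names. Padding short names with zero bytes on the right and dropping everything after the eighth character are assumptions made here, not part of the approach described above.

```java
import java.nio.charset.StandardCharsets;

class NameKey {
    static final int WIDTH = 8; // number of bytes in a long

    // Packs the first WIDTH bytes of the name into a long, first character
    // in the most significant byte. Shorter names are padded on the right
    // with zero bytes so that equal prefixes land in the same byte positions.
    // Non-ASCII characters are replaced with '?' by the US_ASCII encoder.
    static long encode(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.US_ASCII);
        long key = 0;
        for (int i = 0; i < WIDTH; i++) {
            int b = i < bytes.length ? (bytes[i] & 0xFF) : 0;
            key = (key << 8) | b;
        }
        return key;
    }

    public static void main(String[] args) {
        long a = encode("wber1637");
        long b = encode("wbcx9999");
        long c = encode("abcx9999");
        System.out.println(a - b); // prints 2173119035902
        System.out.println(b - c); // prints 1585267068834414592
    }
}
```

With this truncation, names that agree on their first 8 characters get the same value, so longer names would need a wider type such as BigInteger or a separate tie-break.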