How to identify Bimodal Histograms automatically?

4.8k views Asked by At

I have a problem where I receive a lot of histograms from some database images. Those histograms are represented as vectors (0...255), and I have to identify and work with the bimodal histograms.

Is there a formula to automatically identify which histograms are bimodal and which aren't? Since they are numeric vectors, I could use a programming language (Java/C#) to work with it.

Is there a criterion on literature to identify bimodal histograms by software?


Here are 3 examples of histograms and format inputs I'm working with. Each histogram is a vector with 256 (0...255) positions.

Histogram 1
8029, 41, 82, 177, 135, 255, 315, 591, 949, 456, 499, 688, 446, 733, 712, 1595, 2633, 3945, 6134, 9755, 9236, 11911, 11888, 9450, 13119, 8819, 5991, 4399, 6745, 2017, 3747, 1777, 2946, 1623, 2151, 454, 3015, 3176, 2211, 1080, 391, 580, 750, 473, 10424, 334, 559, 621, 340, 2794, 1094, 5274, 2822, 204, 389, 728, 268, 15, 1060, 58, 113, 2728, 52, 3166, 11, 103, 522, 107, 351, 97, 66, 565, 315, 444, 3305, 245, 647, 306, 147, 112, 103, 672, 69, 317, 61, 224, 71, 52, 479, 62, 106, 166, 215, 132, 137, 321, 998, 427, 846, 787, 542, 1054, 1429, 615, 697, 580, 642, 768, 1244, 462, 4107, 1701, 2394, 4954, 4869, 1841, 1807, 1032, 3075, 331, 488, 627, 1281, 233, 1010, 1178, 727, 830, 1619, 728, 1428, 1849, 4826, 351, 745, 320, 888, 335, 741, 1151, 734, 689, 2143, 1130, 2482, 3609, 4779, 5678, 4186, 2654, 1668, 1290, 702, 1093, 476, 438, 445, 271, 98, 368, 226, 90, 75, 26, 33, 62, 16, 824, 21, 37, 34, 24, 54, 42, 101, 112, 18, 24, 17, 15, 3, 50, 7, 6, 54, 3, 58, 9, 10, 66, 12, 11, 10, 6, 25, 11, 7, 172, 13, 18, 21, 9, 8, 9, 42, 16, 15, 6, 12, 17, 7, 591, 6, 7, 14, 24, 7, 7, 19, 87, 18, 8, 9, 9, 35, 55, 4, 17, 10, 18, 22, 46, 8, 852, 15, 14, 12, 11, 9, 3, 50, 163, 12, 4, 18, 129, 6, 35, 47, 14, 18, 150, 21, 46, 24, 0

Histogram 2
8082, 4857, 1494, 2530, 1604, 1636, 1651, 1681, 1630, 1667, 1636, 1649, 1934, 1775, 1701, 1691, 1478, 1649, 1449, 1449, 1503, 1475, 1497, 1398, 1509, 1747, 1301, 1539, 1575, 1496, 1754, 1432, 1759, 1786, 1679, 1816, 2435, 1174, 1780, 1344, 1749, 2026, 1779, 1742, 1722, 1835, 2306, 1662, 1965, 1885, 2212, 2139, 1930, 2306, 2707, 2289, 2307, 2082, 2360, 2216, 2480, 2243, 2222, 1824, 4555, 1918, 2116, 2275, 2615, 2240, 2703, 2481, 2626, 2708, 3008, 2696, 2561, 2906, 3625, 2419, 3137, 2793, 2747, 2861, 2774, 4124, 3155, 3243, 3523, 3432, 3277, 3456, 2984, 2902, 2819, 2778, 3158, 2997, 2591, 2717, 2553, 2464, 3657, 2296, 2352, 2046, 2124, 1965, 2014, 2096, 1664, 1373, 1607, 1322, 1272, 1113, 1156, 1055, 924, 881, 1019, 669, 929, 636, 590, 463, 524, 177, 1267, 378, 409, 413, 415, 435, 385, 379, 267, 413, 266, 282, 499, 194, 360, 199, 337, 92, 986, 183, 160, 230, 124, 213, 188, 334, 164, 159, 130, 143, 135, 331, 25, 118, 114, 98, 74, 301, 92, 119, 94, 72, 192, 38, 64, 100, 138, 30, 98, 65, 226, 23, 46, 78, 78, 61, 55, 234, 26, 36, 95, 31, 49, 214, 25, 34, 58, 37, 101, 20, 41, 34, 150, 16, 50, 25, 53, 18, 30, 67, 27, 36, 42, 23, 60, 12, 21, 36, 12, 45, 21, 58, 53, 18, 51, 16, 25, 9, 24, 15, 18, 30, 33, 20, 19, 12, 23, 16, 14, 21, 14, 10, 20, 13, 12, 9, 6, 9, 7, 10, 7, 2, 0, 0, 0, 0, 0, 2087

Histogram 3
50, 226, 857, 2018, 1810, 1795, 1840, 1929, 1942, 1693, 1699, 1547, 1564, 1556, 1451, 1439, 1448, 1357, 1428, 1419, 1383, 1705, 1670, 1777, 1826, 1865, 1897, 1924, 2003, 1973, 1813, 1801, 1827, 1696, 1717, 1654, 1678, 1705, 1621, 1523, 1494, 1559, 1434, 1370, 1358, 1385, 1348, 1380, 1368, 1367, 1389, 1445, 1514, 1471, 1465, 1461, 1475, 1484, 1390, 1403, 1324, 1339, 1426, 1432, 1487, 1460, 1469, 1460, 1546, 1504, 1425, 1373, 1391, 1391, 1382, 1311, 1368, 1354, 1325, 1323, 1263, 1325, 1363, 1357, 1325, 1322, 1429, 1419, 1412, 1371, 1266, 1179, 1166, 1076, 1100, 1083, 1103, 1053, 1116, 1080, 1071, 1025, 1088, 1060, 1011, 984, 958, 959, 954, 937, 982, 950, 1001, 963, 965, 875, 1010, 954, 990, 894, 959, 972, 963, 1101, 971, 1042, 1064, 1075, 1029, 1088, 1090, 1068, 1073, 1058, 1102, 1105, 1009, 1062, 1005, 1048, 973, 998, 1034, 1013, 961, 1006, 983, 948, 1031, 972, 952, 1013, 954, 964, 970, 881, 887, 967, 941, 928, 994, 1019, 1106, 1056, 1113, 1071, 1158, 1108, 1178, 1071, 1080, 1074, 1050, 1076, 1106, 1048, 973, 1042, 997, 1034, 934, 863, 935, 845, 839, 803, 764, 782, 787, 771, 766, 751, 745, 804, 789, 765, 681, 658, 690, 672, 650, 635, 695, 619, 572, 499, 535, 565, 564, 520, 516, 568, 530, 479, 507, 424, 446, 455, 380, 395, 371, 360, 391, 373, 351, 388, 426, 349, 417, 421, 400, 443, 470, 485, 456, 495, 452, 484, 457, 518, 519, 631, 652, 693, 762, 771, 807, 906, 991, 1138, 1433, 1545, 2467, 4907, 6743, 1921
2

There are 2 answers

8
Spektre On BEST ANSWER
  1. smooth histogram

    this filter out small local min max and noise. Use symmetric smoothing to avoid shifting to one side. I smooth from left then from the right which lower the shifting a lot.

  2. find/count the local max peaks

    Count only big enough peaks (by some treshold). If peak count is not 2 then it is not a bimodal histogram unless you have different definition of bimodal like:

    • noise count too
    • no matter how many peaks but single gap must be present

    It depends on what for the histograms are used

Here is some code in C++ I busted for this:

void histograms(Graphics::TBitmap *bmp,int xs,int ys,int **pyx)
    {
    // clear buffer
    bmp->Canvas->Brush->Color=clBlack;
    bmp->Canvas->FillRect(TRect(0,0,xs,ys));
    bmp->Canvas->Font->Color=clAqua;
    bmp->Canvas->TextOutA(  5,5,"Raw histogram");
    bmp->Canvas->TextOutA(285,5,"Smoothed histogram");

    int his1[256]={ 8029, 41, 82, 177, 135, 255, 315, 591, 949, 456, 499, 688, 446, 733, 712, 1595, 2633, 3945, 6134, 9755, 9236, 11911, 11888, 9450, 13119, 8819, 5991, 4399, 6745, 2017, 3747, 1777, 2946, 1623, 2151, 454, 3015, 3176, 2211, 1080, 391, 580, 750, 473, 10424, 334, 559, 621, 340, 2794, 1094, 5274, 2822, 204, 389, 728, 268, 15, 1060, 58, 113, 2728, 52, 3166, 11, 103, 522, 107, 351, 97, 66, 565, 315, 444, 3305, 245, 647, 306, 147, 112, 103, 672, 69, 317, 61, 224, 71, 52, 479, 62, 106, 166, 215, 132, 137, 321, 998, 427, 846, 787, 542, 1054, 1429, 615, 697, 580, 642, 768, 1244, 462, 4107, 1701, 2394, 4954, 4869, 1841, 1807, 1032, 3075, 331, 488, 627, 1281, 233, 1010, 1178, 727, 830, 1619, 728, 1428, 1849, 4826, 351, 745, 320, 888, 335, 741, 1151, 734, 689, 2143, 1130, 2482, 3609, 4779, 5678, 4186, 2654, 1668, 1290, 702, 1093, 476, 438, 445, 271, 98, 368, 226, 90, 75, 26, 33, 62, 16, 824, 21, 37, 34, 24, 54, 42, 101, 112, 18, 24, 17, 15, 3, 50, 7, 6, 54, 3, 58, 9, 10, 66, 12, 11, 10, 6, 25, 11, 7, 172, 13, 18, 21, 9, 8, 9, 42, 16, 15, 6, 12, 17, 7, 591, 6, 7, 14, 24, 7, 7, 19, 87, 18, 8, 9, 9, 35, 55, 4, 17, 10, 18, 22, 46, 8, 852, 15, 14, 12, 11, 9, 3, 50, 163, 12, 4, 18, 129, 6, 35, 47, 14, 18, 150, 21, 46, 24, 0 };
    int his2[256]={ 8082, 4857, 1494, 2530, 1604, 1636, 1651, 1681, 1630, 1667, 1636, 1649, 1934, 1775, 1701, 1691, 1478, 1649, 1449, 1449, 1503, 1475, 1497, 1398, 1509, 1747, 1301, 1539, 1575, 1496, 1754, 1432, 1759, 1786, 1679, 1816, 2435, 1174, 1780, 1344, 1749, 2026, 1779, 1742, 1722, 1835, 2306, 1662, 1965, 1885, 2212, 2139, 1930, 2306, 2707, 2289, 2307, 2082, 2360, 2216, 2480, 2243, 2222, 1824, 4555, 1918, 2116, 2275, 2615, 2240, 2703, 2481, 2626, 2708, 3008, 2696, 2561, 2906, 3625, 2419, 3137, 2793, 2747, 2861, 2774, 4124, 3155, 3243, 3523, 3432, 3277, 3456, 2984, 2902, 2819, 2778, 3158, 2997, 2591, 2717, 2553, 2464, 3657, 2296, 2352, 2046, 2124, 1965, 2014, 2096, 1664, 1373, 1607, 1322, 1272, 1113, 1156, 1055, 924, 881, 1019, 669, 929, 636, 590, 463, 524, 177, 1267, 378, 409, 413, 415, 435, 385, 379, 267, 413, 266, 282, 499, 194, 360, 199, 337, 92, 986, 183, 160, 230, 124, 213, 188, 334, 164, 159, 130, 143, 135, 331, 25, 118, 114, 98, 74, 301, 92, 119, 94, 72, 192, 38, 64, 100, 138, 30, 98, 65, 226, 23, 46, 78, 78, 61, 55, 234, 26, 36, 95, 31, 49, 214, 25, 34, 58, 37, 101, 20, 41, 34, 150, 16, 50, 25, 53, 18, 30, 67, 27, 36, 42, 23, 60, 12, 21, 36, 12, 45, 21, 58, 53, 18, 51, 16, 25, 9, 24, 15, 18, 30, 33, 20, 19, 12, 23, 16, 14, 21, 14, 10, 20, 13, 12, 9, 6, 9, 7, 10, 7, 2, 0, 0, 0, 0, 0, 2087 };
    int his3[256]={ 50, 226, 857, 2018, 1810, 1795, 1840, 1929, 1942, 1693, 1699, 1547, 1564, 1556, 1451, 1439, 1448, 1357, 1428, 1419, 1383, 1705, 1670, 1777, 1826, 1865, 1897, 1924, 2003, 1973, 1813, 1801, 1827, 1696, 1717, 1654, 1678, 1705, 1621, 1523, 1494, 1559, 1434, 1370, 1358, 1385, 1348, 1380, 1368, 1367, 1389, 1445, 1514, 1471, 1465, 1461, 1475, 1484, 1390, 1403, 1324, 1339, 1426, 1432, 1487, 1460, 1469, 1460, 1546, 1504, 1425, 1373, 1391, 1391, 1382, 1311, 1368, 1354, 1325, 1323, 1263, 1325, 1363, 1357, 1325, 1322, 1429, 1419, 1412, 1371, 1266, 1179, 1166, 1076, 1100, 1083, 1103, 1053, 1116, 1080, 1071, 1025, 1088, 1060, 1011, 984, 958, 959, 954, 937, 982, 950, 1001, 963, 965, 875, 1010, 954, 990, 894, 959, 972, 963, 1101, 971, 1042, 1064, 1075, 1029, 1088, 1090, 1068, 1073, 1058, 1102, 1105, 1009, 1062, 1005, 1048, 973, 998, 1034, 1013, 961, 1006, 983, 948, 1031, 972, 952, 1013, 954, 964, 970, 881, 887, 967, 941, 928, 994, 1019, 1106, 1056, 1113, 1071, 1158, 1108, 1178, 1071, 1080, 1074, 1050, 1076, 1106, 1048, 973, 1042, 997, 1034, 934, 863, 935, 845, 839, 803, 764, 782, 787, 771, 766, 751, 745, 804, 789, 765, 681, 658, 690, 672, 650, 635, 695, 619, 572, 499, 535, 565, 564, 520, 516, 568, 530, 479, 507, 424, 446, 455, 380, 395, 371, 360, 391, 373, 351, 388, 426, 349, 417, 421, 400, 443, 470, 485, 456, 495, 452, 484, 457, 518, 519, 631, 652, 693, 762, 771, 807, 906, 991, 1138, 1433, 1545, 2467, 4907, 6743, 1921 };
    int *his,tmp[256],a,x0,y0,x,y,h,tr=12,sm=10,peak[256],peaks;
    // loop through histograms
    for (y0=20,h=0;;h++)
        {
        x0=5;if (h==0) his=his1;
        else if (h==1) his=his2;
        else if (h==2) his=his2;
        else break;
        // rescale his <0,?> to tmp <0-100>
        for (y=his[0],x=0;x<256;x++) if (y<his[x]) y=his[x];    // y=max
        for (         x=0;x<256;x++) tmp[x]=(his[x]*100)/y;
        // draw tmp
        for (x=0;x<256;x++) for (pyx[y0+100][x0+x]=0x00004040,y=0;y<tmp[x];y++) pyx[y0+100-y][x0+x]=(40+x)*0x00010101;
        x0+=280;
        // smooth tmp few times
        for (y=0;y<sm;y++)
            {
            // from both directions to avoid shifting to one side
            for (x=0;x<255;x++) tmp[x]=((90*tmp[x])+(10*tmp[x+1]))/100;
            for (x=255;x>0;x--) tmp[x]=((90*tmp[x])+(10*tmp[x-1]))/100;
            }
        // find (count) peaks
        for (peaks=0,a=0,y=0,x=0;x<255;x++)
            {
                 if (tmp[x]<tmp[x+1]){ if ((y< 0)&&(a-tmp[x]>tr)){ a=tmp[x]; }                         y=+1; }
            else if (tmp[x]>tmp[x+1]){ if ((y>=0)&&(tmp[x]-a>tr)){ a=tmp[x]; peak[peaks]=x; peaks++; } y=-1; }
            }
        // draw tmp
        for (x=0;x<256;x++) for (pyx[y0+100][x0+x]=0x00004040,y=0;y<tmp[x];y++) pyx[y0+100-y][x0+x]=(40+x)*0x00010101;
        // draw peaks
        bmp->Canvas->Pen->Color=clAqua;
        bmp->Canvas->Brush->Color=clAqua;
        for (a=0;a<peaks;a++)
            {
            x=x0+peak[a];
            y=y0+100-tmp[peak[a]];
            bmp->Canvas->Ellipse(x-5,y-5,x+5,y+5);
            }
        // draw cross for not bimodal histograms
        if (peaks!=2)
            {
            bmp->Canvas->Pen->Color=clRed;
            bmp->Canvas->MoveTo(x0    ,y0    );
            bmp->Canvas->LineTo(x0+256,y0+100);
            bmp->Canvas->MoveTo(x0+256,y0    );
            bmp->Canvas->LineTo(x0    ,y0+100);
            }
        y0+=128;
        }
    }
  • you can ignore the pyx[y][x] and bmp-> stuff it is just rendering
  • pyx[y][x] is direct 32bit pixel access of bitmap bmp
  • and bmp->Canvas is VCL encapsulated Windows GDI interface of bitmap bmp
  • xs,ys is bitmap resolution
  • set treshold tr and smooth sm to suite your needs best

If you have too much different types of histogram then you need to apply dynamic tresholding or different approach for peak finding this is how it looks like for your histograms:

example

Where Histogram 1 is the top one. Hope the code is clear enough if not comment me... if you rescale to power of 2 instead of 100 then you can change the multiplications and divisions to bit shifts to speed this a bit. I choose 100 for more clear selection of tresholds and smoothing coefficients...

0
Andres On

I don't think there is a simple and straightforward solution for that.

If you think each peak in your histogram as a cluster, you can try to implement some sort of clustering algorithm that is able to automatically detect the number of peaks in your histogram.

You can start looking at here:

Determining the number of clusters in a data set

Kmeans without knowing the number of clusters?

AUTOMATIC DETERMINATION OF THE NUMBER OF CLUSTERS USING SPECTRAL ALGORITHMS