gnuplot rowstacked histogram: how to calculate and put standard deviations on top of bars?


I'm asking this as a new question because the update in "How to plot histograms from rows in data file gnuplot" didn't receive much attention. Related questions: "gnuplot histogram: How to put values on top of bars" and "gnuplot rowstacked histogram: how to put sum above bars".

The t.dat file looks like this:

          260.37
          260.04
          261.53
          261.32
          260.19
          260.49
          260.43
          260.59
          260.26
          260.68
          260.28
          259.93
          260.82
          259.50
          260.29
          260.52
          260.30
          259.91
          262.24
          260.58
          260.74
          260.22
          261.66
          260.31
          260.99
          259.79
          260.90
          259.88
          260.19
          261.50
          259.32
          260.79
          259.94
          260.35
          260.03
          260.07
          261.86
          261.09
          260.60
          260.15
           75.17
           75.16
           75.33
           75.31
           75.34
           75.04
           75.49
           75.25
           75.27
           75.32
           75.10
           75.75
           75.58
           74.86
           75.19
           75.44
           75.29
           75.31
           75.55
           75.91
           75.39
           75.65
           75.85
           75.67
           75.62
           74.87
           75.64
           75.69
           75.13
           77.76
           75.31
           74.87
           75.75
           75.27
           75.61
           74.84
           75.72
           75.40
           74.96
           75.33
           67.20
           67.26
           68.15
           68.67
           68.88
           67.56
           67.71
           66.87
           68.74
           67.32
           66.92
           69.62
           67.29
           66.87
           68.33
           67.73
           68.66
           68.75
           67.00
           67.22
           66.93
           68.81
           67.29
           67.18
           67.33
           67.91
           70.34
           67.15
           68.37
           69.60
           69.74
           69.62
           67.33
           66.79
           67.90
           67.39
           69.88
           68.48
           68.96
           67.36
           47.82
           47.54
           47.74
           47.95
           47.65
           47.71
           47.64
           47.71
           47.47
           48.19
           47.82
           48.06
           47.88
           48.22
           48.31
           47.58
           47.41
           47.85
           47.71
           47.93
           48.34
           47.95
           48.70
           47.58
           47.86
           47.96
           47.80
           48.00
           47.51
           47.56
           47.50
           47.52
           47.47
           47.76
           47.53
           48.27
           47.26
           47.79
           47.67
           47.57

The goal is to calculate the standard deviations of four groups of numbers in file t.dat and show them as labels above the bars that compose the histogram:

  • 1st group: lines 1-40
  • 2nd group: lines 41-80
  • 3rd group: lines 81-120
  • 4th group: lines 121-160

So far, I've only managed to do this manually. I split the desired line ranges into individual files (tempos1.dat, tempos4.dat, tempos9.dat and tempos16.dat), calculated their standard deviations using stats, and showed them as labels.

Here's the code:

set term pngcairo
set out 'st-dev.png'
unset key
stats "tempos1.dat" using 1 name "A"
stats "tempos4.dat" using 1 name "B"
stats "tempos9.dat" using 1 name "C"
stats "tempos16.dat" using 1 name "D"
set label "0.6234" at 28,275
set label "0.4666" at 90,90
set label "0.9836" at 149,85
set label "0.2947" at 210,65
set boxwidth 0.9 relative
set style data histogram
set style histogram cluster gap 1
set style fill solid 1.0 border -1
set xrange [0:250]
set xtics ("1" 40, "4" 100, "8" 160, "16" 220)
plot for [i=1:4] 't.dat' using ($0+20+(i-1)*61):1 every ::((i-1)*40)::(i*40-1) with boxes lt i
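
The label strings in the script above are typed in by hand. As a minimal sketch (not part of the original attempt), a label could instead be built from the variable that stats defines. This assumes the split file tempos1.dat exists and that stats is called with name "A", so the result is available as A_stddev; the label position is copied from the script above.

```gnuplot
# Assumes tempos1.dat (the manually split file) is in the working directory
stats "tempos1.dat" using 1 name "A" nooutput

# A_stddev holds the standard deviation of column 1; format it to 4 decimals
set label 1 sprintf("%.4f", A_stddev) at 28,275
```

The same pattern would apply to B_stddev, C_stddev and D_stddev for the other three files.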

I read the related posts. It appears that, to put the standard deviation above the bars, all I have to do is add a column to the using clause that contains the variable storing the standard deviation, but for now it is being calculated from manually split files. Here's the desired output: [image not included]

How do I do that?

Answer by Christoph (best answer):

You can iterate over the four blocks with do for and call stats with the same using and every settings that you already have in your plot command. Then, in each iteration, set the appropriate label using the values that stats provides:

  • STATS_mean_x as x-coordinate of the label
  • STATS_max_y as y-coordinate (plus an offset of 1 character height, done with set label ... offset 0,1)
  • STATS_stddev_y as the calculated standard deviation itself:
set term pngcairo
set out 'st-dev.png'
unset key
set boxwidth 0.9 relative
set style fill solid 1.0 border -1
set xlabel "Número de processos"
set ylabel "Tempo de execução (s)"
set xrange [0:250]
set xtics ("1" 40, "4" 100, "8" 160, "16" 220)

do for [i=1:4] {
    stats 't.dat' using ($0+20+(i-1)*61):1 every ::((i-1)*40)::(i*40-1) nooutput
    set label i sprintf('%.4f', STATS_stddev_y) center at STATS_mean_x,STATS_max_y offset 0,1
}

plot for [i=1:4] 't.dat' using ($0+20+(i-1)*61):1 every ::((i-1)*40)::(i*40-1) with boxes lt i

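
To try the loop without the full t.dat, the same stats call can be run on a small inline data block. This is only a sketch: the six values and the group size n are made up for illustration, and inline data blocks ($data << EOD) assume gnuplot 5.0 or later.

```gnuplot
# Hypothetical test data: two groups of three values each
$data << EOD
10.0
11.0
12.0
20.0
20.5
21.0
EOD

n = 3   # values per group (illustrative; the question uses 40)

do for [i=1:2] {
    # Same idea as in the answer: 'every' restricts stats to the i-th group
    stats $data using ($0+(i-1)*n):1 every ::((i-1)*n)::(i*n-1) nooutput
    # Print the three quantities that the answer passes to 'set label'
    print sprintf("group %d: mean x = %.2f, max y = %.2f, stddev y = %.4f", \
                  i, STATS_mean_x, STATS_max_y, STATS_stddev_y)
}
```

Each iteration prints the values that would determine the label position and text, which makes it easy to check them before plotting.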