Gnuplot complains with "')' expected" when an array is not defined at a specific line


I am using a gnuplot script that yields an error when its last line is executed. I don't see any issue with this line. The error is: ')' expected. If I move the line array M_x_N[numRecord] to just before the line
stats N using (M_x_N[int($0+1)] = N[int($0+1)]*M[int($0+1)]) name "M_x_N" nooutput
there is no error. What is wrong?

I am using the QT terminal, gnuplot 5.4 patch level 8. OS is Windows 10.

The script is:

reset session
set encoding utf8
set datafile separator comma
cd 'C:\Users\smallz4'
corpusFile = "lz4_silicia_corpus.txt_4096.csv"
stats corpusFile nooutput
numRecord = STATS_records
chunkSize = numRecord-15.0
bias = 2048.0

array M[numRecord]
array N[numRecord]
array M_x_N[numRecord]

stats corpusFile using (M[int($0+1)] = $1) name "M" nooutput
stats corpusFile using (N[int($0+1)] = $2) name "N" nooutput

stats N using (M_x_N[int($0+1)] = N[int($0+1)]*M[int($0+1)]) name "M_x_N" nooutput

There are 2 answers

theozh On BEST ANSWER

OK, I struggled for a while to find out what's going on. For a reason I don't yet understand, it seems the name of your final array causes the error.

help variables says:

Valid names are the same as in most programming languages: they must begin with a letter, but subsequent characters may be letters, digits, or "_".

So, M_x_N should be a valid variable or array name. At least, I haven't yet found a statement that says _ is not allowed for array names.

With the following minimal, complete example, I can reproduce your error. And now, I think I understand what you mean. Apparently, the order of the array definitions and the stats commands is important.

Script: (this script will fail with your error)

### strange behaviour with setting arrays and using stats
reset session

$Data <<EOD
 1   10
 2   20
 3   30
 4   40
EOD

stats $Data u (c=$0+1) nooutput   # get the number of rows into c

array M[c]
array M_x_N[c]

stats $Data u (M[int($0+1)] = $1) name "M" nooutput
stats $Data u (M_x_N[int($0+1)] = $2) name "M_x_N" nooutput
### end of script

However, if you change the sequence to the following, it will work.

array M[c]
stats $Data u (M[int($0+1)] = $1) name "M" nooutput

array M_x_N[c]
stats $Data u (M_x_N[int($0+1)] = $2) name "M_x_N" nooutput

Another version with the original sequence (all array definitions first, then all stats commands) works as expected if you rename the array M_x_N to MxN.

array M[c]
array MxN[c]

stats $Data u (M[int($0+1)] = $1) name "M" nooutput
stats $Data u (MxN[int($0+1)] = $2) name "MxN" nooutput

So, my conclusion would be that the underscores in M_x_N somehow mess up the stats command. I would consider this a bug or maybe someone else can explain.

Ethan On

This is a weird one. I understand what is happening, but I don't have an easy fix or workaround other than to use a different naming scheme for your variables.

The problem is that when gnuplot sees the command

`stats $Data u (<anything>) name "M"`

it knows that the stats command will create a bunch of variables named M_*, in this case M_records = 4, M_invalid = 0, M_headers = 0, M_blank = 0, M_blocks = 1, M_outofrange = 0, M_columns = 2, M_mean = 2.5, and so on. In preparation for this it deletes all existing variables that match the pattern M_*. Unfortunately that includes the previously declared array M_x_N.
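
The deletion described above can be observed directly. The following is a minimal sketch, assuming that the built-in exists() reports arrays the same way it reports ordinary variables; I have not verified it on 5.4 patch level 8. If the explanation holds, the first print should report 1 and the second 0.

```gnuplot
### check whether stats ... name "M" removes a previously declared array M_x_N
reset session

$Data <<EOD
 1   10
 2   20
 3   30
 4   40
EOD

array M_x_N[4]
print 'before stats: exists("M_x_N") = ', exists("M_x_N")

# stats with name "M" is expected to clear every variable matching M_* on entry
stats $Data using 1 name "M" nooutput

print 'after stats:  exists("M_x_N") = ', exists("M_x_N")
print 'M_records = ', M_records
### end of script
```

Once M_x_N is gone, a later expression such as M_x_N[int($0+1)] = $2 no longer refers to an array, which is consistent with the parser error reported in the question.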

If I recall correctly, the rationale for deleting potentially conflicting variables was that failure of the stats command is considered a non-fatal error. Some scripts used the existence of one of these variables to test whether a previous stats command had succeeded or not. The idea was that stats would delete them on entry, then create (or possibly re-create) them after successful execution. So if STATS_records (or in this case M_records) did not exist or was undefined after executing the stats command then the command must have failed. Furthermore, if a script blindly uses one of these variables to make a plot or further calculations then delete-on-failure prevents generating an incorrect plot or calculation using leftover values from an earlier successful stats command on a different data source.

One could complain that deleting all variables matching the name pattern is overkill; why doesn't the program limit itself to deleting just the admittedly long list of variable names it will create? Fair enough. But even so you would hit the same problem if instead of creating an array named M_x_N you created one named M_sumxy or any of the 30+ other names that the stats command can overwrite.
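
Following that reasoning, one way to keep the array names unchanged is to give the stats output a prefix that none of the arrays start with. This is only a sketch: the prefixes "statM" and "statMxN" are hypothetical names picked for illustration, and the only requirement assumed here is that no array name begins with the prefix followed by an underscore.

```gnuplot
### keep array names, use non-colliding stats prefixes
reset session

$Data <<EOD
 1   10
 2   20
 3   30
 4   40
EOD

stats $Data u (c=$0+1) nooutput   # get the number of rows into c

array M[c]
array M_x_N[c]

# "statM" and "statMxN" are hypothetical prefixes: the patterns statM_* and
# statMxN_* match neither M nor M_x_N, so the arrays should survive both calls
stats $Data u (M[int($0+1)] = $1) name "statM" nooutput
stats $Data u (M_x_N[int($0+1)] = $2) name "statMxN" nooutput

print M
print M_x_N
### end of script
```

The statistics are then available as statM_records, statMxN_mean, and so on, instead of M_records and M_x_N_mean.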