I am trying to write data to .txt files. Each file is around 170 MB once the data has been written to it.
I am using Octave's fprintf function with the '%.8f' format to write floating-point values to a file. However, I am seeing a very strange error: a subset of entries in some of the files is getting corrupted. For example, one of the lines in a file is:
0.43529412,0.}4313725,0.43137255,0.33233533,...
That "}" should have been a "4". How could Octave's fprintf write a "}" with the '%.8f' format in the first place? What is going wrong?
Another example is:
0.73289\8B987,...
How did that "\8B" get there?
I have to process a very large data set with 360 million points in total, so this error in a subset of rows in some files is becoming a big problem. What is causing it?
Also, the corruption does not occur at random positions. For example, if a file has 1.1 million rows, where each row is a vector representing one data instance, then the problem occurs in at most about 100 rows, and these rows are clustered together, say between rows 8000 and 8150. It is never the case that, out of 100 corrupted rows, the first 50 are located near row 10000 and the remaining ones near row 20000. They always form a single cluster.
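One way to pin down exactly which rows are affected is to scan each output file for lines that contain unexpected characters or the wrong number of fields. The following is only a minimal sketch: findCorruptRows is a hypothetical helper (not part of my original code), and it assumes every row has the layout "<index>,<float>,...,<float>" produced by the '%d' and '%.8f' formats. Rows containing NaN or Inf would also be flagged, since those print as text.

```octave
% Hypothetical helper: returns the line numbers of rows that contain
% characters other than digits, '.', ',' and '-', or that do not have
% exactly nCols comma-separated float fields after the leading index.
function badRows = findCorruptRows(fileName, nCols)
  allowed = '0123456789.,-';
  fid = fopen(fileName, 'r');
  if (fid < 0)
    error('Could not open %s', fileName);
  end
  badRows = [];
  rowNo = 0;
  txt = fgetl(fid);               % fgetl returns -1 (not a string) at EOF
  while ischar(txt)
    rowNo = rowNo + 1;
    hasBadChar = any(~ismember(txt, allowed));
    wrongCount = (sum(txt == ',') ~= nCols);
    if (hasBadChar || wrongCount)
      badRows(end+1) = rowNo;     % remember the 1-based line number
    end
    txt = fgetl(fid);
  end
  fclose(fid);
end
```

For the files described here, the call would be something like findCorruptRows(fileName, 13). If the corruption is clustered as described, diff(badRows) should consist mostly of small values.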
Note: The code below is responsible for extracting the data and writing it to files. Some variables in the code, such as K_Cell, were computed earlier and play virtually no role in the data-writing process.
mf = fspecial('gaussian',[5 5], 2);
fidM = fopen('14_01_2016_Go_AeossRight_ClustersM_wLAMRD.txt','w');
fidC = fopen('14_01_2016_Go_AeossRight_ClustersC_wLAMRD.txt','w');
fidW = fopen('14_01_2016_Go_AeossRight_ClustersW_wLAMRD.txt','w');
kIdx = 1;
featMat = [];
% - Generate file names to print the data to
featNo = 0;
fileNo = 1;
filePath = 'wLRD10_Data_Road/featMat_';
fileName = [filePath num2str(fileNo) '.txt'];
fidFeat = fopen(fileName, 'w');
% - Compute the global means and standard deviations
gMean = zeros(1,13); % - Global mean
gStds = zeros(1,13); % - Global variance
gNpts = 0; % - Total number of data points
fidStat = fopen('wLRD10_Data_Road/featStat.txt','w');
for i=1600:10:10000
    if (featNo > 1000000)
        % - If more than 1m points, close the file and open new one
        fclose(fidFeat);
        % - Get the new file name
        fileNo = fileNo + 1;
        fileName = [filePath num2str(fileNo) '.txt'];
        fidFeat = fopen(fileName, 'w');
        featNo = 0;
    end
    imgName = [fAddr num2str(i-1) '.jpg'];
    img = imread(imgName);
    Ir = im2double(img(:,:,1));
    Ig = im2double(img(:,:,2));
    Ib = im2double(img(:,:,3));
    imgR = filter2(mf, Ir);
    imgG = filter2(mf, Ig);
    imgB = filter2(mf, Ib);
    I = im2double(img);
    I(:,:,1) = imgR;
    I(:,:,2) = imgG;
    I(:,:,3) = imgB;
    I = im2uint8(I);
    [Feat1, Feat2] = funcFeatures1(I);
    [Feat3, Feat4] = funcFeatures2(I);
    [Feat5, Feat6, Feat7] = funcFeatures3(I);
    [Feat8, Feat9, Feat10] = funcFeatures4(I);
    ids = K_Cell{kIdx};
    pixVec = zeros(length(ids),13); % - Get the local image features
    for s = 1:length(ids) % - Extract features
        pixVec(s,:) = [Ir(ids(s,1),ids(s,2)) Ig(ids(s,1),ids(s,2)) Ib(ids(s,1),ids(s,2)) Feat1(ids(s,1),ids(s,2)) Feat2(ids(s,1),ids(s,2)) Feat3(ids(s,1),ids(s,2)) Feat4(ids(s,1),ids(s,2)) ...
            Feat5(ids(s,1),ids(s,2)) Feat6(ids(s,1),ids(s,2)) Feat7(ids(s,1),ids(s,2)) Feat8(ids(s,1),ids(s,2))/100 Feat9(ids(s,1),ids(s,2))/500 Feat10(ids(s,1),ids(s,2))/200];
    end
    kIdx = kIdx + 1;
    for s=1:length(ids)
        featNo = featNo + 1;
        fprintf(fidFeat,'%d,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f\n', featNo, pixVec(s,:));
    end
    % - Compute the mean and variances
    for s = 1:length(ids)
        gNpts = gNpts + 1;
        delta = pixVec(s,:) - gMean;
        gMean = gMean + delta./gNpts;
        gStds = gStds*(gNpts-1)/gNpts + delta.*(pixVec(s,:) - gMean)/gNpts;
    end
end
Note that the code block:
for s=1:length(ids)
    featNo = featNo + 1;
    fprintf(fidFeat,'%d,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f\n', featNo, pixVec(s,:));
end
is the only part of the code that writes the data points to the files.
The earlier code block,
if (featNo > 1000000)
    % - If more than 1m points, close the file and open new one
    fclose(fidFeat);
    % - Get the new file name
    fileNo = fileNo + 1;
    fileName = [filePath num2str(fileNo) '.txt'];
    fidFeat = fopen(fileName, 'w');
    featNo = 0;
end
opens a new file for writing once the currently open file exceeds the limit of 1 million data points.
Furthermore, note that the pixVec variable cannot contain anything other than float/double values; otherwise Octave would throw an error.
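To tell whether the bad characters are produced during formatting or only appear once the data reaches the disk, the write loop could format each row into a string first, check it in memory, and only then write it. This is a rough sketch rather than my actual code: writeRowsChecked is a hypothetical helper, and it assumes the same '%d' plus '%.8f' row layout as above, with no NaN or Inf values in the data.

```octave
% Hypothetical helper: formats each row of 'rows' with sprintf, verifies
% the resulting text in memory, then writes it to the open file 'fid'.
% Returns the updated running row counter.
function featNo = writeRowsChecked(fid, featNo, rows)
  nCols = size(rows, 2);
  fmt = ['%d' repmat(',%.8f', 1, nCols) '\n'];  % same layout as the fprintf call
  allowed = '0123456789.,-';
  for s = 1:size(rows, 1)
    featNo = featNo + 1;
    txt = sprintf(fmt, featNo, rows(s,:));
    body = txt(1:end-1);                        % everything except the newline
    ok = (txt(end) == char(10)) && all(ismember(body, allowed)) ...
         && (sum(body == ',') == nCols);
    if ~ok
      error('Row %d is malformed before writing: %s', featNo, txt);
    end
    fprintf(fid, '%s', txt);                    % write the verified text as-is
  end
end
```

Inside the main loop this would be called as featNo = writeRowsChecked(fidFeat, featNo, pixVec); in place of the fprintf loop. If no error is ever raised but the files still contain corrupted rows, the formatting step is probably not at fault and the problem lies somewhere between the write call and the storage.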