I have 400 files, each containing about 500000 characters, and those 500000 characters consist of only about 20 distinct letters. I want to make a histogram showing the 10 most used letters (x-axis) and the number of times each letter is used (y-axis). I wrote the code below, but one thing is missing: I want to know which letter each bar corresponds to. What should I add to the code? You can change the whole code, but keeping it is better for me. Please provide the whole code so I can copy it directly into a script and run it.
i = 1;
z = zeros(1, 10);
for i=1:400
    j = num2str(i);
    file_name = strcat('part',j,'txt');
    file_id = fopen(file_name);
    part = fread(file_id, inf, 'uchar');
    h = hist(part,10);
    z = z + h;
    fclose(file_id);
end
First of all, your use of hist is wrong. hist(data,10) creates a histogram of data with 10 bins, so each bin will correspond to more than one character in your files. A way to solve this is to call hist with predefined bin centers, one per possible character value, for example hist(part, 1:255). Note that you have to define your bins to accommodate all possible values, therefore ranging from 1 to 255.
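Putting this together, here is a rough sketch of a complete script, not a definitive solution. It assumes the files are named part1.txt through part400.txt in the current folder (the file name pattern is my assumption, so adjust the sprintf call to match your files), and the variable names bins, counts, order, top_counts and top_letters are my own. It counts every character code separately, keeps the 10 most frequent, and labels each bar with its letter:

```matlab
% One bin per possible character code, so no two letters share a bin.
bins = 1:255;
counts = zeros(1, numel(bins));   % running total over all files

for i = 1:400
    % Assumed naming scheme: part1.txt ... part400.txt
    file_name = sprintf('part%d.txt', i);
    file_id = fopen(file_name);
    if file_id == -1
        warning('Could not open %s, skipping.', file_name);
        continue;
    end
    part = fread(file_id, inf, 'uchar');
    fclose(file_id);
    % hist with a vector of centers counts each code on its own
    counts = counts + reshape(hist(part, bins), 1, []);
end

% Sort the counts and keep the 10 most frequent character codes
[sorted_counts, order] = sort(counts, 'descend');
top_counts = sorted_counts(1:10);
top_letters = char(bins(order(1:10)));   % codes converted back to letters

% Plot and label each bar with the letter it belongs to
bar(top_counts);
set(gca, 'XTick', 1:10, 'XTickLabel', num2cell(top_letters));
xlabel('Letter');
ylabel('Number of occurrences');
```

Keep in mind that fread returns every byte in the file, so if your files contain line breaks or spaces, those will be counted as well and may show up among the top 10 with a blank label. If that happens, you can set the corresponding entries of counts to zero before sorting, e.g. counts([10 13 32]) = 0;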