I have 400 files, each containing about 500,000 characters, and those characters consist of only about 20 different letters. I want to make a histogram showing the 10 most used letters (x-axis) and the number of times each letter is used (y-axis). How can I make it?
MATLAB: plot a histogram of how often each character occurs in a file
Asked by syd26. There are 2 answers.
Note: This answers the original version of the question (the data consisted of 10 letters only, and a plain histogram was wanted). The question has since been completely changed (the data now consists of about 20 letters, and a histogram of the 10 most used letters is wanted).
If the ten letters are arbitrary and not known in advance, you can't use hist(..., 10). Consider the following example with three arbitrary "letters":
    h = hist([1 2 2 10], 3);
The result is not [1 2 1], as you would expect. The problem is that hist chooses equal-width bins.
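To see the equal-width bins at work, you can ask hist for its bin centres as a second output. For [1 2 2 10] and three bins, the bins span [1, 10] with width 3, so the boundaries between bins fall at 4 and 7 and the values 1, 2 and 2 all land in the first bin. A small sketch:

```matlab
% Three equal-width bins spanning [min, max] = [1, 10]: width 3,
% so the boundaries between bins are at 4 and 7.
[h, centers] = hist([1 2 2 10], 3);
disp(h)         % 1, 2 and 2 share the first bin; 10 is alone in the last
disp(centers)   % bin centres 2.5, 5.5 and 8.5
```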
Here are three approaches to do what you want:
You can find the letters with unique and then do the sum with bsxfun:
    letters = unique(part(:)).';                % these are the letters in your file
    h = sum(bsxfun(@eq, part(:), letters));     % count occurrences of each letter
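On MATLAB R2016b and later, implicit expansion makes bsxfun unnecessary for this comparison. A sketch using the example data from this answer; note that the comparison builds a numel(part)-by-numel(letters) logical matrix, roughly 10 MB for 500,000 characters and 20 letters, which is fine here but grows with the number of distinct letters:

```matlab
part = [1 2 2 10];                    % example data used in this answer
letters = unique(part(:)).';          % distinct values as a row: [1 2 10]
% Column == row expands to a 4-by-3 logical matrix (implicit expansion)
h = sum(part(:) == letters, 1);       % counts per letter: [1 2 1]
```

On releases before R2016b, keep the bsxfun(@eq, ...) form above.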
The second line of the above approach could be replaced by histc, specifying the letters as the bin edges:
    letters = unique(part(:)).';
    h = histc(part(:).', letters);   % part(:).' so that a matrix is counted as a whole
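Why the edges trick works: histc bin k counts values x with letters(k) <= x < letters(k+1), and its last bin counts values equal to letters(end). Because the data takes no values other than the letters themselves, each bin holds exactly one letter. A sketch on a hypothetical string (txt and codes are illustrative names), converting to double first so as not to rely on how histc treats char input:

```matlab
txt = 'abracadabra';                 % hypothetical file contents
codes = double(txt(:));              % character codes as a column vector
letters = unique(codes).';           % distinct codes, used as a row of bin edges
h = histc(codes, letters);           % one count per letter (column, like codes)
% h is [5; 2; 1; 1; 2] for 'a', 'b', 'c', 'd', 'r'
```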
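Or you could use sparse to do the accumulation, since sparse adds together entries with repeated indices (this needs part to hold positive integers, such as character codes):
    t = sparse(1, part, 1);
    [~, letters, h] = find(t);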
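A closely related alternative, swapped in here rather than taken from the answer above, is accumarray, which sums values by index. The third output of unique maps every element to the position of its letter, so the values themselves need not be positive integers:

```matlab
part = [1 2 2 10];                    % example data used in this answer
[letters, ~, idx] = unique(part(:));  % part(:) equals letters(idx)
h = accumarray(idx, 1);               % add 1 per occurrence: [1; 2; 1]
```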
As an example, for part = [1 2 2 10], any of the above gives the expected result:
    letters =
         1     2    10
    h =
         1     2     1
Since you have an array of uchar, you know that your elements will always be in the range 0:255. After seeing Tamás Szabó's answer here, I realized that the null character is exceedingly unlikely in a text file, so I will just ignore it and use the range 1:255. If you expect to have null characters, you'll have to adjust the range.
In order to find the 10 most frequently used letters, we'll first calculate the histogram counts, then sort them in descending order and take the first 10:
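A minimal sketch of this step, assuming the file contents have already been read into a cell array of character vectors. The names texts, counts, topIdx and topCounts are hypothetical, and the two short strings stand in for the 400 files; adding the per-file counts gives the totals over all files:

```matlab
% Hypothetical stand-ins for the file contents. With real files, each one
% could be read with: fid = fopen(name); txt = fread(fid, Inf, 'uchar=>char'); fclose(fid);
texts = {'hello world', 'the quick brown fox jumps over the lazy dog'};

counts = zeros(255, 1);                    % counts(k) = occurrences of char(k)
for k = 1:numel(texts)
    % Code 0 (null) falls below the first edge and is ignored
    c = histc(double(texts{k}(:)), 1:255);
    counts = counts + c(:);                % force a column before adding
end

[sortedCounts, sortedIdx] = sort(counts, 'descend');
nTop = min(10, nnz(counts));               % guard: fewer than 10 distinct chars
topCounts = sortedCounts(1:nTop);          % largest counts, in descending order
topIdx = sortedIdx(1:nTop);                % their character codes
```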
Now we need to rearrange the counts and indices to put the letters back in alphabetical order:
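A sketch of the reordering, using a hypothetical four-character result (topIdx and topCounts, as in a top-N step) in place of the real top 10. Sorting the character codes in ascending order gives ASCII order, which is alphabetical for lowercase letters (the space and uppercase letters sort before them), and the permutation returned by sort keeps each count paired with its character:

```matlab
% Hypothetical top-N result: character codes in descending order of count
topIdx    = [111; 101; 108; 104];   % 'o', 'e', 'l', 'h'
topCounts = [6; 4; 4; 3];

[sortedIdx, order] = sort(topIdx);  % ascending character code
sortedCounts = topCounts(order);    % apply the same permutation to the counts
sortedChars = char(sortedIdx.');    % characters as a row: 'ehlo'
```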
Now we can plot the histogram using bar. (You can add the 'hist' option if you want the bars in the graph to touch, as they do in the normal hist plot.)
To change the horizontal axis labels from numeric values to characters, use sortedChars as the 'XTickLabel':
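A minimal plotting sketch with hypothetical values standing in for sortedChars and sortedCounts. It places the bars at positions 1 to n rather than at the character codes, and uses num2cell to turn the character row into a cell array with one label per tick, which avoids depending on how a plain char row is interpreted. The figure is created invisibly here; drop 'Visible', 'off' to see it:

```matlab
% Hypothetical results of the previous steps
sortedChars  = 'ehlo';
sortedCounts = [4 3 4 6];

fig = figure('Visible', 'off');          % invisible figure; remove to display
n = numel(sortedCounts);
bar(1:n, sortedCounts, 'hist');          % 'hist' style: adjacent bars touch
ax = gca;
set(ax, 'XTick', 1:n, 'XTickLabel', num2cell(sortedChars));  % one char per tick
xlabel(ax, 'Character');
ylabel(ax, 'Number of occurrences');
```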