I have a largish Census/BLS data set with roughly ten million person records and about 250 variables per record, including missing values. Many of these values are dollar amounts. I suspect, but do not know, that the rightmost digit is disproportionately a zero. I want to count the number of consecutive zeros starting from the right and excluding anything after the decimal point. I want to produce a table showing the frequency with which different numbers of zeros occur, and another which shows the frequency with which different digits occur as the rightmost nonzero digit.
So if my numbers were
13,568,700
449,000
43,560
20,010
34,600
32,620
The tables I want would be

For zeros:
zeros  count
3      1
2      2
1      3

And for rightmost nonzero digit:
digit  count
1      1
2      1
6      2
7      1
9      1
I have a function that does this for a single number and increments some counters appropriately, but it is not at all vectorized and it is unacceptably slow. Because there are many variables that might have been rounded for each person, if I just run the function once on each number, I need to run it on the order of 500,000,000 times. If I wanted the leftmost digit it would be easy to vectorize, but I have not been able to work out a vectorized algorithm for either the number of right-hand zeros or the rightmost nonzero digit.
I'd be grateful for help from some person smarter, or at least more knowledgeable, than I am.
Starting from something not-so-fast as a proof of concept:
This takes about 90 seconds on my machine. You need to process about 10x as much data, across multiple columns, so there's loads of room for improvement, but in a pinch it may be getting to borderline workable.
Result
Then it's straightforward and fast to run these:
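For illustration only, here is one way a vectorized version could look in Python with NumPy. Both the language and the library are assumptions, and `trailing_zero_stats` and `max_digits` are hypothetical names. The sketch strips one trailing zero per pass from every value that still ends in zero, so the loop runs at most once per digit position rather than once per record. It assumes missing values are stored as NaN and skips them, along with amounts whose integer part is zero, since those have no rightmost nonzero digit.

```python
import numpy as np

def trailing_zero_stats(values, max_digits=18):
    """Return (zeros, digit): trailing-zero count and rightmost nonzero digit.

    Assumptions (not from the original post): missing values are NaN, and
    amounts whose integer part is 0 are dropped. Amounts are assumed small
    enough (below 2**53) to be represented exactly as floats.
    """
    vals = np.asarray(values, dtype=float)
    # Keep finite amounts with a nonzero integer part.
    keep = np.isfinite(vals) & (np.abs(vals) >= 1)
    # Discard anything after the decimal point and the sign.
    rest = np.abs(np.trunc(vals[keep])).astype(np.int64)
    zeros = np.zeros(rest.shape, dtype=np.int64)

    for _ in range(max_digits):
        ends_in_zero = (rest % 10) == 0
        if not ends_in_zero.any():
            break
        # Strip one trailing zero from every value that still has one.
        rest[ends_in_zero] //= 10
        zeros[ends_in_zero] += 1

    # Once all trailing zeros are stripped, the last digit is nonzero.
    return zeros, rest % 10

# The six example amounts from the question, plus one missing value.
amounts = [13568700, 449000, 43560, 20010, 34600, 32620, float("nan")]
zeros, digit = trailing_zero_stats(amounts)

zero_table = np.bincount(zeros)                  # index = number of zeros
digit_table = np.bincount(digit, minlength=10)   # index = digit 0..9
print(zero_table)    # [0 3 2 1]
print(digit_table)   # [0 1 1 0 0 0 2 1 0 1]
```

The two `np.bincount` calls give the frequency tables directly. For many columns, the same function could be applied to each column in turn, or to all dollar columns flattened into one array.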