I'm running a query to bifurcate Splunk results into buckets. I want to divide and count files based on the size they take on disk. This can be achieved using `rangemap` or `eval case`. As I read here, using `eval` is faster than `rangemap`, but I'm getting different results with the two.
This is the query I'm running -
```
| eval size_group = case(SizeInMB < 150, "0-150 MB",
    SizeInMB < 200 AND SizeInMB >= 150, "150-200 MB",
    SizeInMB < 300 AND SizeInMB >= 200, "200-300 MB",
    SizeInMB < 500 AND SizeInMB >= 300, "300-500 MB",
    SizeInMB < 1000 AND SizeInMB >= 500, "500-1000 MB",
    SizeInMB > 1000, ">1000 MB")
| stats count by size_group
```
and this is the result I'm getting -
Whereas using `rangemap`, this is the query -
```
| rangemap field=SizeInMB "0-150MB"=0-150 "151-200MB"=150-200 "201-300MB"=200-300 "301-500MB"=300-500 "501-999MB"=500-1000 default="1000MB+"
| stats count by range
```
I tried this range too -
```
rangemap field=SizeInMB "0-150MB"=0-150 "150-200MB"=150-200 "200-300MB"=200-300 "300-500MB"=300-500 "500-1000MB"=500-1000 default="1000MB+"
```
and I get the same result -
There is not a huge difference between the two sets of results, and we can probably live with it - but for the range 150-200 MB it is 445958 vs 445961, for 200-300 MB it is 3676 vs 3677, and for 300-500 MB it is 3346 vs 3348. I want to understand why there is a difference, and which one I should trust more. Speed-wise `eval` seems better, but data-wise is it less correct?
The problem you're seeing is that your `rangemap` has overlapping values, whereas with the `eval` format you're trimming the ranges "properly" with `case`.
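You can see the overlap in isolation with a throwaway search (a minimal sketch using `makeresults` to fabricate a single boundary event; the range names are trimmed down from your query):

```
| makeresults count=1
| eval SizeInMB=150
| rangemap field=SizeInMB "0-150MB"=0-150 "150-200MB"=150-200 default=">200MB"
| table SizeInMB range
```

A value of exactly 150 sits inside both "0-150MB" and "150-200MB" as you declared them, so an event landing on a bucket boundary can match more than one range, unlike `case`, which assigns each event to at most one bucket.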
Sidebar - you can make that `case` simpler. Since `case` expressions stop evaluating as soon as a match is made, there's no need for the `AND` clauses you had. And using `0=0` as your last condition will always evaluate true (think of `default` in `case` statements in C or C++).
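Putting that together, a sketch of the simplified version (same buckets as your original query):

```
| eval size_group = case(SizeInMB < 150,  "0-150 MB",
                         SizeInMB < 200,  "150-200 MB",
                         SizeInMB < 300,  "200-300 MB",
                         SizeInMB < 500,  "300-500 MB",
                         SizeInMB < 1000, "500-1000 MB",
                         0=0,             ">1000 MB")
| stats count by size_group
```

As a side effect, the `0=0` catch-all also picks up events where `SizeInMB` is exactly 1000, which your original `case` silently dropped: its last test was the strict `SizeInMB > 1000`, so an exact 1000 matched nothing, `size_group` came back null, and the event fell out of `stats count by size_group`.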