Elasticsearch DSL - large unique list from a single column in Python


I have a large set of Windows event logs and am trying to build a unique list of users from a single column for a single event ID. The code below runs, but takes an extremely long time. How would you use the Python elasticsearch_dsl and elasticsearch-py libraries to accomplish this?

    from elasticsearch import Elasticsearch
    from elasticsearch_dsl import Search

    es = Elasticsearch([localhostmines], timeout=30)
    s = Search(using=es, index="logindex-*").filter('term', EventID="4624")

    # scan() retrieves every matching document, so this iterates the whole result set
    users = set()
    for hit in s.scan():
        users.add(hit.TargetUserName)

    print(users)

The TargetUserName field contains user names as strings, and the EventID field contains Windows event IDs as strings.


1 Answer

Val (accepted answer)

You need to use a terms aggregation, which will do exactly what you expect.

    s = Search(using=es, index="logindex-*").filter('term', EventID="4624")
    s.aggs.bucket('per_user', 'terms', field='TargetUserName')

    response = s.execute()
    # each bucket is one distinct TargetUserName plus the number of matching events
    for user in response.aggregations.per_user.buckets:
        print(user.key, user.doc_count)
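One caveat: a terms aggregation returns only the top 10 buckets by default, so with a large user population the list above will be silently truncated. A minimal sketch of raising that limit, assuming the same es client and index pattern as above and a hypothetical cap of 10000 distinct users (for truly unbounded cardinality a composite aggregation can page through all buckets instead):

    # Sketch only: size=10000 is a hypothetical cap on distinct users; adjust to your data.
    s = Search(using=es, index="logindex-*").filter('term', EventID="4624")
    s = s.extra(size=0)  # skip returning hits; we only want the aggregation buckets
    s.aggs.bucket('per_user', 'terms', field='TargetUserName', size=10000)

    response = s.execute()
    users = {bucket.key for bucket in response.aggregations.per_user.buckets}
    print(users)

Setting size=0 on the search itself also avoids shipping the matching documents back to the client, which is usually the slow part when all you need is the distinct values.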