The code below filters one IEnumerable<T> against another that is used as a blacklist. The filtered collection iterates over content that is fetched remotely and lazily loaded (YouTube API).
IEnumerable<CustomType> contentThatCanBeHuge = this.FetchContentThatCanBeHuge();
IEnumerable<string> blackListContent = this.FetchBlackListContent();
return contentThatCanBeHuge.Where(x => !blackListContent.Contains(x.Id));
The Enumerable.Contains method is O(n) in time complexity, so the Enumerable.Where call could take a while.
On the other hand, HashSet<T>.Contains is O(1). Instantiating a HashSet<T> from an IEnumerable<T> seems to be O(n).
If the blacklist is going to be used multiple times, and without taking space complexity into account, is it a good approach to turn it into a HashSet<T> before using it, or is this just premature optimization?
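For reference, the HashSet variant I have in mind would look roughly like the following minimal sketch. CustomType, ContentFilter and the Fetch* stubs are placeholders standing in for the real lazy YouTube API calls; ToHashSet() would work as well as the HashSet<string> constructor on newer framework versions.

using System.Collections.Generic;
using System.Linq;

public class CustomType
{
    public string Id { get; set; }
}

public class ContentFilter
{
    // Hypothetical stand-ins for the remote, lazily loaded fetches described above.
    private IEnumerable<CustomType> FetchContentThatCanBeHuge() => Enumerable.Empty<CustomType>();
    private IEnumerable<string> FetchBlackListContent() => Enumerable.Empty<string>();

    public IEnumerable<CustomType> GetFilteredContent()
    {
        // Materialize the blacklist once into a set.
        var blackList = new HashSet<string>(this.FetchBlackListContent());

        // Where stays lazy; each Contains lookup against the set is a hash lookup.
        return this.FetchContentThatCanBeHuge().Where(x => !blackList.Contains(x.Id));
    }
}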
Let the size of blackListContent be m and the size of contentThatCanBeHuge be n.

If we don't use a HashSet, time complexity is O(n * O(m)) = O(n * m) and space complexity is O(1): for each item in contentThatCanBeHuge we scan the entire blackListContent.

If we use a HashSet, time complexity is O(m) + O(n * O(1)) = O(n + m) and space complexity is O(m): building the HashSet takes O(m) time and O(m) space, and checking each item of contentThatCanBeHuge against the HashSet takes O(n * O(1)) time.

So far so good: the HashSet makes the code faster, but we consume more memory.
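To make the trade-off concrete, here is a minimal sketch of both variants with the complexities above annotated. BlackListFiltering, FilterLinear and FilterHashed are placeholder names, and the ids are modeled as plain strings rather than the CustomType from the question.

using System.Collections.Generic;
using System.Linq;

public static class BlackListFiltering
{
    // Without a HashSet: O(n * m) time, O(1) extra space.
    // Every id in source triggers a linear scan of blackList.
    public static IEnumerable<string> FilterLinear(
        IEnumerable<string> source, IEnumerable<string> blackList)
    {
        return source.Where(id => !blackList.Contains(id));
    }

    // With a HashSet: O(m) once to build the set, then O(1) average per lookup,
    // so O(n + m) time overall, at the cost of O(m) extra space.
    public static IEnumerable<string> FilterHashed(
        IEnumerable<string> source, IEnumerable<string> blackList)
    {
        var blackListSet = new HashSet<string>(blackList);
        return source.Where(id => !blackListSet.Contains(id));
    }
}

Note that FilterHashed builds the set eagerly when it is called, while the Where over source stays deferred, so the O(m) build cost is paid once per call even if the result is enumerated later.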