I have a large CSV file with user data. I have an endpoint that receives a user and should return a boolean indicating whether that user exists in the file.
To avoid running out of memory, I read the file as a stream and use getLines, which is lazy.
This is my code:
import cats.effect.IO
import fs2.Stream
import java.nio.file.Path

def readUsers(filePath: Path): IO[Seq[User]] = {
  Stream
    .eval(IO(scala.io.Source.fromFile(filePath.toFile)))
    .flatMap(source => Stream.fromIterator[IO](source.getLines(), 64))
    .map(line => line.split(","))
    .map(cols => User(cols(0), cols(1), cols(2), cols(3)))
    .compile
    .toList
}
def isUserExists(user: User, users: IO[Seq[User]]): IO[Boolean] =
users.map(_.contains(user))
I assign users a value when the app starts:
val users = readUsers(filePath)
and send it to the isUserExists function whenever the endpoint is called.
My questions are:
- Does the toList keep the entire file in memory?
- If so, how can I avoid it? Should I remove the toList and iterate over each line in isUserExists?
- When the content of the file changes, will it be reflected in the endpoint, since users is an IO value?
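To make the last question concrete, here is a minimal sketch of how an IO value behaves when it is run more than once. It assumes cats-effect 3 and Scala 2.13+; `IoRerunDemo`, `readLines`, and `demo` are hypothetical names used only for illustration, and `readLines` is a simplified stand-in for `readUsers` that loads all lines eagerly.

```scala
import cats.effect.IO
import cats.effect.unsafe.implicits.global
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object IoRerunDemo {
  // Hypothetical stand-in for readUsers: loads every line each time the IO is run.
  def readLines(path: Path): IO[List[String]] =
    IO.blocking(Files.readAllLines(path).asScala.toList)

  // Runs the same IO value before and after the file changes and
  // returns the number of lines seen by each run.
  def demo(): (Int, Int) = {
    val path = Files.createTempFile("users", ".csv")
    Files.write(path, "1,alice\n".getBytes("UTF-8"))

    val lines = readLines(path) // only a description; nothing is read yet

    val before = lines.unsafeRunSync().size
    Files.write(path, "1,alice\n2,bob\n".getBytes("UTF-8"))
    val after = lines.unsafeRunSync().size

    (before, after)
  }
}
```

Here `IoRerunDemo.demo()` returns `(1, 2)`: the second run sees the updated file because the IO is executed again. The flip side is that every run re-reads the file, and with `toList` that means the whole list is rebuilt in memory on each call.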
fs2.io will manage the resource properly
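As a rough sketch of the fs2.io approach, the lookup could stream the file and stop at the first match instead of collecting a list. This assumes a recent fs2 3.x (where `Files[IO].readUtf8Lines` takes an `fs2.io.file.Path`) and cats-effect 3. The `User` field names, `UserLookup`, `parse`, and `userExists` are hypothetical; the original only shows four positional columns.

```scala
import cats.effect.IO
import fs2.io.file.{Files, Path}

// Field names are assumptions; the original only uses cols(0) to cols(3).
final case class User(id: String, firstName: String, lastName: String, email: String)

object UserLookup {
  // Takes the first four columns; lines with fewer columns
  // (for example a trailing blank line) are skipped.
  def parse(line: String): Option[User] =
    line.split(",") match {
      case Array(a, b, c, d, _*) => Some(User(a, b, c, d))
      case _                     => None
    }

  // Streams the file line by line. `exists` stops pulling as soon as a
  // match is found, and the file handle is released when the stream ends.
  def userExists(user: User, path: Path): IO[Boolean] =
    Files[IO]
      .readUtf8Lines(path)
      .map(parse)
      .exists(_.contains(user))
      .compile
      .lastOrError
}
```

Only the chunk currently being processed is held in memory, so the file size no longer matters for memory use. The trade-off is a linear scan of the file per request; if that is too slow, building a `Set[User]` once would be faster but would bring back the memory cost.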