There is a scenario where, if a query fetches a large portion of a table (say 50%), the optimizer will not choose an index even when one is applicable, and will use a table scan instead. Someone said that for SSDs this optimization no longer makes sense, because an SSD's random access is so fast that always using an index scan should not cause any problem.
Is that right?
My feeling is that even if random access is fast, sequentially reading the data pages should still be faster than first reading the index pages and then reading the data pages.
It doesn't make the optimization completely obsolete, but it does change the balance between when it's more efficient to use an index vs. a sequential scan. It makes sense to use an index for queries that return a much larger fraction of the table when the cost of random access is low, but there's still a point where the time spent reading the index itself outweighs the time saved by reading (slightly) fewer rows from the table.
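If you want to see where that crossover falls on your own hardware, one rough way (a sketch, assuming PostgreSQL; the table "events" and its index on "created_at" are hypothetical, substitute your own) is to run the same query with each plan forced via the planner's enable_* switches and compare the actual run times:

```sql
-- Force the index-based plan by disabling sequential scans.
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events WHERE created_at > now() - interval '6 months';

-- Now force the sequential scan by disabling index-based plans.
SET enable_seqscan = on;
SET enable_indexscan = off;
SET enable_bitmapscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events WHERE created_at > now() - interval '6 months';

-- Restore the defaults.
RESET enable_seqscan;
RESET enable_indexscan;
RESET enable_bitmapscan;
```

Comparing the two ANALYZE timings for queries that return different fractions of the table gives you a feel for where the break-even point actually is on your storage.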
Your database may provide a way to configure the optimizer to better reflect the underlying storage system's performance characteristics. For example, PostgreSQL has settings for the estimated cost of various operations, which the query planner uses to estimate the overall cost of each possible query plan. By default, a random page access is estimated to be 4 times as expensive as a sequential one, but you can reduce random_page_cost to be close or equal to seq_page_cost if your database is on an SSD. This reduces the estimated cost of query plans that involve lots of random access, so the planner will choose them more often.