I am working with pathology Whole Slide Images. I read an SVS image with OpenSlide-Python, extract the tiles at the highest level with DeepZoomGenerator, and then reassemble those tiles into a DeepZoom file with the libvips library.
With large images (40,000 x 30,000 pixels), extracting the tiles and reconstructing them into a DeepZoom file takes a long time (close to an hour).
Here is the function that extracts the tiles from the SVS image and reconstructs them with libvips into a tiled TIFF and a DeepZoom image:
import os

import openslide
import pyvips
from openslide.deepzoom import DeepZoomGenerator

def read_svs_and_create_dzi(svs_path, output_dzi_path, level, tile_size):
    slide = openslide.open_slide(svs_path)
    dz_generator = DeepZoomGenerator(slide, tile_size=tile_size, overlap=0, limit_bounds=False)
    slide_width, slide_height = dz_generator.level_dimensions[level]

    # blank canvas that the tiles are pasted into
    dzi_image = pyvips.Image.black(slide_width, slide_height, bands=3)

    for tile_y in range(dz_generator.level_tiles[level][1]):
        for tile_x in range(dz_generator.level_tiles[level][0]):
            tile = dz_generator.get_tile(level, (tile_x, tile_y))  # RGB PIL image
            tile_vips = pyvips.Image.new_from_memory(
                tile.tobytes(), tile.width, tile.height, 3, "uchar")
            x_loc = tile_x * tile_size
            y_loc = tile_y * tile_size
            # insert() is lazy: each call extends the pyvips operation graph
            dzi_image = dzi_image.insert(tile_vips, x_loc, y_loc)
            print("Extracted tile", tile_x, "and", tile_y)

    output_tiff_path = os.path.join(output_dzi_path, "dzi_image.tiff")
    dzi_image.tiffsave(output_tiff_path, tile=True, pyramid=True)

    dzi_dir = os.path.join(output_dzi_path, "dzi_files")
    os.makedirs(dzi_dir, exist_ok=True)
    dzi_image.dzsave(dzi_dir, layout="dz")
The approach is: read each tile at a given level and location, wrap it in a libvips image, and insert it into a blank libvips image (created beforehand) at the exact position it occupied in the SVS image. The completed image is then saved as a tiled TIFF with tiffsave() and as a DeepZoom file with dzsave().
In the tile-extraction loop, I notice that the running time grows quadratically: the first rows of tiles are extracted quickly, but around the 10th row the loop stalls and each tile takes progressively longer. (The grid is 183 x 95 tiles, columns x rows.)
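My rough model of the slowdown (an assumption, not a measurement): if each `insert` effectively re-copies the full canvas, then assembling N tiles touches on the order of N x W x H pixels instead of W x H, so total work grows quadratically with image size:

```python
# Back-of-the-envelope estimate of the extra pixel traffic, using the
# tile grid reported above (183 x 95) and a hypothetical 254 px tile.
tile_size = 254
cols, rows = 183, 95
n_tiles = cols * rows
width, height = cols * tile_size, rows * tile_size

naive_traffic = n_tiles * width * height  # one full-canvas copy per insert
single_pass = width * height              # writing every pixel exactly once

print(naive_traffic // single_pass)  # -> 17385, i.e. n_tiles times more work
```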
Are there better approaches that avoid this slowdown and memory issue?