Reading raw image files inside the map method of tensorflow using Rawpy

Question

294 views Asked by rohan843 At 01 January 2025 at 09:25

I am trying to load raw (ARW) images into a tf.data dataset for training a model. I start with two lists of file paths:

im1: the paths of the images that will be input to the model.

im2: the paths of the expected output images.

I am creating the dataset as follows:

ds_train = tf.data.Dataset.from_tensor_slices((im1, im2))

Now, this dataset would contain all the paths. To load the raw images from the files, I am using a mapping function as follows:

def read_image(im1, im2):
    im1 = rawpy.imread(im1).raw_image_visible.astype(np.float32)
    im2 = rawpy.imread(im2).raw_image_visible.astype(np.float32)
    return im1, im2

ds_train = ds_train.map(read_image)

This is giving me an error that seems to be associated with the rawpy module:

AttributeError                            Traceback (most recent call last)
...
AttributeError: in user code:

    File "/tmp/ipykernel_24/2296991765.py", line 6, in read_image  *
        short = rawpy.imread(short).raw_image_visible.astype(np.float32)
    File "/opt/conda/lib/python3.7/site-packages/rawpy/__init__.py", line 20, in imread  *
        d.open_file(pathOrFile)
    File "rawpy/_rawpy.pyx", line 408, in rawpy._rawpy.RawPy.open_file  **
        

    AttributeError: 'Tensor' object has no attribute 'encode'

When I try to extract the string value of the path from im1 and im2 using the .numpy() method, I get a new error that seems to suggest that the .numpy() method doesn't exist:

AttributeError                            Traceback (most recent call last)
...
AttributeError: in user code:

    File "/tmp/ipykernel_24/2505255456.py", line 6, in read_image  *
        short = rawpy.imread(short.numpy()).raw_image_visible.astype(np.float32)

    AttributeError: 'Tensor' object has no attribute 'numpy'

The modification I made to my code was:

def read_image(im1, im2):
    im1 = rawpy.imread(im1.numpy()).raw_image_visible.astype(np.float32)  # numpy method added
    im2 = rawpy.imread(im2.numpy()).raw_image_visible.astype(np.float32)  # numpy method added
    return im1, im2

ds_train = ds_train.map(read_image)

The full code may be seen in this notebook: https://www.kaggle.com/code/rohan843/learning-to-see-in-the-dark-tf2/notebook

Note: In the notebook above, I have used short and long instead of im1 and im2. The part that causes the error is currently commented out.

Original Q&A

There are 2 answers

**Frightera** · Answer 1 · 2023-04-05T21:21:38+00:00

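You can't call arbitrary Python functions or modules inside a tf.data map function, because that function is traced and run in graph mode, where tensors are symbolic and carry no concrete values. For example, the error

AttributeError: 'Tensor' object has no attribute 'numpy'

is a consequence of running in graph mode. You can use tf.py_function to work around this, but it can slow the pipeline down.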

def read_image(short, long):
    
    short_path = short.numpy().decode('utf-8')
    long_path = long.numpy().decode('utf-8')

    with rawpy.imread(short_path) as raw:
        short_array = raw.raw_image_visible.astype(np.float32)
    with rawpy.imread(long_path) as raw:
        long_array = raw.raw_image_visible.astype(np.float32)
    return short_array, long_array

def read_image_wrapper(short, long):
    short_array, long_array = tf.py_function(
        func=read_image, inp=[short, long], Tout=(tf.float32, tf.float32)
    )
    
    # TF cannot infer the shape of a tf.py_function output, so declare it manually:
    # rank 2 (height, width), with both sizes left unknown.
    short_array.set_shape([None, None])
    long_array.set_shape([None, None])

    return short_array, long_array

ds_train = ds_train.map(read_image_wrapper)
ds_val = ds_val.map(read_image_wrapper)
ds_test = ds_test.map(read_image_wrapper)

The function wrapped by tf.py_function runs eagerly, so its arguments are eager tensors and calls such as .numpy() work, even though the rest of the tf.data pipeline runs in graph mode.
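To see the graph/eager split without rawpy or any camera files, here is a minimal sketch of the same wrapper pattern. The small .npy files, the names load_array and load_array_wrapper, and the eager_flags dictionary are all assumptions made for illustration; they stand in for the raw images and for read_image above.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Stand-in data (assumption): small .npy files play the role of raw images,
# so the sketch runs without rawpy or any camera files.
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(3):
    path = os.path.join(tmp_dir, f"img_{i}.npy")
    np.save(path, np.full((4, 6), i, dtype=np.uint16))
    paths.append(path)

eager_flags = {}


def load_array(path_tensor):
    # Runs as ordinary Python: the argument is an eager tensor with .numpy().
    eager_flags["inside_py_function"] = tf.executing_eagerly()
    path = path_tensor.numpy().decode("utf-8")
    return np.load(path).astype(np.float32)


def load_array_wrapper(path_tensor):
    # Traced in graph mode by Dataset.map: the argument is symbolic here.
    eager_flags["inside_map"] = tf.executing_eagerly()
    image = tf.py_function(func=load_array, inp=[path_tensor], Tout=tf.float32)
    image.set_shape([None, None])  # rank 2, sizes known only at run time
    return image


ds = tf.data.Dataset.from_tensor_slices(paths).map(load_array_wrapper)
images = [image.numpy() for image in ds]
```

If loading turns out to dominate step time, passing num_parallel_calls to map and adding a trailing prefetch are the usual things to try, although Python code run through tf.py_function still has to hold the interpreter lock, so the gain may be limited.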

**rohan843** · Answer 2 · 2023-04-06T19:04:00+00:00

I looked at a few possible approaches, and all seemed to be working fine. Apart from the already presented solutions, I would also like to add this one:

def generator_func_train():
    im1_paths, im2_paths = list_of_im1_paths, list_of_im2_paths
    for im1, im2 in zip(im1_paths, im2_paths):
        yield loading_function(im1), loading_function(im2)

ds_train = tf.data.Dataset.from_generator(
    generator_func_train, 
    output_signature=(
         tf.TensorSpec(shape=(None, None, 4), dtype=np.float32),   # The dimensions of the first image
         tf.TensorSpec(shape=(None, None, 3), dtype=np.float32)   # The dimensions of the second image
    )
)

This uses a generator function to load the dataset. See the official documentation for tf.data.Dataset.from_generator.
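Since loading_function and the two path lists are not shown above, here is a self-contained sketch of the generator approach with placeholders. The path strings and fake_loader are hypothetical; a real loader would decode each file (for example with rawpy) and return a float32 array whose channel count matches the output_signature.

```python
import numpy as np
import tensorflow as tf

# Hypothetical path lists; in practice these point at files on disk.
list_of_im1_paths = ["input_0", "input_1"]
list_of_im2_paths = ["target_0", "target_1"]


def fake_loader(path, channels):
    # Placeholder (assumption): a real loader would decode `path` here
    # and return a float32 array of shape (height, width, channels).
    return np.full((4, 6, channels), len(path), dtype=np.float32)


def generator_func_train():
    for im1, im2 in zip(list_of_im1_paths, list_of_im2_paths):
        yield fake_loader(im1, 4), fake_loader(im2, 3)


ds_train = tf.data.Dataset.from_generator(
    generator_func_train,
    output_signature=(
        tf.TensorSpec(shape=(None, None, 4), dtype=tf.float32),
        tf.TensorSpec(shape=(None, None, 3), dtype=tf.float32),
    ),
)

# Batching works here because every stand-in image has the same size.
ds_train = ds_train.batch(2).prefetch(1)
first_inputs, first_targets = next(iter(ds_train))
```

The generator body is plain Python, so no tf.py_function wrapper or .numpy() call is needed. The output_signature gives TensorFlow the rank and dtype up front, which is the role set_shape plays in the map-based version.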
