Kubeflow returns "no such file or directory" on container start


I'm currently trying to deploy a pipeline on Kubeflow, but every time I start it, it fails with:

This step is in Failed state with this message: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"python /usr/src/app/FeatureExtractor.py\": stat python /usr/src/app/FeatureExtractor.py: no such file or directory": unknown

This is my pipeline: it currently fails on all of the fe-* components, which the other steps depend on.

[Screenshot of the pipeline graph]

This is the Dockerfile for the image those components use:

FROM python:2
# Copy the extractor script, the frozen model and the image database into the image
COPY FeatureExtractor.py /usr/src/app/
COPY FE_freeze.pb /usr/src/app/
COPY DB /usr/src/app/
# Dependencies needed by FeatureExtractor.py
RUN pip install opencv-python==4.2.0.32
RUN pip install imutils
RUN pip install image
RUN pip install tensorflow==1.15
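For completeness, I build and push the image to Docker Hub more or less like this (I just use the default latest tag, the exact invocation may differ slightly):

docker build -t texdade/feature-extractor .   # approximate; built from the directory containing the Dockerfile
docker push texdade/feature-extractor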
while the pipeline itself is defined by this Python script:

import kfp
from kfp import dsl

def feature_extractor(name, images, result):
    images = "--path_imgs={}".format(images)
    result = "--res_name={}".format(result)

    return dsl.ContainerOp(
        name=name,
        image='texdade/feature-extractor',
        command=['python /usr/src/app/FeatureExtractor.py'],
        arguments=['--pretrained_model="/usr/src/app/FE_freeze.pb"', images, result],
        file_outputs={
            'feature_vector':result,
        }
    )

def train_mv(name, primary, secondary, non_members, result):
    primary = "--members={}".format(primary)
    secondary = "--other_members={}".format(secondary)
    non_members = "--non_members={}".format(non_members)

    return dsl.ContainerOp(
        name=name,
        image='texdade/train-mv',
        command=['python /usr/src/app/Train-MV.py'],
        arguments=[primary, secondary, non_members, result],
        file_outputs={
            'model':result,
        }
    )

def test_mv(model_a, model_b, test_imgs):
    model_a = "--MV_A={}".format(model_a)
    model_b = "--MV_B={}".format(model_b)
    test_imgs = "--test_imgs={}".format(test_imgs)

    return dsl.ContainerOp(
        name="Test models",
        image="texdade/test-mv",
        command=['python /usr/src/app/Test-MV.py'],
        arguments=[model_a, model_b, test_imgs]
    )

@dsl.pipeline(
    name='First pipeline',
    description='FP'
)
def first_pipeline():
    # Extract feature vectors for the two member sets, the non-member set and the test set
    FE_A = feature_extractor('FE members A', "/usr/src/app/DB/A/", "/usr/src/app/A.npz")
    FE_B = feature_extractor('FE members B', "/usr/src/app/DB/B/", "/usr/src/app/B.npz")
    FE_N = feature_extractor('FE Non members', "/usr/src/app/DB/N/", "/usr/src/app/N.npz")
    FE_Test = feature_extractor('FE Test dataset', "/usr/src/app/DB/Test", "/usr/src/app/Test.npz")
    # Train one model per member group, then test both models on the test-set features
    train_a = train_mv("Train members A", FE_A.output, FE_B.output, FE_N.output, "/usr/src/app/A.pb")
    train_b = train_mv("Train members B", FE_B.output, FE_A.output, FE_N.output, "/usr/src/app/B.pb")
    test = test_mv(train_a.output, train_b.output, FE_Test.output)

if __name__ == '__main__':
    kfp.compiler.Compiler().compile(first_pipeline, __file__+ '.yaml')
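I compile the pipeline by running this script and then upload the resulting YAML through the Kubeflow Pipelines UI, roughly like this (assuming the script is saved as first_pipeline.py):

python first_pipeline.py   # writes first_pipeline.py.yaml, which I then upload in the Pipelines UI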

The problem seems to be that FeatureExtractor.py can't be found in the container, which seems odd, since launching the container manually (without Kubeflow) runs the script without any issue.
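For reference, this is roughly how I run it by hand (from memory, the exact arguments may differ slightly):

docker run --rm texdade/feature-extractor \
    python /usr/src/app/FeatureExtractor.py \
    --pretrained_model=/usr/src/app/FE_freeze.pb \
    --path_imgs=/usr/src/app/DB/A/ \
    --res_name=/usr/src/app/A.npz
# approximate invocation; paths are the same ones used in the pipeline above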

Do you have any ideas on how to fix this? Thanks in advance! :)
