Apache Beam pipeline stuck indefinitely using Wait.on transform with JdbcIO

258 views Asked by At

I have an Apache Beam pipeline that gets stuck indefinitely when using the Wait.on transform alongside JdbcIO. Here's a simplified version of my code, focusing on the relevant parts:

PCollection<String> result = p.
    apply("Pubsub", PubsubIO.readMessagesWithAttributes().fromSubscription(/*...*/))
    .apply("Transform", ParDo.of(new MyTransformer()));

PCollection<Void> insert = result.apply("Inserting",
    JdbcIO.<String>writeVoid()
        .withDataSourceProviderFn(/*...*/)
        .withStatement(/*...*/)
        .withPreparedStatementSetter(/*...*/)
);

result.apply(Wait.on(insert))
    .apply("Selecting", new SomeTransform())
    .apply("PubsubMessaging", ParDo.of(new NextTransformer()));
p.run();

In the code, I'm using the Wait.on transform to make the pipeline wait until the insert transform (which uses JdbcIO to write data) is completed before executing the next steps. However, the pipeline gets stuck and doesn't progress further.

I've tried adding logging messages in my transforms to track the progress and identify where it's getting stuck, but I haven't been able to pinpoint the issue. I've also looked through multiple Stack Overflow posts on this topic, but none of them provided a successful solution for my problem.

Can anyone provide any insights or suggestions on how to debug and resolve this issue involving Wait.on and JdbcIO in my Apache Beam pipeline?

sample code https://github.com/j1cs/app-beam

Update: according to this discussion https://lists.apache.org/thread/z5phzd01nln6jnr5qn20sojqlj3tqqhd appears the problem is the implementation of pubsub using direct-runner. i haven't tested it yet using dataflow as a backend.

0

There are 0 answers