Is there a way to completely swap out the way serialization is handled with Apache Beam?

I'm using Kotlin with Apache Beam, and I have a set of DTOs that reference each other and serialize fine with every format Kotlinx Serialization supports. When I try to use them with Beam, I run into problems because Beam expects every object, type parameter, and nested object to implement Java's Serializable interface. The problem is that I don't control all of these types, because some of them come from third-party libraries.

I've implemented my own CustomCoder<T> that uses Kotlinx Serialization, but then the custom coder itself isn't serializable, in particular because the serializer that the Kotlinx Serialization plugin generates on the Companion object can't be serialized. Since that code is generated at compile time, I don't control it and can't mark it @Transient. I tried implementing Externalizable on the coder, and it fails as soon as I pass a type argument for T that doesn't implement Serializable, or that has a nested type argument that doesn't.
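The usual workaround for a non-serializable field inside a Java-serialized object is to mark that field `transient` (in Kotlin, `@kotlin.jvm.Transient`) and rebuild it lazily from something that is serializable. Below is a minimal plain-Java sketch of that pattern with no Beam or Kotlinx dependency: `Codec`, `Utf8Codec`, `SerializableSupplier`, and `DelegatingCoder` are hypothetical stand-ins, and `javaRoundTrip` imitates the Java serialization a coder goes through when a pipeline is shipped to workers. Adapting it to a real coder (extending Beam's `CustomCoder<T>` and having the factory return the DTO's Kotlinx serializer) is an assumption that this sketch does not test.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

public class TransientDelegateDemo {

    /** Hypothetical stand-in for a non-Serializable serializer such as a generated KSerializer. */
    public interface Codec<T> {
        byte[] encode(T value);
        T decode(byte[] bytes);
    }

    /** A Supplier that is also Serializable, so a method reference to it can be serialized. */
    public interface SerializableSupplier<T> extends Supplier<T>, Serializable {}

    /** Example codec that deliberately does NOT implement Serializable. */
    public static final class Utf8Codec implements Codec<String> {
        public byte[] encode(String value) { return value.getBytes(StandardCharsets.UTF_8); }
        public String decode(byte[] bytes) { return new String(bytes, StandardCharsets.UTF_8); }
    }

    /** Coder-like wrapper: only the factory is serialized; the codec is rebuilt on demand. */
    public static final class DelegatingCoder<T> implements Serializable {
        private static final long serialVersionUID = 1L;
        private final SerializableSupplier<Codec<T>> codecFactory;
        // transient: skipped by Java serialization, so Codec does not need to be Serializable
        private transient Codec<T> codec;

        public DelegatingCoder(SerializableSupplier<Codec<T>> codecFactory) {
            this.codecFactory = codecFactory;
        }

        private Codec<T> codec() {
            if (codec == null) {
                codec = codecFactory.get(); // recreated lazily, e.g. after deserialization
            }
            return codec;
        }

        public byte[] encode(T value) { return codec().encode(value); }
        public T decode(byte[] bytes) { return codec().decode(bytes); }
    }

    /** Serializes and deserializes an object with plain Java serialization. */
    @SuppressWarnings("unchecked")
    public static <O> O javaRoundTrip(O obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (O) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Java serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        DelegatingCoder<String> coder = new DelegatingCoder<>(Utf8Codec::new);
        coder.encode("warm-up"); // populate the transient field before serializing
        DelegatingCoder<String> copy = javaRoundTrip(coder);
        System.out.println(copy.decode(copy.encode("hello")));
    }
}
```

The design choice is that only the factory, a serializable method reference, goes through Java serialization; the codec instance never has to be `Serializable`.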

Also, Kotlinx Serialization is nice because it doesn't use reflection. A lot of my current headaches would disappear if I could swap out the serialization mechanism and not rely on standard Java serialization at all, or implement Externalizable in a way that just calls my own serialization mechanism and ignores the type parameter. Are there any solutions? I don't care how hacky it is, even if it involves overriding something in the Gradle build config. I'm just not sure how to go about it, so any pointers would be a great help!
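For the Externalizable route, the type parameter does not have to matter for the coder object itself: Java generics are erased at runtime, so `writeExternal` only has to write whatever is needed to rebuild the codec. The plain-Java sketch below writes just the codec's class name and re-creates the codec reflectively in `readExternal`. All names are hypothetical, and it assumes the codec has a public no-arg constructor; a plugin-generated Kotlinx serializer would need a different recipe (for example, storing the DTO's class and looking its serializer up again), which is not shown here.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class ExternalizableCoderDemo {

    /** Hypothetical stand-in for your own (non-Serializable) serialization mechanism. */
    public interface Codec<T> {
        byte[] encode(T value);
        T decode(byte[] bytes);
    }

    /** Example codec with a public no-arg constructor and no Serializable marker. */
    public static final class Utf8Codec implements Codec<String> {
        public byte[] encode(String value) { return value.getBytes(StandardCharsets.UTF_8); }
        public String decode(byte[] bytes) { return new String(bytes, StandardCharsets.UTF_8); }
    }

    /** Writes only the codec's class name; nothing about T is written, since T is erased. */
    public static final class ExternalizableCoder<T> implements Externalizable {
        private static final long serialVersionUID = 1L;
        private Codec<T> codec;

        public ExternalizableCoder() {} // Externalizable requires a public no-arg constructor

        public ExternalizableCoder(Codec<T> codec) { this.codec = codec; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(codec.getClass().getName());
        }

        @Override
        @SuppressWarnings("unchecked")
        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            String name = in.readUTF();
            try {
                // Rebuild the codec from its class name via its no-arg constructor
                codec = (Codec<T>) Class.forName(name).getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IOException("Cannot re-create codec " + name, e);
            }
        }

        public byte[] encode(T value) { return codec.encode(value); }
        public T decode(byte[] bytes) { return codec.decode(bytes); }
    }

    /** Serializes and deserializes an object with plain Java serialization. */
    @SuppressWarnings("unchecked")
    public static <O> O javaRoundTrip(O obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (O) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Java serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        ExternalizableCoder<String> coder = new ExternalizableCoder<>(new Utf8Codec());
        ExternalizableCoder<String> copy = javaRoundTrip(coder);
        System.out.println(copy.decode(copy.encode("hello")));
    }
}
```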

Alternatively, if I abandon Kotlinx Serialization, are there simple ways to make serialization of arbitrarily complex data types just work with Java, even using reflection, without a lot of manual encoding and decoding work? I feel like I'm missing something obvious. This is my first project with Apache Beam, and so far Google has been of little help.

Answer by vipcxj

Maybe this is late, but I recently developed an annotation processor called beanknife that can generate a DTO from any class. You configure it with annotations, but you don't need to change the original class: the configuration can live in a separate class. You choose which properties to include and which to leave out, and you can add new properties through static methods in the configuration class. The most powerful feature of the library is that it can automatically convert an object-typed property to its DTO version. For example:

class Pojo1 {
    String a;
    Pojo2 b; // circular reference to Pojo2
}

class Pojo2 {
    Pojo1 a;
    List<Pojo1> b;
    Map<String, List<Pojo1>>[] c; // key type chosen for illustration
}

// remove the circular reference in the DTO
@ViewOf(value = Pojo1.class, includePattern = ".*", excludes = {Pojo1Meta.b})
class ConfigureOfPojo1 {}

// use the DTO version without the circular reference to replace Pojo1
@ViewOf(value = Pojo2.class, includePattern = ".*")
class ConfigureOfPojo2 {
    // convert b to the DTO version
    @OverrideViewProperty(Pojo2Meta.b)
    private List<Pojo1View> b;
    // convert c to the DTO version
    @OverrideViewProperty(Pojo2Meta.c)
    private Map<String, List<Pojo1View>>[] c;
}

This generates:

// meta class; use it to reference property names in a type-safe way.
// The constants are static so they can be used in annotations.
class Pojo1Meta {
    public static final String a = "a";
    public static final String b = "b";
}

// generated DTO class. The actual one is more complicated and has many other methods.
class Pojo1View {
    private String a;
    public Pojo1View read(Pojo1 source) { ... }
    // ... getters and setters ...
}

class Pojo2Meta {
    public static final String a = "a";
    public static final String b = "b";
    public static final String c = "c";
}

class Pojo2View {
    private Pojo1 a; // not overridden in ConfigureOfPojo2, so it keeps its original type
    private List<Pojo1View> b;
    private Map<String, List<Pojo1View>>[] c;
    public Pojo2View read(Pojo2 source) { ... }
    // ... getters and setters ...
}

The interesting thing here is that you can safely reference classes that don't exist in the source yet. The compiler or IDE may complain at first, but everything works once compiled, because all the extra classes are generated just before compilation. A better approach is to work in two steps: first add the @ViewOf annotations and compile, so that all the classes you will need later are generated; then finish the configuration and compile again. That way the IDE shows no syntax errors and its auto-completion works better.

Because generated DTOs can be used inside the configuration classes, you can define a DTO without circular references, as in the example. You can go further and define another DTO for Pojo2 that drops every property referencing Pojo1, and use it to replace property b in Pojo1.
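To make the idea concrete without depending on beanknife, here is a hand-written Java sketch of what a cycle-free view does: `Pojo1View` drops the back-reference, and `Pojo2View` replaces every `Pojo1`-typed property with `Pojo1View` (property `c` is omitted for brevity). The class names mirror the example above, but the fields, the static `read` factories, and the `Serializable` markers are assumptions; beanknife's actual generated code is different and more complete. The point is that you own the view types, so you can make them serializable even when the source classes come from a third-party library.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class CycleFreeDtoDemo {
    // Source classes with a cycle: Pojo1.b -> Pojo2, Pojo2.a / Pojo2.b -> Pojo1
    static class Pojo1 { String a; Pojo2 b; }
    static class Pojo2 { Pojo1 a; List<Pojo1> b = new ArrayList<>(); }

    /** View of Pojo1 that drops the back-reference b (like excludes = {Pojo1Meta.b}). */
    static final class Pojo1View implements Serializable {
        private static final long serialVersionUID = 1L;
        final String a;
        private Pojo1View(String a) { this.a = a; }

        static Pojo1View read(Pojo1 source) {
            return source == null ? null : new Pojo1View(source.a);
        }
    }

    /** View of Pojo2 whose Pojo1-typed properties are replaced by Pojo1View. */
    static final class Pojo2View implements Serializable {
        private static final long serialVersionUID = 1L;
        final Pojo1View a;
        final ArrayList<Pojo1View> b;
        private Pojo2View(Pojo1View a, ArrayList<Pojo1View> b) { this.a = a; this.b = b; }

        static Pojo2View read(Pojo2 source) {
            ArrayList<Pojo1View> views = new ArrayList<>();
            for (Pojo1 p : source.b) {
                views.add(Pojo1View.read(p)); // each element becomes a cycle-free view
            }
            return new Pojo2View(Pojo1View.read(source.a), views);
        }
    }

    public static void main(String[] args) {
        Pojo1 p1 = new Pojo1();
        Pojo2 p2 = new Pojo2();
        p1.a = "x";
        p1.b = p2; // Pojo1 -> Pojo2
        p2.a = p1; // Pojo2 -> Pojo1 closes the cycle
        p2.b.add(p1);
        Pojo2View view = Pojo2View.read(p2);
        System.out.println(view.a.a + " " + view.b.size()); // prints "x 1"
    }
}
```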