Is there a mechanism to apply a standard set of checks to detect and then transform a String to the detected type, using one of Jackson's standard text related libs (csv, json, or even jackson-core)? I can imagine using it along with a label associated with that value (CSV header, for example) to do something sorta like the following:

JavaTypeAndValue typeAndValue = StringToJavaType.fromValue(Object x, String label);  
typeAndValue.type() // FQN of Java type, maybe
typeAndValue.label() // where label might be a column header value, for example
typeAndValue.value() // returns Object  of typeAndValue.type()

A set of 'extractors' would be required to apply the transform, and the consumer of the class would have to be aware of the 'ambiguity' of the 'Object' return type, but still capable of consuming and using the information, given its purpose.

The example I'm currently thinking about involves constructing SQL DDL or DML, like a CREATE Table statement using the information from a List derived from evaluating a row from a csv file.

After more digging, hoping to find something out there, I wrote the start of what I had in mind.

Please keep in mind that my intention here isn't to present something 'complete', as I'm sure there are several things missing here, edge cases not addressed, etc.

The parse(List<Map<String, String>> rows, List<String> headers) signature comes from the idea that this could be a sample of rows from a CSV file read in by Jackson, for example.

Again, this isn't complete, so I'm not looking to pick at everything that's wrong with the following. The question isn't 'how would we write this?', it's 'is anyone familiar with something that exists that does something like the following?'.

import gms.labs.cassandra.sandbox.extractors.Extractor;
import gms.labs.cassandra.sandbox.extractors.Extractors;
import lombok.Builder;
import lombok.Getter;
import lombok.Setter;
import lombok.experimental.Accessors;

@Accessors(fluent=true, chain=true)
public class TypeAndValue
{

    @Builder
    TypeAndValue(Class<?> type, String rawValue){
        this.type = type;
        this.rawValue = rawValue;
        label = DEFAULT_LABEL;
    }

    @Getter
    final Class<?> type;

    @Getter
    final String rawValue;

    @Setter
    @Getter
    String label;

    public Object value(){
        return Extractors.extractorFor(this).value(rawValue);
    }

    static final String DEFAULT_LABEL = "NONE";

}

A simple parser. The parse method comes from a context where I have a List<Map<String, String>> produced by a CSVReader.

import org.apache.commons.lang3.ObjectUtils;
import org.apache.commons.lang3.math.NumberUtils;

import java.util.*;
import java.util.function.BiFunction;

public class JavaTypeParser
{
public static final List<TypeAndValue> parse(List<Map<String, String>> rows, List<String> headers)
{
    List<TypeAndValue> typesAndVals = new ArrayList<TypeAndValue>();
    for (Map<String, String> row : rows) {
        for (String header : headers) {
            String val = row.get(header);
            TypeAndValue typeAndValue =
                    //  isNull, isBoolean, isNumber
                    isNull(val).orElse(isBoolean(val).orElse(isNumber(val).orElse(_typeAndValue.apply(String.class, val).get())));
            typesAndVals.add(typeAndValue.label(header));
        }
    }
    return typesAndVals;
}

public static Optional<TypeAndValue> isNumber(String val)
{
    if (!NumberUtils.isCreatable(val)) {
        return Optional.empty();
    } else {
        return _typeAndValue.apply(NumberUtils.createNumber(val).getClass(), val);
    }
}

public static Optional<TypeAndValue> isBoolean(String val)
{
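    // Constant-first comparison so a null val (missing CSV cell) does not throw a NullPointerException.
    boolean bool = ("true".equalsIgnoreCase(val) || "false".equalsIgnoreCase(val));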
    if (bool) {
        return _typeAndValue.apply(Boolean.class, val);
    } else {
        return Optional.empty();
    }
}

public static Optional<TypeAndValue> isNull(String val){
    if(Objects.isNull(val) || val.equals("null")){
        return _typeAndValue.apply(ObjectUtils.Null.class,val);
    }
    else{
        return Optional.empty();
    }
}

static final BiFunction<Class<?>, String, Optional<TypeAndValue>> _typeAndValue = (type, value) -> Optional.of(
        TypeAndValue.builder().type(type).rawValue(value).build());

}

Extractors. Just an example of how the 'extractors' for the values (contained in strings) might be registered somewhere for lookup. They could be referenced any number of other ways, too.

import gms.labs.cassandra.sandbox.TypeAndValue;
import org.apache.commons.lang3.ObjectUtils;
import org.apache.commons.lang3.math.NumberUtils;

import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;

public class Extractors
{

private static final List<Class<?>> NUMS = Arrays.asList(
        BigInteger.class,
        BigDecimal.class,
        Long.class,
        Integer.class,
        Double.class,
        Float.class);

public static final Extractor<?> extractorFor(TypeAndValue typeAndValue)
{
    if (NUMS.contains(typeAndValue.type())) {
        return (Extractor<Number>) value -> NumberUtils.createNumber(value);
    } else if(typeAndValue.type().equals(Boolean.class)) {
        return  (Extractor<Boolean>) value -> Boolean.valueOf(value);
    } else if(typeAndValue.type().equals(ObjectUtils.Null.class)) {
        return (Extractor<ObjectUtils.Null>) value -> null; // or should we return the raw value? Some frameworks coerce "null" to null.
    } else if(typeAndValue.type().equals(String.class)) {
        return (Extractor<String>) value -> typeAndValue.rawValue(); // just return the raw value
    }
    else{
        throw new RuntimeException("unsupported");
    }
}
}

I ran this from within the JavaTypeParser class, for reference.

public static void main(String[] args)
{

    Optional<TypeAndValue> num = isNumber("-1230980980980980980980980980980988009808989080989809890808098292");
    num.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        System.out.println(typeAndVal.value().getClass());  // BigInteger
    });
    num = isNumber("-123098098097987");
    num.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        System.out.println(typeAndVal.value().getClass()); // Long
    });
    num = isNumber("-123098.098097987"); // Double
    num.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        System.out.println(typeAndVal.value().getClass());
    });
    num = isNumber("-123009809890898.0980979098098908080987"); // BigDecimal
    num.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        System.out.println(typeAndVal.value().getClass());
    });

    Optional<TypeAndValue> bool = isBoolean("FaLse");
    bool.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        System.out.println(typeAndVal.value().getClass()); // Boolean
    });

    Optional<TypeAndValue> nulll = isNull("null");
    nulll.ifPresent(typeAndVal -> {
        System.out.println(typeAndVal.value());
        //System.out.println(typeAndVal.value().getClass());  would throw null pointer exception
        System.out.println(typeAndVal.type()); // ObjectUtils.Null (from apache commons lang3)
    });

}

There are 3 answers

rascio On BEST ANSWER

I don't know of any library that does this, and I have never seen anything work this way on an open set of possible types.

For a closed set of types (where you know all the possible output types), the easiest way would be to have the class FQN written in the string, either the complete FQN or an alias for it (from your description I couldn't tell whether you control how the string is written).

Otherwise I think there is no way around writing all the checks yourself.

Furthermore, it will be quite delicate, because there are edge cases to think about.

Suppose you use JSON as the serialization format in the string. How would you differentiate between a plain String value like Hello World and a Date written in some ISO format (e.g. 2020-09-22)? To do it you would need to introduce a priority among the checks: first test whether the value is a date (using a regex, say), then move on to the next check, and leave the plain-string check for last.
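A minimal sketch of that kind of prioritised checking follows. It assumes only two checks and uses LocalDate.parse instead of a regex for the date test; the class and method names (PriorityChecks, detect, tryDate) are made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class PriorityChecks {

    // Checks run in list order; the first one that returns a non-null result wins.
    private static final List<Function<String, Object>> CHECKS = new ArrayList<>();

    static {
        CHECKS.add(PriorityChecks::tryDate); // more specific check first
        CHECKS.add(s -> s);                  // catch-all: keep the plain string, must stay last
    }

    // Returns a LocalDate for ISO-8601 dates such as 2020-09-22, otherwise null.
    private static Object tryDate(String s) {
        try {
            return LocalDate.parse(s);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    // Assumes raw is non-null; a null check would be one more entry at the front of CHECKS.
    public static Object detect(String raw) {
        for (Function<String, Object> check : CHECKS) {
            Object result = check.apply(raw);
            if (result != null) {
                return result;
            }
        }
        throw new IllegalStateException("No check matched: " + raw);
    }

    public static void main(String[] args) {
        System.out.println(detect("2020-09-22").getClass().getSimpleName());  // LocalDate
        System.out.println(detect("Hello World").getClass().getSimpleName()); // String
    }
}
```

Swapping the two entries in CHECKS would make every value come back as a String, which is exactly why the ordering has to be deliberate.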

What if you have two objects:

class Person { // class name assumed; the header line is missing in the source
   String name;
   String surname;
}

class Employee {
   String name;
   String surname;
   Integer salary;
}

And you receive a serialized value of the second type, but with a null salary (null, or the property missing completely). It is then indistinguishable from a value of the first type.

How can you tell the difference between a set or a list?

I don't know whether what you intend is this dynamic, or whether you already know all the possible deserializable types; some more details in the question would help.

UPDATE

Just saw the code; now it is clearer. If you know all the possible outputs, that is the way to go.
The only change I would make is to abstract the extraction process, so that adding new types is easier.
A small change like this should do:

interface Extractor {
    Boolean match(String value);
    Object extract(String value);
}

Then you can define an extractor per type:

class NumberExtractor implements Extractor {
    public Boolean match(String val) {
        return NumberUtils.isCreatable(val);
    }
    public Object extract(String value) {
        return NumberUtils.createNumber(value);
    }
}
class StringExtractor implements Extractor {
    public Boolean match(String s) {
        return true; //<-- catch all
    }
    public Object extract(String value) {
        return value;
    }
}
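The registration step in this answer also refers to a NullExtractor and a BooleanExtractor that are not spelled out. A minimal sketch of what they might look like, mirroring the isNull and isBoolean checks from the question, is shown here; the wrapper class ExtractorSketch exists only to make the snippet self-contained and is not part of the original design.

```java
public class ExtractorSketch {

    // Same shape as the Extractor interface in this answer.
    public interface Extractor {
        Boolean match(String value);
        Object extract(String value);
    }

    // Treats a missing cell and the literal text "null" as null.
    public static class NullExtractor implements Extractor {
        public Boolean match(String value) {
            return value == null || value.equals("null");
        }
        public Object extract(String value) {
            return null;
        }
    }

    // Accepts "true"/"false" in any letter case.
    public static class BooleanExtractor implements Extractor {
        public Boolean match(String value) {
            // Constant-first comparison keeps this safe for a null value.
            return "true".equalsIgnoreCase(value) || "false".equalsIgnoreCase(value);
        }
        public Object extract(String value) {
            return Boolean.valueOf(value);
        }
    }

    public static void main(String[] args) {
        Extractor bool = new BooleanExtractor();
        System.out.println(bool.match("FaLse"));   // true
        System.out.println(bool.extract("FaLse")); // false
        System.out.println(new NullExtractor().match(null)); // true
    }
}
```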

And then register them and automate the checks:

public class JavaTypeParser {
  // Order matters: the first matching extractor wins, so the catch-all StringExtractor goes last.
  private static final List<Extractor> EXTRACTORS = List.of(
      new NullExtractor(),
      new BooleanExtractor(),
      new NumberExtractor(),
      new StringExtractor()
  );

  public static final List<TypeAndValue> parse(List<Map<String, String>> rows, List<String> headers) {
    List<TypeAndValue> typesAndVals = new ArrayList<TypeAndValue>();
    for (Map<String, String> row : rows) {
        for (String header : headers) {
            String val = row.get(header);

            typesAndVals.add(extract(header, val));
        }
    }
    return typesAndVals;
  }

  public static final TypeAndValue extract(String header, String value) {
       for (Extractor e : EXTRACTORS) {
           if (e.match(value)) {
               Object v = e.extract(value);
               return TypeAndValue.builder()
                         .label(header)
                         .value(v) //<-- you can put the real value here, and remove the type field
                         .build();
           }
       }
       throw new IllegalStateException("Can't find an extractor for: " + header + " | " + value);
  }
}

To parse CSV I would suggest https://commons.apache.org/proper/commons-csv, since CSV parsing can run into nasty edge cases.

oat On

Try this:

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.Iterator;
import java.util.Map;

String j = "{\"key\": \"value\"}"; // placeholder, put your JSON string here

            JsonFactory jsonFactory = new JsonFactory();
            ObjectMapper jsonMapper = new ObjectMapper(jsonFactory);
            JsonNode jsonRootNode = jsonMapper.readTree(j);
            Iterator<Map.Entry<String,JsonNode>> jsonIterator = jsonRootNode.fields();

            while (jsonIterator.hasNext()) {
                Map.Entry<String,JsonNode> jsonField = jsonIterator.next();
                String k = jsonField.getKey();
                String v = jsonField.getValue().toString();
                ...

            }
Arvid Heise On

What you are actually trying to do is write a parser. You translate a fragment into a parse tree. The parse tree captures the type as well as the value. For hierarchical types like arrays and objects, each tree node contains child nodes.
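To make the idea concrete, here is a minimal hand-built sketch of such a tree in plain Java. It is not what a parser generator would produce; the names ParseTreeSketch, Kind, Node and describe are invented for this illustration, and OBJECT nodes are left out to keep it short.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ParseTreeSketch {

    public enum Kind { NULL, BOOLEAN, NUMBER, STRING, ARRAY }

    /** One tree node: scalars carry a value, containers carry child nodes. */
    public static final class Node {
        public final Kind kind;
        public final Object value;        // payload for scalars, null for arrays
        public final List<Node> children; // empty for scalars

        private Node(Kind kind, Object value, List<Node> children) {
            this.kind = kind;
            this.value = value;
            this.children = children;
        }

        public static Node scalar(Kind kind, Object value) {
            return new Node(kind, value, Collections.emptyList());
        }

        public static Node array(Node... items) {
            return new Node(Kind.ARRAY, null, Arrays.asList(items));
        }
    }

    /** Renders only the type structure; nodes without children print as their bare kind. */
    public static String describe(Node node) {
        if (node.children.isEmpty()) {
            return node.kind.name();
        }
        StringBuilder sb = new StringBuilder(node.kind.name()).append('[');
        for (int i = 0; i < node.children.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(describe(node.children.get(i)));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // Corresponds to the fragment [1, true, ["x"]]
        Node tree = Node.array(
                Node.scalar(Kind.NUMBER, 1),
                Node.scalar(Kind.BOOLEAN, true),
                Node.array(Node.scalar(Kind.STRING, "x")));
        System.out.println(describe(tree)); // ARRAY[NUMBER, BOOLEAN, ARRAY[STRING]]
    }
}
```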

One of the most commonly used parser generators (albeit a bit overkill for your use case) is ANTLR. ANTLR brings out-of-the-box support for JSON.

I recommend taking the time to digest all the concepts involved. Even though it might seem like overkill initially, it quickly pays off when you extend things in any way. Changing a grammar is relatively easy; the generated code is quite complex. Additionally, all parser generators verify your grammar and flag logic errors.

Of course, if you are limiting yourself to just parsing CSV or JSON (and not both at the same time), you should rather use the parser of an existing library. For example, Jackson has ObjectMapper.readTree to get the parse tree. You could also use ObjectMapper.readValue(<fragment>, Object.class) to simply get the canonical Java classes.