Is there a mechanism to apply a standard set of checks to detect and then transform a String to the detected type, using one of Jackson's standard text related libs (csv, json, or even jackson-core)? I can imagine using it along with a label associated with that value (CSV header, for example) to do something sorta like the following:
JavaTypeAndValue typeAndValue = StringToJavaType.fromValue(Object x, String label);
typeAndValue.type() // FQN of Java type, maybe
typeAndValue.label() // where label might be a column header value, for example
typeAndValue.value() // returns Object of typeAndValue.type()
A set of 'extractors' would be required to apply the transform, and the consumer of the class would have to be aware of the 'ambiguity' of the 'Object' return type, but still capable of consuming and using the information, given its purpose.
The example I'm currently thinking about involves constructing SQL DDL or DML, like a CREATE Table statement using the information from a List derived from evaluating a row from a csv file.
After more digging, hoping to find something out there, I wrote the start of what I had in mind.
Please keep in mind that my intention here isn't to present something 'complete', as I'm sure there are several things missing here, edge cases not addressed, etc.
The pasrse(List<Map<String, String>> rows, List<String> headers
comes from the idea that this could be a sample of rows from a CSV file read in from Jackson, for example.
Again, this isn't complete, so I'm not looking to pick at everything that's wrong with the following. The question isn't 'how would we write this?', it's 'is anyone familiar with something that exists that does something like the following?'.
import gms.labs.cassandra.sandbox.extractors.Extractor;
import gms.labs.cassandra.sandbox.extractors.Extractors;
import lombok.Builder;
import lombok.Getter;
import lombok.Setter;
import lombok.experimental.Accessors;
@Accessors(fluent=true, chain=true)
public class TypeAndValue
{
@Builder
TypeAndValue(Class<?> type, String rawValue){
this.type = type;
this.rawValue = rawValue;
label = "NONE";
}
@Getter
final Class<?> type;
@Getter
final String rawValue;
@Setter
@Getter
String label;
public Object value(){
return Extractors.extractorFor(this).value(rawValue);
}
static final String DEFAULT_LABEL = "NONE";
}
A simple parser, where the parse
came from a context where I have a List<Map<String,String>>
from a CSVReader.
import org.apache.commons.lang3.ObjectUtils;
import org.apache.commons.lang3.math.NumberUtils;
import java.util.*;
import java.util.function.BiFunction;
public class JavaTypeParser
{
public static final List<TypeAndValue> parse(List<Map<String, String>> rows, List<String> headers)
{
List<TypeAndValue> typesAndVals = new ArrayList<TypeAndValue>();
for (Map<String, String> row : rows) {
for (String header : headers) {
String val = row.get(header);
TypeAndValue typeAndValue =
// isNull, isBoolean, isNumber
isNull(val).orElse(isBoolean(val).orElse(isNumber(val).orElse(_typeAndValue.apply(String.class, val).get())));
typesAndVals.add(typeAndValue.label(header));
}
}
}
public static Optional<TypeAndValue> isNumber(String val)
{
if (!NumberUtils.isCreatable(val)) {
return Optional.empty();
} else {
return _typeAndValue.apply(NumberUtils.createNumber(val).getClass(), val);
}
}
public static Optional<TypeAndValue> isBoolean(String val)
{
boolean bool = (val.equalsIgnoreCase("true") || val.equalsIgnoreCase("false"));
if (bool) {
return _typeAndValue.apply(Boolean.class, val);
} else {
return Optional.empty();
}
}
public static Optional<TypeAndValue> isNull(String val){
if(Objects.isNull(val) || val.equals("null")){
return _typeAndValue.apply(ObjectUtils.Null.class,val);
}
else{
return Optional.empty();
}
}
static final BiFunction<Class<?>, String, Optional<TypeAndValue>> _typeAndValue = (type, value) -> Optional.of(
TypeAndValue.builder().type(type).rawValue(value).build());
}
Extractors. Just an example of how the 'extractors' for the values (contained in strings) might be registered somewhere for lookup. They could be referenced any number of other ways, too.
import gms.labs.cassandra.sandbox.TypeAndValue;
import org.apache.commons.lang3.ObjectUtils;
import org.apache.commons.lang3.math.NumberUtils;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;
public class Extractors
{
private static final List<Class> NUMS = Arrays.asList(
BigInteger.class,
BigDecimal.class,
Long.class,
Integer.class,
Double.class,
Float.class);
public static final Extractor<?> extractorFor(TypeAndValue typeAndValue)
{
if (NUMS.contains(typeAndValue.type())) {
return (Extractor<Number>) value -> NumberUtils.createNumber(value);
} else if(typeAndValue.type().equals(Boolean.class)) {
return (Extractor<Boolean>) value -> Boolean.valueOf(value);
} else if(typeAndValue.type().equals(ObjectUtils.Null.class)) {
return (Extractor<ObjectUtils.Null>) value -> null; // should we just return the raw value. some frameworks coerce to null.
} else if(typeAndValue.type().equals(String.class)) {
return (Extractor<String>) value -> typeAndValue.rawValue(); // just return the raw value. some frameworks coerce to null.
}
else{
throw new RuntimeException("unsupported");
}
}
}
I ran this from within the JavaTypeParser class, for reference.
public static void main(String[] args)
{
Optional<TypeAndValue> num = isNumber("-1230980980980980980980980980980988009808989080989809890808098292");
num.ifPresent(typeAndVal -> {
System.out.println(typeAndVal.value());
System.out.println(typeAndVal.value().getClass()); // BigInteger
});
num = isNumber("-123098098097987");
num.ifPresent(typeAndVal -> {
System.out.println(typeAndVal.value());
System.out.println(typeAndVal.value().getClass()); // Long
});
num = isNumber("-123098.098097987"); // Double
num.ifPresent(typeAndVal -> {
System.out.println(typeAndVal.value());
System.out.println(typeAndVal.value().getClass());
});
num = isNumber("-123009809890898.0980979098098908080987"); // BigDecimal
num.ifPresent(typeAndVal -> {
System.out.println(typeAndVal.value());
System.out.println(typeAndVal.value().getClass());
});
Optional<TypeAndValue> bool = isBoolean("FaLse");
bool.ifPresent(typeAndVal -> {
System.out.println(typeAndVal.value());
System.out.println(typeAndVal.value().getClass()); // Boolean
});
Optional<TypeAndValue> nulll = isNull("null");
nulll.ifPresent(typeAndVal -> {
System.out.println(typeAndVal.value());
//System.out.println(typeAndVal.value().getClass()); would throw null pointer exception
System.out.println(typeAndVal.type()); // ObjectUtils.Null (from apache commons lang3)
});
}
I don't know of any library to do this, and never seen anything working in this way on an open set of possible types.
For closed set of types (you know all the possible output types) the easier way would be to have the class FQN written in the string (from your description I didn't get if you are in control of the written string).
The complete FQN, or an alias to it.
Otherwise I think there is no escape to not write all the checks.
Furthermore it will be very delicate as I'm thinking of edge use case.
Suppose you use json as serialization format in the string, how would you differentiate between a
String
value likeHello World
and aDate
written in some ISO format (eg.2020-09-22
). To do it you would need to introduce some priority in the checks you do (first try to check if it is a date using some regex, if not go with the next and the simple string one be the last one)What if you have two objects:
And you receive a serialization value of the second type, but with a null salary (null or the property missing completely).
How can you tell the difference between a set or a list?
I don't know if what you intended is so dynamic, or you already know all the possible deserializable types, maybe some more details in the question can help.
UPDATE
Just saw the code, now it seems more clear. If you know all the possible output, that is the way.
The only changes I would do, would be to ease the increase of types you want to manage abstracting the extraction process.
To do this I think a small change should be done, like:
Then you can define an extractor per type:
And then register and automatize the checks:
To parse CSV I would suggest https://commons.apache.org/proper/commons-csv as CSV parsing can incur in nasty issues.