Encoding issues using tycho surefire plugin

I've been going around in circles with this issue and cannot find an explanation for what is happening. I'm using the Tycho Surefire plugin to build a set of Eclipse plugins and run some unit tests. Here's the environment:

tycho-surefire-plugin: version 0.19.0 //very old I know, but I'm stuck with legacy code
maven 3.5.2
jdk 8
windows 10

I started with a simple test case for a method that replaces special characters with their plain ASCII equivalents:

String input = "á:Á-é:É-í:Í-ó:Ó-ú:Ú-ñ:Ñ  ";
System.out.println("testing input: " + input);
Assert.assertEquals("a A e E i I o O u U n N ",
    Utils.sanitize(input, true));
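One way to take the source file's encoding out of the picture is to write the literal with Unicode escapes and to print the string as code points instead of glyphs. The following is only a sketch: the class name `EscapedLiteralSketch` and the helper `codePoints` are made up for illustration and are not part of the project.

```java
public class EscapedLiteralSketch {
    // Same characters as the test input, written as Unicode escapes so the
    // compiled constant does not depend on the encoding javac assumes for
    // the source file.
    static final String INPUT =
        "\u00E1:\u00C1-\u00E9:\u00C9-\u00ED:\u00CD-\u00F3:\u00D3-\u00FA:\u00DA-\u00F1:\u00D1  ";

    // Renders every UTF-16 code unit as U+XXXX, which is pure ASCII and
    // therefore survives any console or log encoding.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("testing input: " + codePoints(INPUT));
    }
}
```

If the code points printed by the Tycho build match the ones printed in Eclipse, the string itself is intact and only its display differs.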

The problem is that when I run the JUnit test directly in Eclipse I get the expected result and the test passes, but when I run the Tycho build I get:

testing input: ?:?-?:?-?:?-?:?-?:?-?:?
Failed tests:   testSanitizeWithSpaces(com.fja.eos.automation.UtilsTest): expected:<[a A e E i I o O u U n N] > but was:<[o o o o o o o o o o o o] >

The value for

System.out.println("Default charset: " + Charset.defaultCharset());

is the same in both scenarios:

Default charset: windows-1252
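The default charset is only one of several encoding settings a JVM carries, so it may help to log a few more of them from inside the test. This is a sketch, not a definitive diagnostic: `EncodingReportSketch` is a hypothetical name, and `sun.jnu.encoding` and `sun.stdout.encoding` are JDK-internal properties that can be null depending on the JDK and on whether stdout is attached to a console.

```java
import java.nio.charset.Charset;

public class EncodingReportSketch {
    // Collects the encoding-related settings of the current JVM as text.
    static String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("defaultCharset=").append(Charset.defaultCharset()).append('\n');
        // The last two keys are JDK-internal and may print "null".
        String[] keys = {"file.encoding", "sun.jnu.encoding", "sun.stdout.encoding"};
        for (String key : keys) {
            sb.append(key).append('=').append(System.getProperty(key)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Comparing this report between the Eclipse run and the forked Tycho test JVM would show whether the two runs differ in anything other than the default charset.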

My next attempt was to read the input value from a file, controlling the charset with:

InputStream is = UtilsTest.class
    .getResourceAsStream("sanitationTestSubjects.xml");
InputSource source = new InputSource(is);
source.setEncoding("ISO-8859-1");
Document doc = builder.parse(is);

for the file

<?xml version="1.0" encoding="ISO-8859-1" ?>
<SanitationTestSubjects>
    <Subject input="á:Á-é:É-í:Í-ó:Ó-ú:Ú-ñ:Ñ  " expected="a A e E i I o O u U n N " />
</SanitationTestSubjects>
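Note that the snippet above passes `is`, not `source`, to `builder.parse`, so the `setEncoding` call has no effect there and the parser relies on the encoding declared in the XML prolog. A minimal sketch that actually hands the `InputSource` to the parser could look like this. The class and method names are hypothetical, and the document is built in memory so that the example does not depend on any file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XmlEncodingSketch {
    // Parses the given bytes as ISO-8859-1 XML and returns the "input"
    // attribute of the first Subject element.
    static String readInput(byte[] xmlBytes) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
            source.setEncoding("ISO-8859-1");
            // Pass the InputSource, not the raw stream, so setEncoding is used.
            Document doc = builder.parse(source);
            Element subject = (Element) doc.getElementsByTagName("Subject").item(0);
            return subject.getAttribute("input");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse test subjects", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>"
            + "<SanitationTestSubjects><Subject input=\"\u00E1:\u00C1\" />"
            + "</SanitationTestSubjects>";
        String value = readInput(xml.getBytes(StandardCharsets.ISO_8859_1));
        // Print numeric code units so the console encoding cannot distort them.
        for (char c : value.toCharArray()) {
            System.out.println(String.format("U+%04X", (int) c));
        }
    }
}
```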

Reading the input from the XML file gave a slightly different result:

testing input: á:?-é:É-í:?-ó:?-ú:?-ñ:Ñ
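The pattern in that output (the lowercase letters plus É and Ñ survive, while Á, Í, Ó and Ú turn into `?`) looks like what happens when a correct string is encoded for a console code page that lacks those four capitals, such as the OEM code page 437. That is a guess about the environment, not something the build output confirms. The sketch below, with the hypothetical helper `throughCharset`, shows how to check which characters a given charset can represent. `IBM437` and `IBM850` are extended charsets and may be absent from a reduced JRE, hence the `isSupported` guard.

```java
import java.nio.charset.Charset;

public class ConsoleCharsetSketch {
    // Encodes and decodes with the same charset; characters the charset
    // cannot represent come back as '?', like they would on a console.
    static String throughCharset(String s, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(s.getBytes(cs), cs);
    }

    // Keeps ASCII as-is and renders everything else as <U+XXXX>, so the
    // report itself is immune to the console encoding.
    static String ascii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(c < 128 ? String.valueOf(c) : String.format("<U+%04X>", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input =
            "\u00E1:\u00C1-\u00E9:\u00C9-\u00ED:\u00CD-\u00F3:\u00D3-\u00FA:\u00DA-\u00F1:\u00D1";
        String[] candidates = {"windows-1252", "IBM437", "IBM850", "US-ASCII"};
        for (String name : candidates) {
            if (!Charset.isSupported(name)) {
                continue; // extended charsets are not present in every runtime
            }
            System.out.println(name + ": " + ascii(throughCharset(input, name)));
        }
    }
}
```

If one of the candidate charsets reproduces the `?` pattern seen in the build log, the string is probably fine at that point and only the way the forked JVM's output reaches the console is lossy.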

That output is still not correct. If I escape the input with

StringEscapeUtils.escapeJava(elem.getAttribute("input"))

I get what seems to be the correct Unicode sequence:

Escaped input: \u00E1:\u00C1-\u00E9:\u00C9-\u00ED:\u00CD-\u00F3:\u00D3-\u00FA:\u00DA-\u00F1:\u00D1

I've tried setting every character-encoding option I could find, on the tycho-surefire-plugin and on the maven-resources-plugin, with no change in behavior:

<build>
    <plugins>
        <plugin>
            <groupId>org.eclipse.tycho</groupId>
            <artifactId>tycho-surefire-plugin</artifactId>
            <version>${tycho-version}</version>
            <configuration>
                <appArgLine>-Dfile.encoding=ISO-8859-1</appArgLine>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-resources-plugin</artifactId>
            <version>3.2.0</version>
            <configuration>
                <encoding>ISO-8859-1</encoding>
            </configuration>
        </plugin>
    </plugins>
</build>

One more test: compiling the files in Eclipse and with Maven produces binary-identical class files.

UPDATE: the encoding of the Java file itself was set to cp1252. After changing it to ISO-8859-1 I got the same result as when reading the value from a file. Still not there.

I really feel that I'm looking at the wrong side of the problem. Can anyone please help?
