How to generate real-world text data for Kotlin tests?


I'm using Kotest data generators for my tests. They are pretty flexible and allow almost everything. However, the String generators are very technical, and it's tough to generate real-world text with them.

For example, Strings of printable ASCII characters (space to ~) are pretty far from real-world use cases, even from real-world ASCII input, since they contain no newlines or tabs. In the real world, all sorts of UTF-8 characters can be entered in browsers with various language settings.
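To make the gap concrete, here is a minimal sketch that uses only the Kotlin standard library on the JVM. It draws code points from a hand-picked pool of ranges that includes tab, newline, and a few non-ASCII blocks. The names `codePointRanges` and `randomText` and the choice of ranges are assumptions made for illustration, not part of Kotest.

```kotlin
import kotlin.random.Random

// Hypothetical pool of code point ranges; adjust it to the input your application expects.
val codePointRanges: List<IntRange> = listOf(
    0x20..0x7E,       // printable ASCII, space to ~
    0x09..0x0A,       // tab and newline
    0xA0..0xFF,       // Latin-1 Supplement (é, ü, ß, ...)
    0x0400..0x04FF,   // Cyrillic
    0x4E00..0x4FFF,   // a slice of CJK Unified Ideographs
    0x1F600..0x1F64F, // emoticons (outside the BMP, so two UTF-16 chars each)
)

// Builds a string of `length` code points: pick a range first, then a code point within it.
fun randomText(length: Int, random: Random = Random.Default): String {
    val sb = StringBuilder()
    repeat(length) {
        val range = codePointRanges.random(random)
        sb.appendCodePoint(range.random(random))
    }
    return sb.toString()
}
```

Calling `randomText(20, Random(42))` with a fixed seed returns the same sample on every run, which helps when reproducing a failing test. Picking the range first gives every block equal weight, so small ranges such as tab/newline show up far more often than they would with uniform sampling over all code points.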

There's the stringPattern generator in Kotest, but it uses RgxGen 1.4, which does not yet support generation based on character classes (release 1.5 is pending). Otherwise my idea would be a pattern like [\p{Punct}]|[\p{Graph}]|[\p{Print}]|[\p{Blank}], but I don't know enough about Unicode character classes, and I feel that an existing solution to the problem would be much better than figuring this out myself.
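As a rough check of what those character classes would cover, the sketch below filters the ASCII range through the same classes with Kotlin's `Regex` on the JVM. As far as the `java.util.regex.Pattern` documentation goes, these POSIX classes match US-ASCII only by default, so their union is printable ASCII plus tab, and newlines are still missing. The names `classPattern`, `asciiPool`, and `randomFromPool` are hypothetical.

```kotlin
import kotlin.random.Random

// Union of the character classes from the question, written as a single class.
// Assumption: JVM regex semantics, where these POSIX classes cover US-ASCII only.
val classPattern = Regex("""[\p{Punct}\p{Graph}\p{Print}\p{Blank}]""")

// Every ASCII character (0..127) that the pattern accepts.
val asciiPool: List<Char> = (0..127)
    .map { Char(it) }
    .filter { classPattern.matches(it.toString()) }

// Builds a string of `length` characters drawn uniformly from the pool.
fun randomFromPool(length: Int, random: Random = Random.Default): String =
    buildString { repeat(length) { append(asciiPool.random(random)) } }
```

Precomputing the pool sidesteps the missing character-class support in the pattern generator, at the cost of listing the accepted characters up front.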

I'm using Kotest 5.8.0 in a Kotlin 1.9 project.


There are 2 answers

Answer by johanneslink

Another option is to use jqwik's String generators. They can be used outside of jqwik property methods.

Here's an example:

import io.kotest.core.spec.style.FunSpec
import net.jqwik.api.Arbitraries

class KotlinTests : FunSpec({

    test("my first test") {
        Arbitraries.strings().ofLength(10).sampleStream()
            .limit(10)
            .forEach { println(it) }
    }

})

How to use jqwik's generators (aka arbitraries) directly is documented in the jqwik user guide.

Full disclosure: I'm the main committer of jqwik.

Answer by AndrewL

If Lorem Ipsum can pass as real-world text for you, this library is easy to use:

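import com.thedeanda.lorem.Lorem
import com.thedeanda.lorem.LoremIpsum

// seed is a Long of your choice
val lorem: Lorem = LoremIpsum(seed)

// generates between 2 and 4 paragraphs:
val text = lorem.getParagraphs(2, 4)

Maven dependency:

        <dependency>
            <groupId>com.thedeanda</groupId>
            <artifactId>lorem</artifactId>
        </dependency>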

Source: https://github.com/mdeanda/lorem