Read Wordlist into RapidMiner Execute R


I'm using RapidMiner, and after a WordList to Data operator I want to create a word cloud with Execute R using the script below. The R execution fails with: "no applicable method for 'TermDocumentMatrix' applied to an object of class \"c('data.table', 'data.frame')\"".

The data table has a column of words, a column of document occurrences, and a column of word occurrences.

Can anyone please advise how to resolve the error?

rm_main = function(data)
{
    wordcloud::wordcloud(data, scale=c(5,0.5), max.words=100, random.order=FALSE,
    rot.per=0.35, use.r.layout=FALSE, colors="Dark2")
}

Answer by lukeA (best answer)

Passing the whole data frame data as the 1st argument to wordcloud does not work. When wordcloud receives no frequencies, it hands its 1st argument to TermDocumentMatrix from R's tm package, which has no method for data frames; that is what the error message is about. (It would work with an object that TermDocumentMatrix accepts, such as a tm corpus, but that's another story.) From within RapidMiner you have to pass the words and their frequencies as the 1st and 2nd arguments respectively.

So you could use something like:

rm_main = function(data)
{
    windows() # on MS Windows systems
    wordcloud::wordcloud(data$word, data$total, min.freq = 1)
    Sys.sleep(5)
}
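
To keep the styling options from the question, the same call can carry them along. The following is only a sketch under a few assumptions, not part of the original answer: the columns are named word and total as in the script above, the RColorBrewer package is installed, and top_words is a hypothetical helper name. It also swaps colors="Dark2" for RColorBrewer::brewer.pal(8, "Dark2"), because colors expects colour values rather than a palette name.

```r
# Hypothetical helper: keep the n most frequent words as plain vectors.
# Assumes columns named `word` and `total` (as used in the script above).
top_words = function(data, n = 100)
{
    words <- as.character(data$word)   # also handles factor columns
    freq  <- as.numeric(data$total)
    keep  <- !is.na(words) & !is.na(freq)
    words <- words[keep]
    freq  <- freq[keep]
    # Indices of the n largest frequencies, most frequent first.
    ord <- head(order(freq, decreasing = TRUE), n)
    data.frame(word = words[ord], total = freq[ord], stringsAsFactors = FALSE)
}

rm_main = function(data)
{
    top <- top_words(data, n = 100)
    windows() # on MS Windows systems
    wordcloud::wordcloud(top$word, top$total, scale = c(5, 0.5), min.freq = 1,
                         random.order = FALSE, rot.per = 0.35, use.r.layout = FALSE,
                         colors = RColorBrewer::brewer.pal(8, "Dark2"))
    Sys.sleep(5)
}
```

Trimming the table before the call replaces max.words=100 from the question and makes sure wordcloud gets a character vector and a numeric vector, whatever column types RapidMiner hands over.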

Here's an example process, which builds a word list from a short document, converts it with WordList to Data, and draws the word cloud in a graphics window (on Windows) that stays open for 3 seconds:

<?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="7.3.000" expanded="true" height="68" name="Create Document" width="90" x="179" y="187">
        <parameter key="text" value="Hello world world!"/>
        <parameter key="add label" value="false"/>
        <parameter key="label_type" value="nominal"/>
      </operator>
      <operator activated="true" class="text:process_documents" compatibility="7.3.000" expanded="true" height="103" name="Process Documents" width="90" x="313" y="238">
        <parameter key="create_word_vector" value="true"/>
        <parameter key="vector_creation" value="TF-IDF"/>
        <parameter key="add_meta_information" value="true"/>
        <parameter key="keep_text" value="false"/>
        <parameter key="prune_method" value="none"/>
        <parameter key="prune_below_percent" value="3.0"/>
        <parameter key="prune_above_percent" value="30.0"/>
        <parameter key="prune_below_rank" value="0.05"/>
        <parameter key="prune_above_rank" value="0.95"/>
        <parameter key="datamanagement" value="double_sparse_array"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.3.000" expanded="true" height="68" name="Tokenize" width="90" x="45" y="34">
            <parameter key="mode" value="non letters"/>
            <parameter key="characters" value=".:"/>
            <parameter key="language" value="English"/>
            <parameter key="max_token_length" value="3"/>
          </operator>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="text:wordlist_to_data" compatibility="7.3.000" expanded="true" height="82" name="WordList to Data" width="90" x="514" y="289"/>
      <operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Execute R" width="90" x="715" y="289">
        <parameter key="script" value="rm_main = function(data)&#10;{&#10;    windows()&#10;    wordcloud::wordcloud(data$word, data$total, min.freq = 1)&#10;    Sys.sleep(3)&#10;}&#10;"/>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
      <connect from_op="Process Documents" from_port="word list" to_op="WordList to Data" to_port="word list"/>
      <connect from_op="WordList to Data" from_port="example set" to_op="Execute R" to_port="input 1"/>
      <connect from_op="Execute R" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
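
The windows() call in the scripts above only works on MS Windows. A more portable option is to render the cloud into a PDF file with base R's pdf() device. This is a minimal sketch, assuming the same word and total columns; with_pdf is a hypothetical helper name, and the output path under tempdir() is a placeholder you would replace with a folder you can write to.

```r
# Hypothetical helper: run a drawing function with a PDF device open,
# always close the device afterwards, and return the file path.
with_pdf = function(path, draw)
{
    pdf(path)
    on.exit(dev.off())
    draw()
    invisible(path)
}

rm_main = function(data)
{
    # Assumed output location; adjust as needed.
    out_file <- file.path(tempdir(), "mywordcloud.pdf")
    with_pdf(out_file, function() {
        wordcloud::wordcloud(data$word, data$total, min.freq = 1)
    })
    # Open the file with the default PDF application (Windows only).
    if (.Platform$OS.type == "windows") shell.exec(out_file)
}
```

To try this inside the process, paste the script into the script parameter of the Execute R operator. The on.exit(dev.off()) call closes the device even if wordcloud fails, so a failed run does not leave the PDF file locked.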