Text summary in R for multiple rows


I have a set of short text files that I combined into one data frame, with each file in its own row.

I am trying to summarize the content with the `genericSummary()` function from the LSAfun package: `genericSummary(text, k, split = c(".", "!", "?"), min = 5, breakdown = FALSE, ...)`

This works very well for a single text, but not for my data frame. The package documentation says the text input should be "A character vector of length(text) = 1 specifiying the text to be summarized".

Please see this example:

```r
library(LSAfun)  # provides genericSummary()

# Generate an example dataset (text copied from Wikipedia):
dd = structure(list(text = structure(1:2, .Label = c("Forest gardening, a forest-based food production system, is the world's oldest form of gardening.[1] Forest gardens originated in prehistoric times along jungle-clad river banks and in the wet foothills of monsoon regions. In the gradual process of families improving their immediate environment, useful tree and vine species were identified, protected and improved while undesirable species were eliminated. Eventually foreign species were also selected and incorporated into the gardens.[2]\n\nAfter the emergence of the first civilizations, wealthy individuals began to create gardens for aesthetic purposes. Ancient Egyptian tomb paintings from the New Kingdom (around 1500 BC) provide some of the earliest physical evidence of ornamental horticulture and landscape design; they depict lotus ponds surrounded by symmetrical rows of acacias and palms. A notable example of ancient ornamental gardens were the Hanging Gardens of Babylon—one of the Seven Wonders of the Ancient World —while ancient Rome had dozens of gardens.\n\nWealthy ancient Egyptians used gardens for providing shade. Egyptians associated trees and gardens with gods, believing that their deities were pleased by gardens. Gardens in ancient Egypt were often surrounded by walls with trees planted in rows. Among the most popular species planted were date palms, sycamores, fir trees, nut trees, and willows. These gardens were a sign of higher socioeconomic status. In addition, wealthy ancient Egyptians grew vineyards, as wine was a sign of the higher social classes. Roses, poppies, daisies and irises could all also be found in the gardens of the Egyptians.\n\nAssyria was also renowned for its beautiful gardens. These tended to be wide and large, some of them used for hunting game—rather like a game reserve today—and others as leisure gardens. Cypresses and palms were some of the most frequently planted types of trees.\n\nGardens were also available in Kush. In Musawwarat es-Sufra, the Great Enclosure dated to the 3rd century BC included splendid gardens. [3]\n\nAncient Roman gardens were laid out with hedges and vines and contained a wide variety of flowers—acanthus, cornflowers, crocus, cyclamen, hyacinth, iris, ivy, lavender, lilies, myrtle, narcissus, poppy, rosemary and violets[4]—as well as statues and sculptures. Flower beds were popular in the courtyards of rich Romans.", 
"The Middle Ages represent a period of decline in gardens for aesthetic purposes. After the fall of Rome, gardening was done for the purpose of growing medicinal herbs and/or decorating church altars. Monasteries carried on a tradition of garden design and intense horticultural techniques during the medieval period in Europe. Generally, monastic garden types consisted of kitchen gardens, infirmary gardens, cemetery orchards, cloister garths and vineyards. Individual monasteries might also have had a \"green court\", a plot of grass and trees where horses could graze, as well as a cellarer's garden or private gardens for obedientiaries, monks who held specific posts within the monastery.\n\nIslamic gardens were built after the model of Persian gardens and they were usually enclosed by walls and divided in four by watercourses. Commonly, the centre of the garden would have a reflecting pool or pavilion. Specific to the Islamic gardens are the mosaics and glazed tiles used to decorate the rills and fountains that were built in these gardens.\n\nBy the late 13th century, rich Europeans began to grow gardens for leisure and for medicinal herbs and vegetables.[4] They surrounded the gardens by walls to protect them from animals and to provide seclusion. During the next two centuries, Europeans started planting lawns and raising flowerbeds and trellises of roses. Fruit trees were common in these gardens and also in some, there were turf seats. At the same time, the gardens in the monasteries were a place to grow flowers and medicinal herbs but they were also a space where the monks could enjoy nature and relax.\n\nThe gardens in the 16th and 17th century were symmetric, proportioned and balanced with a more classical appearance. Most of these gardens were built around a central axis and they were divided into different parts by hedges. Commonly, gardens had flowerbeds laid out in squares and separated by gravel paths.\n\nGardens in Renaissance were adorned with sculptures, topiary and fountains. In the 17th century, knot gardens became popular along with the hedge mazes. By this time, Europeans started planting new flowers such as tulips, marigolds and sunflowers."
), class = "factor")), class = "data.frame", row.names = c(NA, 
-2L))

# This code tries to write the summary into a new column:
dd$sum = genericSummary(dd$text, k = 1)
```


This gives the error `Error in strsplit(text, split = split, fixed = T) : non-character argument`.

I believe this is because I am passing a whole column rather than a single text.
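The error can be reproduced without LSAfun. The message shows `genericSummary()` calling base R's `strsplit()`, which refuses factor input. Here is a minimal sketch on a made-up two-row factor (the toy sentences are illustrative only):

```r
# Made-up factor column, stored the same way as dd$text
f <- factor(c("First sentence. Second one.", "Another text. More here."))

# strsplit() only accepts character input, so a factor raises the error
err <- tryCatch(strsplit(f, split = ".", fixed = TRUE),
                error = function(e) conditionMessage(e))
print(err)

# After as.character(), the same call succeeds, returning one element per row
parts <- strsplit(as.character(f), split = ".", fixed = TRUE)
print(length(parts))
```

So the column type matters, independently of whether the input has one row or many.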

My expected output is the generated summary for each row, stored in a corresponding second column, `dd$sum`.

I tried using `as.vector(dd$text)`, but this does not work either; it still seems to combine the output into a single row.
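For what it's worth, `as.vector()` on a factor does return a character vector, so it removes the factor problem. However, the result still holds one string per row, whereas the documentation asks for `length(text) = 1`. Here is a quick base-R check on made-up text:

```r
# Made-up two-row factor standing in for dd$text
f <- factor(c("Text one. More text.", "Text two. Even more text."))

v <- as.vector(f)
print(class(v))   # character: the factor problem is gone
print(length(v))  # still one string per row, not a single text
```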

I read a bit about the `map()` function from purrr but was not able to apply it here. Can someone with R experience help?

A solution using another text summarization package, e.g. lexRankr, would also work. I tried the code from the question "Text summarization in R language", but it still does not work.
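The per-row pattern that `map()` provides can also be written in base R with `vapply()`. In this sketch, `first_sentence()` is a hypothetical stand-in for `genericSummary()` so that the example runs without LSAfun. With the package loaded, you would instead pass something like `function(t) paste(genericSummary(t, k = 1), collapse = " ")`. That wrapper is an assumption on my part that the function returns a character vector.

```r
# Hypothetical stand-in for LSAfun::genericSummary(): returns the first sentence
first_sentence <- function(text) {
  sentences <- trimws(strsplit(text, split = ".", fixed = TRUE)[[1]])
  paste0(sentences[nzchar(sentences)][1], ".")
}

# Toy data frame with a factor column, like dd (made-up text)
dd_toy <- data.frame(
  text = factor(c("Gardens are old. They were walled.",
                  "Monks kept herbs. Gardens grew later.")))

# Convert the factor once, then call the summarizer once per row
dd_toy$sum <- vapply(as.character(dd_toy$text), first_sentence,
                     character(1), USE.NAMES = FALSE)
print(dd_toy$sum)
```

`vapply()` is used rather than `sapply()` because it guarantees one string per row, so a summarizer that unexpectedly returns several strings fails loudly instead of silently producing a list.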

Thank you


Accepted answer by Ben Norris

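Check `class(dd$text)`. It is a factor, not a character vector, and `genericSummary()` passes its input to `strsplit()`, which only accepts character input. Convert the column with `as.character()`, then call `genericSummary()` once per row (it expects a single string), for example with purrr's `map()`.

The following works:

```r
library(LSAfun)  # provides genericSummary()
library(dplyr)
library(purrr)

dd <- dd %>%
  # factor -> character, so strsplit() inside genericSummary() works
  mutate(text = as.character(text)) %>%
  # call genericSummary() once per row; map() returns a list column
  mutate(sum = map(text, genericSummary, k = 1))
```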
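Because `purrr::map()` always returns a list, `sum` ends up as a list column. If you prefer a plain character column, one option is to collapse each row's summary into a single string. The sketch below does this in base R on toy values. It assumes, without checking here, that `genericSummary()` returns a character vector of k sentences. `purrr::map_chr()` combined with `paste(collapse = " ")` would be the purrr equivalent.

```r
# Toy stand-in for the list column that map() produces; values are made up.
# The second element holds two sentences, as a k = 2 summary might.
sum_list <- list("Gardens are old.",
                 c("Monks kept herbs.", "Gardens grew later."))

# Collapse each element to one string so the column becomes plain character
sum_chr <- vapply(sum_list, paste, character(1), collapse = " ")
print(sum_chr)
```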