This is a question that has two parts:
First, I have a Python UDF that creates a list of strings of unknown length. The input to the UDF is a map (a dict in Python), and the number of keys is essentially unknown (it is what I'm trying to obtain).
What I don't know is how to output that in a schema that lets me return it as a list (or some other iterable data structure). This is what I have so far:
@outputSchema("?????") #WHAT SHOULD THE SCHEMA BE!?!?
def test_func(input):
    output = []
    # collect every key of the map as a string
    for k, v in input.items():
        output.append(str(k))
    return output
Now, the second part of the question. Once in Pig I want to apply a SHA hash to each element in the "list" for all my users. Some Pig pseudo code:
USERS = LOAD 'something' AS (my_map:map[chararray]);
UDF_OUT = FOREACH USERS GENERATE my_udfs.test_func(my_map);
SHA_OUT = FOREACH UDF_OUT GENERATE SHA(UDF_OUT);
The last line is likely wrong as I want to apply the SHA to each element in the list, NOT to the whole list.
To answer your question: since you are returning a Python list whose contents are strings, your decorator should declare the output as a bag of tuples, each holding a single chararray.
It can be confusing when specifying this structure because you only need to define what one element in the bag would look like.
That being said, there is a much simpler way to do what you require. There is a function KEYSET() (you can reference this question I answered) that will extract the keys from a Pig map. So I used the data set from that example and added a few more keys to the first one, since you said your map contents are variable in length.
Query:
Output: