I have this dataset:
text sentiment
randomstring positive
randomstring negative
randomstring netrual
random mixed
Then if I run a countmap i have:
"mixed" -> 600
"positive" -> 2000
"negative" -> 3300
"netrual" -> 780
I want to random sample from this dataset in a way that I have records of all smallest class (mixed = 600) and the same amount of each of other classes (positive=600, negative=600, neutral = 600)
I know how to do this in pandas:
df_teste = [data.loc[data.sentiment==i]\
.sample(n=int(data['sentiment']
.value_counts().nsmallest(1)[0]),random_state=SEED) for i in data.sentiment.unique()]
df_teste = pd.concat(df_teste, axis=0, ignore_index=True)
But I am having a hard time to do this in Julia.
Note: I don´t want to hardcode which of the class is the lowest one, so I am looking for a solution that infer that from the countmap or freqtable, if possible.
Why do you want a
countmaporfreqtablesolution if you seem do want to use a data frame in the end?This is how you would do this with DataFrames.jl (but without StatsBase.jl and FreqTables.jl as they are not needed for this):
If your data is very large and you need the operation to be very fast there are more efficient ways to do it, but in most situations this should be fast enough and the solution does not require any extra packages.