I have a data set of the following structure:
> data("household", package="HSAUR2")
> household[c(1,5,10,30,40),]
   housing food goods service gender total
1      820  114   183     154 female  1271
5      721   83   176     104 female  1084
10     845   64  1935     414 female  3258
30    1641  440  6471    2063   male 10615
40    1524  964  1739    1410   male  5637
The column "total" is the sum of the first four columns. It's a household's expenditure grouped in four categories.
Now, if I want a conditional density plot of gender vs. total expenditure, I can run:
cdplot(gender ~ total, data=household)
And I will get this image:
I'd like the same picture with "total" expenditure on the x-axis, but with the conditional distribution over the four classes (housing, food, goods, service) on the y-axis. I can only think of a very dirty hack: generate a factor in which, for the first data row, I repeat "housing" 820 times, then "food" 114 times, and so on.
There has to be an easier way, right?
As I said, you're using the wrong tool to obtain what you want. You're envisioning a plot that cannot be obtained directly from your data (see bottom).
Instead, you need to model your data. Specifically, you want to predict the expected proportion of expenditure in each category as a function of total expenditure. The plot you're envisioning then shows the fitted values of that model (i.e., the predicted proportion of expenditure in each category) as a function of total expenditure. Here's some code that does that using loess curves. I plot the raw data and the fitted values, to show you what is going on.
The result:
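A minimal sketch of the loess approach described above might look like the following. It is not necessarily the code originally posted. It assumes the HSAUR2 package is installed, recomputes `total` as the row sum of the four categories (as the question describes), and uses illustrative settings (`span = 0.75`, `degree = 1`, the colours) that are my own choices rather than anything from the original answer.

```r
# Assumption: HSAUR2 is installed, as in the question.
data("household", package = "HSAUR2")

cats <- c("housing", "food", "goods", "service")

# The question defines `total` as the sum of the four categories;
# recompute it here so the sketch does not depend on it already existing.
household$total <- rowSums(household[, cats])

# Sort by total so fitted curves are drawn left to right.
hh <- household[order(household$total), ]
total <- hh$total

# Observed share of each household's total spent in each category.
props <- as.matrix(hh[, cats] / hh$total)

# One loess fit per category: expected share as a function of total.
# span and degree are illustrative choices, not tuned values.
fitted_props <- sapply(cats, function(k) {
  y <- props[, k]
  fit <- loess(y ~ total, span = 0.75, degree = 1)
  predict(fit)  # fitted values at the observed totals
})

# Raw shares as points, fitted shares as lines, one colour per category.
cols <- c("black", "red", "blue", "darkgreen")
matplot(total, props, type = "p", pch = 1, col = cols,
        xlab = "total expenditure", ylab = "proportion of total")
matlines(total, fitted_props, lty = 1, lwd = 2, col = cols)
legend("topright", legend = cats, col = cols, lty = 1, pch = 1)
```

The points are the observed shares per household and the lines are the smoothed shares. Stacking the four fitted curves would give something close to the picture described in the question.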
If you tried to create a plot like this based on the raw data, it would look somewhat strange, but maybe that's what you're going for:
Result:
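One way to draw such a raw-data version, as a rough sketch under the same assumptions (HSAUR2 installed, `total` recomputed as the row sum), is a stacked bar chart with one bar per household, ordered by total expenditure. The colours and layout options are illustrative choices. Note that the bars are equally spaced, so the x-axis shows rank order rather than a true expenditure scale.

```r
# Assumption: HSAUR2 is installed, as in the question.
data("household", package = "HSAUR2")

cats <- c("housing", "food", "goods", "service")
household$total <- rowSums(household[, cats])

# Order households by total expenditure.
hh <- household[order(household$total), ]

# 4 x n matrix of observed shares; each column (household) sums to 1.
raw_props <- t(as.matrix(hh[, cats] / hh$total))

# One stacked bar per household, labelled with its total expenditure.
barplot(raw_props, names.arg = hh$total, las = 2, space = 0, border = NA,
        col = c("grey20", "grey45", "grey70", "grey90"),
        xlab = "total expenditure (households in rank order)",
        ylab = "proportion of total",
        legend.text = cats)
```

Because nothing is smoothed here, each bar reflects a single household, which is why the picture looks jagged compared with the fitted curves.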