I have a very quick question with an easy reproducible example that is related to my work on prediction with bnlearn
library(bnlearn)
Learning.set4=cbind(c("Yes","Yes","Yes","No","No","No"),c(9,10,8,3,2,1))
Learning.set4=as.data.frame(Learning.set4)
Learning.set4[,c(2)]=as.numeric(as.character(Learning.set4[,c(2)]))
colnames(Learning.set4)=c("Cause","Cons")
b.network=empty.graph(colnames(Learning.set4))
struct.mat=matrix(0,2,2)
colnames(struct.mat)=colnames(Learning.set4)
rownames(struct.mat)=colnames(struct.mat)
struct.mat[1,2]=1
bnlearn::amat(b.network)=struct.mat
haha=bn.fit(b.network,Learning.set4)
#Some predictions with "lw" method
#Here is the approach I know with a SET particular modality.
#(So it's happening with certainty, here for example I know Cause is "Yes")
classic_prediction=cpdist(haha,nodes="Cons",evidence=list("Cause"="Yes"),method="lw")
print(mean(classic_prediction[,c(1)]))
#What if I wanted to predict the value of Cons, when Cause has a 60% chance of being Yes and 40% of being no?
#I decided to do this, according the help
#I could also make a function that generates "Yes" or "No" with proper probabilities.
prediction_idea=cpdist(haha,nodes="Cons",evidence=list("Cause"=c("Yes","Yes","Yes","No","No")),method="lw")
print(mean(prediction_idea[,c(1)]))
Here is what the help says:
"In the case of a discrete or ordinal node, two or more values can also be provided. In that case, the value for that node will be sampled with uniform probability from the set of specified values"
When I predict the value of a variable using categorical variables, I for now just used a certain modality of said variable as in the first prediction in the example. (Having the evidence set at "Yes" gets Cons to take a high value)
But if I wanted to predict Cons without knowing the exact modality of the variable Cause with certainty, could I use what I did in the second prediction (Just knowing the probabilities) ? Is this an elegant way or are there better implemented ones I don't know off?
I got in touch with the creator of the package, and I will paste his answer related to the question here:
The call to cpquery() is wrong:
A query with the 40%-60% soft evidence requires you to place these new probabilities in the network first
and then run the query without an evidence argument. (Because you do not have any hard evidence, really, just a different probability distribution for Cause.)
I will post the code that lets me do what I wanted off of the fitted network from the example.