Possible to force logistic regression or other classifier through specific probability?


I have a data set with a binary variable [Yes/No] and a continuous variable (X). I'm trying to build a model that classifies [Yes/No] from X.

From my data set, when X = 0.5, 48% of the observations are Yes. However, I know the true probability of Yes should be 50% when X = 0.5. When I fit a logistic regression, the predicted P[Yes] at X = 0.5 is not 0.5.

How can I correct this? I guess all probabilities will be slightly underestimated if the curve does not pass through the correct point.

Is it correct just to add a bunch of observations in my sample to adjust the proportion?

It does not have to be logistic regression; LDA, QDA, etc. are also of interest.

I have searched Stack Overflow, but only found topics regarding linear regression.

There are 2 answers

Ben Bolker (best answer):

I believe that in R (assuming you're using glm from base R) you just need

glm(y ~ I(x - 0.5) - 1, data = your_data, family = binomial)

The I(x - 0.5) term recenters the covariate at 0.5, and the -1 suppresses the intercept (intercept = 0 at x = 0.5 -> probability = 0.5 at x = 0.5).

For example:

set.seed(101)
dd <- data.frame(x=runif(100,0.5,1),y=rbinom(100,size=1,prob=0.7))
m1 <- glm(y~I(x-0.5)-1,data=dd,family=binomial)
predict(m1,type="response",newdata=data.frame(x=0.5)) ## 0.5
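To see what the constraint buys you, one can compare this against an unconstrained fit on the same simulated data (a self-contained sketch restating the example above; the model names m_con and m_free are mine):

```r
set.seed(101)
dd <- data.frame(x = runif(100, 0.5, 1),
                 y = rbinom(100, size = 1, prob = 0.7))

# Constrained fit: recenter at x = 0.5 and suppress the intercept,
# which forces the fitted probability at x = 0.5 to be exactly 0.5
m_con <- glm(y ~ I(x - 0.5) - 1, data = dd, family = binomial)

# Unconstrained fit: the intercept is estimated freely
m_free <- glm(y ~ x, data = dd, family = binomial)

p_con  <- predict(m_con,  newdata = data.frame(x = 0.5), type = "response")
p_free <- predict(m_free, newdata = data.frame(x = 0.5), type = "response")

p_con   # exactly 0.5 by construction (linear predictor is 0 at x = 0.5)
p_free  # generally not 0.5
```

The constrained prediction is 0.5 regardless of the estimated slope, because at x = 0.5 the sole term I(x - 0.5) vanishes and there is no intercept to shift it.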
davechilders:

The OP wrote:

How can I correct this? I guess all probabilities will be slightly underestimated if the curve does not pass through the correct point.

This is not true. It is perfectly possible to underestimate some values (like the intercept) and overestimate others.

An example following your situation:

The true probabilities:

set.seed(444)

true_prob <- function(x) {

  # linear predictor on the logit scale
  lp <- x - 0.5

  # invert the logit to get the true probability
  p <- 1 / (1 + exp(-lp))
  p

}

true_prob(x = 0.5)
[1] 0.5

But if you simulate data and fit a model, the intercept could be underestimated and other values overestimated:

n <- 100
# simulated predictor
x <- runif(n, 0, 1)
probs <- true_prob(x)

# simulated binary response
y <- as.numeric(runif(n) < probs)

Now fit a model and compare the true probabilities with the fitted ones:

m <- glm(y ~ x, family = binomial)

> true_prob(0.5)
[1] 0.5
> predict(m, newdata = data.frame(x = 0.5), type = "response")
       1 
0.479328 
> true_prob(2)
[1] 0.8175745
> predict(m, newdata = data.frame(x = 2), type = "response")
        1 
0.8665702 

So in this example, the model underestimates at x = 0.5 and overestimates at x = 2.
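Tying the two answers together: if the constraint from the accepted answer is imposed on this same simulation, the fitted probability at x = 0.5 is forced back to exactly 0.5 (a self-contained sketch restating the simulation above; m_con and p_hat are my names):

```r
set.seed(444)

# True probabilities, as defined above
true_prob <- function(x) 1 / (1 + exp(-(x - 0.5)))

# Simulate predictor and binary response
n <- 100
x <- runif(n, 0, 1)
y <- as.numeric(runif(n) < true_prob(x))

# Constrained fit as in the accepted answer: recentered covariate, no intercept
m_con <- glm(y ~ I(x - 0.5) - 1, family = binomial)

p_hat <- predict(m_con, newdata = data.frame(x = 0.5), type = "response")
p_hat  # exactly 0.5 by construction
```

The estimate at x = 0.5 no longer depends on sampling noise in the intercept, though predictions elsewhere still carry the usual estimation error in the slope.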