How to calculate the number of features based on image resolution in neural networks (non-linear hypothesis)?


I came across Andrew Ng's non-linear hypothesis lecture on neural networks, where an MCQ asked me to find the number of features for a 100x100 image of greyscale intensities.

And the answer was 50 million, 5 x 10^7.

However, earlier he said that for a 50 x 50 pixel greyscale image the number of features is 50x50 (2500), and for an RGB image it is 7500.

Why would it be 5 x 10^7 instead of 10,000?

He does, however, say to include all the quadratic terms (xi, xj) as features.

The question is:

Suppose you are learning to recognize cars from 100×100 pixel images (grayscale, not RGB). Let the features be pixel intensity values. If you train logistic regression including all the quadratic terms (xi,xj) as features, about how many features will you have?

And earlier he added that, if we were to use all the xi, xj terms, we would end up with a total of about 3 million features (for the 50x50 case). Still, I couldn't figure out what the relation is.


There are 2 answers

Anthony Dave

For a 50x50 pixel image, the answer is 3,128,750.

At first it is a combination:

$$C_n^2 \text{ terms of the form } x_i x_j \ (i < j)$$

And this:

$$n \text{ terms of the form } x_i^2$$

$$n \text{ terms of the form } x_i$$

$$\text{Number of features} = C_n^2 + n + n$$

And the answer for the 100x100 image is 50,015,000, which is roughly 5 x 10^7.
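For reference, a minimal sketch that checks both counts with Python's standard library (math.comb is available from Python 3.8 onward):

```python
from math import comb

def num_quadratic_features(n):
    # C(n, 2) cross terms x_i * x_j (i < j), plus n squares x_i^2, plus n linear terms x_i
    return comb(n, 2) + n + n

print(num_quadratic_features(50 * 50))    # 3128750  -> 50x50 image
print(num_quadratic_features(100 * 100))  # 50015000 -> 100x100 image, ~5 x 10^7
```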

TobiWestside

You are confusing two similarly named quantities: the number of features of the image (= pixels) and the number of features a logistic regression algorithm would need to learn in order to solve the classification problem.

For the 100x100 pixel image, you have 10,000 pixels in the image. But if you have a complex classification problem, it's not enough to learn a linear model over these pixels (e.g. theta0 + theta1*x1 + theta2*x2); you also need to include higher-order terms, like x², which results in many more terms (= features) in your equation (e.g. theta0 + theta1*x1 + theta2*x2 + theta3*x1x2 + theta4*x1² + theta5*x2²).

This is what he meant by

If you train logistic regression including all the quadratic terms (xi,xj) as features

As you can see, we have all combinations of the quadratic terms of x1 and x2 in the equation above.
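To see the same expansion concretely, here is a small sketch using scikit-learn's PolynomialFeatures on a toy two-pixel "image" (assuming scikit-learn >= 1.0, which provides get_feature_names_out):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy "image" with only n = 2 pixel intensities x1, x2
X = np.array([[0.5, 0.8]])

# degree=2 expands the 2 inputs into 1, x1, x2, x1^2, x1*x2, x2^2
poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(X))         # [[1.   0.5  0.8  0.25 0.4  0.64]]
print(poly.get_feature_names_out())  # ['1' 'x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']
```

With n = 10,000 pixels instead of 2, the same degree-2 expansion yields the roughly 5 x 10^7 features from the quiz question.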

How many terms (= features) you need depends on the complexity of the classification problem you want to solve.

This is the reason why you get such a high number of features from a much smaller number of pixels. (He also shows an example of this around the 2-minute mark in the video.)