Inlier subset is inconsistent with is_data_valid in RANSAC


I have the following data and code:

import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# NumPy arrays (not lists) so that boolean-mask indexing works below
X = np.array([[0.], [0.], [0.], [0.], [5.25799992], [10.51700001], [15.74699956], [21.03599973], [26.41500018]])
y = np.array([181.42686706, 144.47493065, 143.93277864, 143.93277864, 166.07783771, 127.06519488, 80.16842458, 58.30687141, 48.83896311])

def no_similar_times(X: np.ndarray, y: np.ndarray) -> bool:
    # Returns True if X has no duplicates (after rounding to 1 decimal), else False
    is_valid = len(np.unique(X.round(1))) == len(X)
    print(X)
    print(is_valid)
    print("")
    return is_valid

def get_inliers() -> tuple[RANSACRegressor, np.ndarray]:
    # The predictor is a degree-3 polynomial
    ransac = RANSACRegressor(
        estimator=make_pipeline(PolynomialFeatures(3), LinearRegression()),
        min_samples=0.4,
        is_data_valid=no_similar_times,
    )

    ransac.fit(X, y)

    inlier_mask = ransac.inlier_mask_

    # Run the same validity check on the final inlier set
    print("Inliers")
    no_similar_times(X[inlier_mask], y[inlier_mask])

    return ransac, inlier_mask


if __name__ == "__main__":
    get_inliers()

When I run this code, I obtain an inlier_mask that corresponds to invalid data, meaning that no_similar_times(X[inlier_mask], y[inlier_mask]) returns False. I did not expect this: I assumed that a set of inliers would have to pass is_data_valid, since otherwise the RANSAC routine would have skipped it.

The printed output is:

[[ 0.        ]
 [10.51700001]
 [ 0.        ]
 [ 0.        ]]
False

[[21.03599973]
 [15.74699956]
 [ 0.        ]
 [ 5.25799992]]
True

[[ 0.        ]
 [21.03599973]
 [10.51700001]
 [26.41500018]]
True

[[26.41500018]
 [ 0.        ]
 [10.51700001]
 [ 5.25799992]]
True

[[ 0.        ]
 [ 0.        ]
 [ 0.        ]
 [10.51700001]]
False

Inliers
[[ 0.        ]
 [ 0.        ]
 [ 0.        ]
 [ 5.25799992]
 [10.51700001]
 [15.74699956]
 [21.03599973]
 [26.41500018]]
False

This means that no_similar_times works as expected, but the output inlier mask is not one of the valid subsets generated during the fitting process.

Can someone explain what happens?
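
A note on the likely cause, as far as scikit-learn's documented behavior goes: is_data_valid is called only on the randomly drawn subset of min_samples points, before the estimator is fitted to it. inlier_mask_ is then computed over the whole dataset from the residuals of that fit, so it is never passed through is_data_valid and can legitimately contain several points with the same X. If the final inlier set must also have unique times, one option is to filter the mask afterwards. The sketch below is only an illustration: dedupe_inlier_mask is a hypothetical helper (not part of scikit-learn), the "keep the first occurrence" rule is an arbitrary choice, and the example mask is assumed rather than taken from an actual fit.

```python
import numpy as np


def dedupe_inlier_mask(X, inlier_mask, decimals=1):
    """Hypothetical helper: keep only the first inlier per rounded X value."""
    X = np.asarray(X, dtype=float)
    inlier_mask = np.asarray(inlier_mask, dtype=bool)
    rounded = X[:, 0].round(decimals)  # same rounding as no_similar_times

    keep = np.zeros_like(inlier_mask)
    seen = set()
    for i in np.flatnonzero(inlier_mask):
        key = float(rounded[i])
        if key not in seen:  # first inlier with this rounded time
            seen.add(key)
            keep[i] = True
    return keep


X = np.array([[0.0], [0.0], [0.0], [0.0], [5.25799992], [10.51700001],
              [15.74699956], [21.03599973], [26.41500018]])

# Assumed mask for illustration: every point except the first is an inlier.
inlier_mask = np.array([False, True, True, True, True, True, True, True, True])

deduped = dedupe_inlier_mask(X, inlier_mask)
print(X[deduped].ravel())
```

With the deduplicated mask, no_similar_times(X[deduped], y[deduped]) would pass, at the cost of discarding points that RANSAC itself considered inliers. Whether that trade-off is acceptable depends on why duplicate times are a problem in the first place.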
