My complete code is below; the Python version is 3.8.12.
import xgboost as xgb
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
# Obtaining the data set, method 1
from sklearn.datasets import load_boston
boston = load_boston()
X = boston.data
y = boston.target
# Obtaining the data set, method 2
# data_url = "http://lib.stat.cmu.edu/datasets/boston"
# raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
# X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
# y = raw_df.values[1::2, 2]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
model = xgb.train({'objective': 'reg:squarederror'}, dtrain)
y_pred = model.predict(dtest)
mean_col1 = round(y_test.mean(), 4)
mean_col2 = round(y_pred.mean(), 4)
# first print
print(mean_col1, mean_col2)
# second print
print(f"real price avg: {mean_col1}, predict price avg: {mean_col2}")
The output is:
21.4882 20.5224
real price avg: 21.4882, predict price avg: 20.52239990234375
My question is why the last number is not rounded to four decimal places.
The first print shows mean_col2 rounded to four decimal places, but the second print, which uses the very same variable, does not give the same result.
I have tried this in Jupyter Notebook, IPython, and a .py file, and the results are the same. It only happens with mean_col2.
Please tell me the reason and how to fix it. Thank you in advance.
This appears to be a floating-point precision issue that only becomes visible in the second print statement.
If you take your code and print the hex values instead of the decimal representation, you'll see that in both cases we're printing exactly the same bytes:
I'm not exactly sure why the floating-point precision error only shows up in the second case, but to make the second statement print the result to 4 decimal places, just add a format specifier to your f-string:
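A minimal sketch of that check, using only the standard library. It assumes mean_col2 is a 32-bit float (xgboost's predict returns a float32 array, and rounding a NumPy float32 scalar keeps it float32); the struct round trip here is a hypothetical stand-in for that NumPy scalar, not the code from the question.

```python
import struct

# Hypothetical stand-in for the NumPy float32 scalar: 20.5224 stored in 32 bits.
raw32 = struct.pack(">f", 20.5224)       # the 4 bytes a float32 would hold
stored = struct.unpack(">f", raw32)[0]   # the same value widened to a Python float

print(raw32.hex())    # the float32 bit pattern in hex
print(stored.hex())   # the exact stored value in hexadecimal float notation
print(stored)         # 20.52239990234375
```

The stored value is the same either way. What changes is how many decimal digits are needed to display it: 20.5224 is enough to identify it among 32-bit floats, while the same number treated as a 64-bit float needs all of 20.52239990234375.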
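A sketch of that fix, again with a standard-library stand-in for the float32 value (the struct round trip and the literal 21.4882 are placeholders for the variables in the question). The `.4f` format specifier is the actual fix. A possible explanation for the difference, which is my assumption rather than something confirmed above: with the NumPy version in use, the f-string seems to convert the float32 scalar to a Python float before formatting, while print() uses the scalar's own shorter string form.

```python
import struct

# Stand-in for mean_col2: the float32 nearest to 20.5224, widened to a Python float.
mean_col2 = struct.unpack("<f", struct.pack("<f", 20.5224))[0]
mean_col1 = 21.4882  # placeholder for round(y_test.mean(), 4)

# An explicit format specifier fixes the number of decimals in the output.
message = f"real price avg: {mean_col1:.4f}, predict price avg: {mean_col2:.4f}"
print(message)  # real price avg: 21.4882, predict price avg: 20.5224

# Alternative: convert to a Python float first, then round.
print(round(float(mean_col2), 4))  # 20.5224
```

In the original code the alternative would correspond to round(float(y_pred.mean()), 4), so that both print statements receive an ordinary Python float.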