I have two data frames, one named sensor and one named train. The sensor data frame contains the data of a time series, whose index is the ts_sensor column. In the sensor data frame the names of the columns identify the name of the sensor, and each cell is filled with the value assumed by the sensor at the ts_sensor moment. The train data frame contains an id_sensor column filled with the name of the sensor (as in the sensor data frame) and the boot_threshold column in which each cell is filled with a threshold value for each sensor.
Here is the info related to the mentioned data frames.
train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 264 entries, 0 to 263
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id_train 264 non-null object
1 asset_id 264 non-null object
2 train_type 264 non-null object
3 equipment_type 264 non-null object
4 id_sensor 264 non-null object
5 unit 262 non-null object
6 is_operating 262 non-null float64
7 will_be_available 262 non-null float64
8 pi_acquisition_date 0 non-null float64
9 pi_description 264 non-null object
10 sensor_description 264 non-null object
11 tag_id 262 non-null float64
12 priority 262 non-null float64
13 for_ml 264 non-null int64
14 onoff_parameter 264 non-null int64
15 boot_threshold 264 non-null int64
16 efficiency_index 264 non-null int64
dtypes: float64(5), int64(4), object(8)
memory usage: 35.2+ KB
sensor.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1729930 entries, 0 to 1729929
Columns: 135 entries, level_0 to Australia.Blacktip.ICSS.KPI.EE.GasTurbineEfficiency_A
dtypes: float64(132), int64(2), object(1)
memory usage: 1.7+ GB
sensor.columns
Index(['level_0', 'index', 'sensor_ts',
'Australia.Blacktip.ICSS.360140413-100.PV',
'Australia.Blacktip.ICSS.360140415-100.PV',
'Australia.Blacktip.ICSS.360140417-100.PV',
'Australia.Blacktip.ICSS.360140421-100.PV',
'Australia.Blacktip.ICSS.360140423-100.PV',
'Australia.Blacktip.ICSS.360140425-100.PV',
'Australia.Blacktip.ICSS.360140433-100.PV',
...
'Australia.Blacktip.ICSS.3601PDI144.DACA.PV',
'Australia.Blacktip.ICSS.3601PIT111.PV',
'Australia.Blacktip.ICSS.3601PIT141.PV',
'Australia.Blacktip.ICSS.3601TIT042.PV',
'Australia.Blacktip.ICSS.3601TIT052.PV',
'Australia.Blacktip.ICSS.3601TIT111.PV',
'Australia.Blacktip.ICSS.3601TIT122.PV',
'Australia.Blacktip.ICSS.3601TIT141.PV',
'Australia.Blacktip.ICSS.3601TIT152.PV',
'Australia.Blacktip.ICSS.KPI.EE.GasTurbineEfficiency_A'],
dtype='object', length=135)
I want to obtain a new data frame in which the time series is respected and only the sensors whose ts_sensor value is currently higher than the corresponding boot_threshold value are present.
Thank you all
Are you looking for that ?
Output :