I have a dataset from a production line, which is formatted as time series data. There is a batch column, which indicates the name of the batch (str), and there is a phase column which indicates the phase of the production (str). I am working with the datetime as the index of the pandas DataFrame.
I want to plot this data on timeseries graph, overlaying the data from each phase and distinguishing each batch (i.e. different colour), with each process variable (i.e. temp1, temp2, press1, press2) on a different axis (as per the diagram)
How can this be done?
EDIT: for clarity, I need the trends to be plotted against a datetime baseline, otherwise they will not overlay.
Example of the dataset: | datetime | temp1 | temp 2|press1|press2|batch | phase | |:---- |:--: | :--: | :--: | :--: |:--: |:--: | | 2023-02-03 15:45:34| 34.45 | 23.34 | 13.23| 45.5 | 'D' | '10-Wait' | | ... | ... | ... | ... | ... | 'D' | ... | | 2023-02-03 15:55:34| 36.55 | 22.14 | 18.23| 78.5 | 'D' | '20-Initialise'|
To create a similar dataset -to mine- you can use the following code:
import numpy as np
import pandas as pd
import datetime
date = pd.date_range(start='1/1/2023', end='10/06/2023', freq=datetime.timedelta(seconds=30))
tags = ['temp1','temp2','press1','press2']
data=np.random.rand(len(date),len(tags))
df=pd.DataFrame(data,columns=tags).set_index(date)
batches = ['A','B','C','D','E','F','G']
n=len(batches)
period_start = pd.to_datetime('1/1/2023')
period_end = pd.to_datetime('10/06/2023')
batch_start = (pd.to_timedelta(np.random.rand(n) * ((period_end - period_start).days + 1), unit='D') + period_start)
batch_end = (batch_start + pd.to_timedelta(8,unit='H'))
df_batches = pd.DataFrame(data=[batch_start,batch_end],columns=[batches],index=['start','end']).T
for item in batches:
start_time = df_batches['start'][item]
end_time = df_batches['end'][item]
df.loc[((df.index>=start_time)&(df.index<=end_time)), 'batch'] = item
df.dropna(subset=['batch'],inplace=True)
df['phase']=''
phases = ['10-Wait','20-Initialise','30-Warm','40-Running']
for batch in batches:
wait_len = int(len(df[df['batch']==batch].index)*0.2)
init_len = int(len(df[df['batch']==batch].index)*0.4)
warm_len = int(len(df[df['batch']==batch].index)*0.6)
run_len = int(len(df[df['batch']==batch].index))
wait_start = df[df['batch']==batch].index[0]
wait_end = df[df['batch']==batch].index[wait_len]
init_end = df[df['batch']==batch].index[init_len]
warm_end = df[df['batch']==batch].index[warm_len]
run_end = df[df['batch']==batch].index[-1]
df['phase'].loc[wait_start:wait_end] = phases[0]
df['phase'].loc[wait_end:init_end] = phases[1]
df['phase'].loc[init_end:warm_end] = phases[2]
df['phase'].loc[warm_end:run_end] = phases[3]
df.to_csv('stackoverflowqn.csv')




Credit to @AvishWagde who definitely broke the back of the problem. The 1 missing ingredient was having the x-axis of each plot baselined against zero.
The solution to baselining these plots was to create a new
Timedeltacolumn which starts from00:00:00and goes upwards, in increments of00:00:30.In Avish's code he uses:
However, since the Dataframe index is a
Datetime, plotting this on the x-axis will not result in a comparison of the data. As stated they need to be plotted against a baseline. UsingTimedeltaon the x-axis allows comparison of the process data in each phase. In this case the00:00:00is taken to be the start of each phase. This dataset was recorded at 30s intervals, and it is necessary to convert theTimedeltafromIndextoSeries, as per this linepd.to_timedelta(np.arange(0,len(batch_data)*30,30),unit='s').to_series()which results in this slight change:For the full working code:
Wait: 2 x variables
Initialise: 2 x variables


Warm: 2 x variables