I have an example test.csv file which is read into a pandas dataframe.

It has 20 rows and 7 columns.

The csv file captures information about SIP calls but the SIP messages for each call are incorrectly ordered. There are 2 SIP calls in this example and separated by an empty row.

The problem I am trying to solve is to reorder the sip messages correctly.

>>> dataframe = pd.read_csv('test.csv')
>>> print(dataframe)
    frame.number                           frame.time       ip.src       ip.dst                       sip.Call-ID sip.Method  sip.Status-Code
0        25355.0  May  9, 2019 15:57:01.433623000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e25a5083a8f2c85        ACK              NaN
1        25148.0  May  9, 2019 15:57:01.363890000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            200.0
2        18371.0  May  9, 2019 15:56:59.411452000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e25a5083a8f2c85     INVITE              NaN
3        18403.0  May  9, 2019 15:56:59.421261000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            100.0
4        25134.0  May  9, 2019 15:57:01.360769000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            183.0
5        20875.0  May  9, 2019 15:57:00.064251000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            180.0
6        19244.0  May  9, 2019 15:56:59.694785000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            100.0
7        19227.0  May  9, 2019 15:56:59.690747000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e25a5083a8f2c85     INVITE              NaN
8        19022.0  May  9, 2019 15:56:59.620685000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            407.0
9        19221.0  May  9, 2019 15:56:59.689779000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e25a5083a8f2c85        ACK              NaN
10           NaN                                  NaN          NaN          NaN                               NaN        NaN              NaN
11       25356.0  May  9, 2019 15:57:01.433623000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e234fs23osd9212        ACK              NaN
12       25149.0  May  9, 2019 15:57:01.363890000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            200.0
13       18372.0  May  9, 2019 15:56:59.411452000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e234fs23osd9212     INVITE              NaN
14       18404.0  May  9, 2019 15:56:59.421261000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            100.0
15       25135.0  May  9, 2019 15:57:01.360769000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            183.0
16       20876.0  May  9, 2019 15:57:00.064251000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            180.0
17       19245.0  May  9, 2019 15:56:59.694785000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            100.0
18       19228.0  May  9, 2019 15:56:59.690747000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e234fs23osd9212     INVITE              NaN
19       19023.0  May  9, 2019 15:56:59.620685000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            407.0
20       19222.0  May  9, 2019 15:56:59.689779000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e234fs23osd9212        ACK              NaN

After successfully re-ordering the rows of the dataframe, I'm going to insert a new column and classify the calls.

dataframe.insert(0, "Classified", " ")

I can't classify the calls correctly if the SIP messages aren't in order.

I've checked out pandas sort_index() and sort_values() but this only helps with one part of the logic to solve this problem.

>>> dataframe.sort_values(by=['sip.Call-ID'], inplace=True)

This will sort the csv file based on the sip.Call-ID column. These values are unique for every SIP call. So this allows the messages to be grouped together.

The values from frame.number should help in solving this problem. However, they can only be sorted based on each unique sip-call-id and not as a whole, otherwise we'll have overlapping sip calls. The pseudo code logic in my mind is as follows:

for each unique sip-call-id in dataframe:
    store it's related frame.number
    check if the next frame.number is smaller/bigger
    reorder rows based on condition above

The difficulty I'm having is accessing the indices of each row and knowing how to re-order them based on each unique sip call id and applying that re-order to the dataframe.

>>> frame_values = dataframe['frame.number'].values
>>> print(frame_values)
[25355. 25148. 18371. 18403. 25134. 20875. 19244. 19227. 19022. 19221.
    nan 25356. 25149. 18372. 18404. 25135. 20876. 19245. 19228. 19023.
 19222.]

The expected results are seen below. The frame numbers are in ascending order for each unique sip-call-id and the relevant SIP messages are now in order as-well. This is further clarified by the frame time, as they are also in ascending order. This means that the SIP messages are definitely in order.

By relevant SIP messages, I mean sip.Method & sip.Status-Code columns are now in order.


  44.44.44.44             55.55.55.55
     |                        |
     |       INVITE           | First SIP Message: INVITE Method
     |----------------------->|
     |    100 trying          | Second SIP Message: 100 Status Code
     |<-----------------------|
     |    407 Proxy Auth      | Third SIP Message: 407 Status Code
     |<-----------------------|
     |                        |
     |         ACK            | Fourth SIP Message: ACK Method 
     |----------------------->|
     |         INVITE         | Fifth SIP Message: INVITE Method
     |----------------------->|
     |                        |
     |    100 trying          | Sixth SIP Message: 100 Status Code
     |<-----------------------|
     |    180 ringing         | Seventh SIP Message: 180 Status Code
     |<-----------------------|
     |    183 session         | Eight SIP Message: 183 Status Code
     |<-----------------------|
     |       200 OK           | Ninth SIP Message: 200 Status Code
     |<-----------------------|
     |         ACK            | Tenth SIP Message: ACK Method 
     |----------------------->|
>>> print(dataframe)
    frame.number                           frame.time       ip.src       ip.dst                       sip.Call-ID sip.Method  sip.Status-Code
0        18371.0  May  9, 2019 15:56:59.411452000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e25a5083a8f2c85     INVITE              NaN
1        18403.0  May  9, 2019 15:56:59.421261000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            100.0
2        19022.0  May  9, 2019 15:56:59.620685000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            407.0
3        19221.0  May  9, 2019 15:56:59.689779000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e25a5083a8f2c85        ACK              NaN
4        19227.0  May  9, 2019 15:56:59.690747000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e25a5083a8f2c85     INVITE              NaN
5        19244.0  May  9, 2019 15:56:59.694785000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            100.0
6        20875.0  May  9, 2019 15:57:00.064251000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            180.0
7        25134.0  May  9, 2019 15:57:01.360769000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            183.0
8        25148.0  May  9, 2019 15:57:01.363890000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e25a5083a8f2c85        NaN            200.0
9        25355.0  May  9, 2019 15:57:01.433623000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e25a5083a8f2c85        ACK              NaN
10           NaN                                  NaN          NaN          NaN                               NaN        NaN              NaN
11       18372.0  May  9, 2019 15:56:59.411452000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e234fs23osd9212     INVITE              NaN
12       18404.0  May  9, 2019 15:56:59.421261000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            100.0
13       19023.0  May  9, 2019 15:56:59.620685000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            407.0
14       19222.0  May  9, 2019 15:56:59.689779000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e234fs23osd9212        ACK              NaN
15       19228.0  May  9, 2019 15:56:59.690747000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e234fs23osd9212     INVITE              NaN
16       19245.0  May  9, 2019 15:56:59.694785000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            100.0
17       20876.0  May  9, 2019 15:57:00.064251000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            180.0
18       25135.0  May  9, 2019 15:57:01.360769000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            183.0
19       25149.0  May  9, 2019 15:57:01.363890000 IST  55.55.55.55  44.44.44.44  0018506d493d00005e234fs23osd9212        NaN            200.0
20       25356.0  May  9, 2019 15:57:01.433623000 IST  44.44.44.44  55.55.55.55  0018506d493d00005e234fs23osd9212        ACK              NaN

0 Answers