I have two dataframes. Dataframe A has five columns: start_time, end_time, ID_user, ID_position and orientation. Dataframe B has four columns: timestamp, ID_user, ID_sender and RSSI.
I want to add the ID_position column of dataframe A to dataframe B, so that I know which RSSI value (dataframe B) corresponds to which ID_position (dataframe A). To do this, I need to know where someone was at which time, so I need to check between which start_time and end_time (dataframe A) each timestamp (dataframe B) lies. In total, there are 277 positions.
In simple words: if a timestamp (dataframe B) is between a start time and an end time (dataframe A), return the ID_position for that timestamp and add it as a column to dataframe B.
I have searched many websites and tried many things, but this is probably the best I have come up with. I converted the columns with tolist() because lists are processed faster than columns. I tried to use for-loops in a function to go through the start times and timestamps and compare them. Instead of using both start time and end time, I tried to use only the start time because this needed fewer for-loops (but using the end time as well would be better). I also tried merge, assign and so on, but could not figure it out. The most promising attempts are shown below. The first one returned a list of ID_positions for a single timestamp rather than one position.
def position(timestamp):
    pos_list = []
    pos = survey.ID_position
    time = 1540648136288
    for t in range(len(timestamp)):
        if (timestamp[t] <= time):
            pos_list.append(pos)
        elif (timestamp[t] > time):
            time = time + 8000
            pos = survey.ID_position + 1
    return(pos_list)

def numbers2(position):
    pos_ID = []
    post_list = []
    for i in range(len(position)):
        pos_ID.append(position[i])

def num_pos2(timestamp):
    pos_list = []
    pos = ID
    time = 1540648127883
    for t in range(len(timestamp)):
        if (time <= timestamp[t] <= (time+8000)):
            pos_list.append(pos[i])
        if timestamp[t] > time:
            pos_list.append(pos[i+1])
            time = time + 8000
            position = pos[i+1]
    return(pos_list)
Dataframe A (first few lines, 1108 rows × 5 columns, 277 positions in total)
       start_time       end_time  ID_user  ID_position  orientation
 0  1540648127883  1540648129883        1            1            1
 1  1540648129884  1540648131883        1            1            2
 2  1540648131884  1540648133883        1            1            3
 3  1540648133884  1540648136288        1            1            4
 4  1540648179559  1540648181559        1            2            1
 5  1540648181560  1540648183559        1            2            2
 6  1540648183560  1540648185559        1            2            3
 7  1540648185560  1540648187846        1            2            4
 8  1540648192618  1540648194618        1            3            1
 9  1540648194619  1540648196618        1            3            2
10  1540648196619  1540648198618        1            3            3
11  1540648198619  1540648201336        1            3            4
Dataframe B (first few lines, 209393 rows × 4 columns)
       timestamp  ID_user  ID_sender  RSSI
0  1540648127974        1       1080   -95
1  1540648128037        1          1   -51
2  1540648128076        1       1080   -95
3  1540648128162        1          1   -53
4  1540648128177        1       1080   -95
Expected outcome dataframe B:
          timestamp  ID_user  ID_sender  RSSI  ID_position
   0  1540648127974        1       1080   -95            1
   1  1540648128037        1          1   -51            1
   2  1540648128076        1       1080   -95            1
   3  1540648128162        1          1   -53            1
   4  1540648128177        1       1080   -95            1
 ...  (a lot of rows in between)
1809  1540648179571        1       1080   -75            2
1810  1540648179579        1          1   -55            2
1811  1540648179592        1       1070   -96            2
1812  1540648179627        1       1069  -100            2
1813  1540648179669        1       1080   -78            2
1814  1540648179772        1       1080   -79            2
The total dataset can be found on: http://wnlab.isti.cnr.it/localization
I want to check between which start time and end time (dataframe A) each timestamp from dataframe B falls, and return the corresponding ID_position from dataframe A, so that in the end dataframe B has a column with the ID_position matching each timestamp. For example, if the start time is 1, the end time is 4 and the ID_position is 1, I want to get ID_position 1 for timestamp 3, because 3 lies between 1 and 4.
Thank you in advance!
You can do an outer merge with both dataframes on ID_user, which gives you a many-to-many product back (that is, all the combinations, e.g. the cartesian product). Then we filter with query on start_time < timestamp < end_time.
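The merge-then-filter idea described above can be sketched roughly as follows. This is only an illustration, not necessarily the exact code of the answer: the names df_a, df_b, merged and result are made up here, and the two small frames are toy stand-ins built from a few of the sample rows in the question.

```python
import pandas as pd

# Toy stand-in for dataframe A (rows 0, 1 and 4 of the sample in the question).
df_a = pd.DataFrame({
    "start_time": [1540648127883, 1540648129884, 1540648179559],
    "end_time": [1540648129883, 1540648131883, 1540648181559],
    "ID_user": [1, 1, 1],
    "ID_position": [1, 1, 2],
    "orientation": [1, 2, 1],
})

# Toy stand-in for dataframe B (rows 0, 1 and 1809 of the sample in the question).
df_b = pd.DataFrame({
    "timestamp": [1540648127974, 1540648128037, 1540648179571],
    "ID_user": [1, 1, 1],
    "ID_sender": [1080, 1, 1080],
    "RSSI": [-95, -51, -75],
})

# Outer merge on ID_user: every B row is paired with every A row of the same user.
merged = df_b.merge(df_a, on="ID_user", how="outer")

# Keep only the pairs whose timestamp lies strictly inside the interval.
matched = merged.query("start_time < timestamp < end_time")

# Drop the helper columns so the result looks like dataframe B plus ID_position.
result = (
    matched[["timestamp", "ID_user", "ID_sender", "RSSI", "ID_position"]]
    .sort_values("timestamp")
    .reset_index(drop=True)
)
print(result)
```

The intermediate frame merged has one row per (B row, A row) pair for each user, which is why this approach becomes expensive on large inputs.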
Note: I didn't use inclusion with the < operator. You can change that to <= if needed.

Note 2: If your dataframes are big, this will be memory consuming; see the explanation about many-to-many above.

Edit after OP's comment about multiple positions

I still get the right results.
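If the memory cost of the many-to-many product is a problem, one possible alternative is pandas' merge_asof, which matches each timestamp to the most recent start_time and never builds the full product. This is a different technique from the merge and query approach above, offered only as a sketch. It assumes that the intervals of one user do not overlap, and it treats both bounds as inclusive. The names df_a, df_b, candidates and result are made up for the example, and the third timestamp is a hypothetical value placed in a gap between intervals.

```python
import pandas as pd

# Toy stand-in for dataframe A (rows 0, 1 and 4 of the sample in the question).
df_a = pd.DataFrame({
    "start_time": [1540648127883, 1540648129884, 1540648179559],
    "end_time": [1540648129883, 1540648131883, 1540648181559],
    "ID_user": [1, 1, 1],
    "ID_position": [1, 1, 2],
})

# Toy stand-in for dataframe B; 1540648150000 is a hypothetical timestamp
# that falls in a gap between two intervals.
df_b = pd.DataFrame({
    "timestamp": [1540648127974, 1540648128037, 1540648150000, 1540648179571],
    "ID_user": [1, 1, 1, 1],
    "ID_sender": [1080, 1, 1080, 1080],
    "RSSI": [-95, -51, -95, -75],
})

# merge_asof requires both frames to be sorted on their merge keys.
a_sorted = df_a.sort_values("start_time")
b_sorted = df_b.sort_values("timestamp")

# For each timestamp, take the latest interval of the same user
# that started at or before it.
candidates = pd.merge_asof(
    b_sorted,
    a_sorted,
    left_on="timestamp",
    right_on="start_time",
    by="ID_user",
    direction="backward",
)

# A timestamp after the end of its candidate interval lies in a gap,
# so it gets no position (missing value).
inside = candidates["timestamp"] <= candidates["end_time"]
candidates["ID_position"] = candidates["ID_position"].where(inside).astype("Int64")

result = candidates.drop(columns=["start_time", "end_time"])
print(result)
```

Unlike the query filter, this keeps every row of dataframe B, with a missing ID_position for timestamps that fall outside all intervals.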