What's the best way to represent Hour of Day and Day of Week as features for value prediction models in Machine Learning?

Question

1.6k views Asked by kai At 06 September 2017 at 18:08

When working with features in Machine learning and representing them in a matrix, what's the recommended way to represent hour of day and day of week as features for value prediction models?

Is one-hot encoding (1 for the hour in question and 0 for all other hours) the preferred way to represent these attributes as features? And the same for day of week?

Thanks

There are 2 answers

**Tushar Gupta** · Answer 1 · 2017-09-06T19:05:03+00:00

In this case there is a periodic weekly trend and a long-term upward trend, so you would want to encode two time variables:

day_of_week
absolute_time

In general

There are several common time frames that trends occur over:

absolute_time
day_of_year
day_of_week
month_of_year
hour_of_day
minute_of_hour

Look for trends in all of these.
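
As a rough illustration, the time frames listed above can be pulled out of a single timestamp with Python's standard `datetime` module. This is only a sketch: the function name `time_features` and the dictionary keys are illustrative choices, and the timestamp is assumed to be timezone-aware (UTC here) so that `absolute_time` is unambiguous.

```python
from datetime import datetime, timezone


def time_features(ts: datetime) -> dict:
    """Split one timestamp into the calendar components listed above."""
    return {
        "absolute_time": ts.timestamp(),         # seconds since the Unix epoch
        "day_of_year": ts.timetuple().tm_yday,   # 1..366
        "day_of_week": ts.weekday(),             # Monday = 0 ... Sunday = 6
        "month_of_year": ts.month,               # 1..12
        "hour_of_day": ts.hour,                  # 0..23
        "minute_of_hour": ts.minute,             # 0..59
    }


# Example timestamp (assumed to be in UTC for this sketch).
features = time_features(datetime(2017, 9, 6, 18, 8, tzinfo=timezone.utc))
```

Each key can then become one column of the feature matrix, and each column can be plotted against the target to look for a trend.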

Weird trends

Look for weird trends too. For example you may see rare but persistent time based trends:

is_easter
is_superbowl
is_national_emergency etc.

These often require that you cross reference your data against some external source that maps events to time.
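
One possible way to do that cross-referencing is a lookup table of event dates turned into 0/1 flags. The sketch below hard-codes a tiny table called `EVENT_DATES` purely for illustration; in practice the dates would come from whatever external calendar source you trust, and the names `EVENT_DATES` and `event_flags` are hypothetical.

```python
from datetime import date

# Illustrative lookup table mapping a flag name to the dates it applies to.
# A real project would load this from an external calendar source.
EVENT_DATES = {
    "is_easter": {date(2017, 4, 16), date(2018, 4, 1)},
    "is_superbowl": {date(2017, 2, 5), date(2018, 2, 4)},
}


def event_flags(day: date) -> dict:
    """Return one 0/1 feature per known event for the given day."""
    return {name: int(day in days) for name, days in EVENT_DATES.items()}


flags = event_flags(date(2017, 4, 16))
```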

Why graph?

There are two reasons that I think graphing is so important.

Weird trends: While the general trends can be automated pretty easily (just add them every time), weird trends will often require a human eye and knowledge of the world to find. This is one reason that graphing is so important.

Data errors: All too often data has serious errors in it. For example, you may find that the dates were encoded in two formats and only one of them has been correctly loaded into your program. There are a myriad of such problems and they are surprisingly common. This is the other reason I think graphing is important, not just for time series, but for any data.

Answer from https://datascience.stackexchange.com/questions/2368/machine-learning-features-engineering-from-date-time-data

**Nadjmeddine Boudjellal** · Answer 2 · 2019-04-17T14:20:40+00:00

No, that choice isn't ideal, because it loses the cyclical structure of the data. For hours, the model needs to know that 23:00 is close to 00:00, and the same holds for weekdays, which are usually numbered from Monday as 0 to Sunday as 6. With your method, every hour or day becomes an independent entity with no relation to the others, and that's wrong.

A better way to represent this type of data is to replace each feature (hour, day of the week, ...) with two features: the sin and cos of the value. For hours, for example, you create hours_sin and hours_cos. Before applying sin and cos, you need to convert the value to an angle theta by scaling it to the length of the cycle (24 for hours, 7 for days of the week). In Python, import pi from math, then:

theta = 2 * pi * hour / 24

Then also import sin and cos from math, and calculate sin(theta) and cos(theta).
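
Put together, the sin/cos encoding might look like the sketch below. The helper names `encode_cyclical` and `distance` are made up for this example, and the value is divided by the cycle length (24 for hours, 7 for weekdays) so that one full cycle maps to one full turn of the circle.

```python
from math import cos, hypot, pi, sin


def encode_cyclical(value, period):
    """Map a cyclic value (e.g. hour 0..23) to a point on the unit circle."""
    theta = 2 * pi * value / period  # one full period corresponds to 2*pi
    return sin(theta), cos(theta)


def distance(a, b):
    """Euclidean distance between two encoded points."""
    return hypot(a[0] - b[0], a[1] - b[1])


hour_sin, hour_cos = encode_cyclical(23, 24)  # hour of day
day_sin, day_cos = encode_cyclical(6, 7)      # Sunday, with Monday = 0

# 23:00 ends up much closer to 00:00 than 12:00 does.
near = distance(encode_cyclical(23, 24), encode_cyclical(0, 24))
far = distance(encode_cyclical(12, 24), encode_cyclical(0, 24))
```

Both columns are needed together: sin alone or cos alone maps two different hours to the same value, while the pair identifies each hour uniquely.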
