Return updated datetime and script name as a dataframe - python


The overarching aim of my question is to monitor scripts that are run through TaskManager. I have numerous scripts that run at recurring time points, but I have no way of knowing whether a script actually executed or failed to run for whatever reason.

I'm setting out to create a central monitoring file for all my scripts that records the datetime and whether each script ran, and, more importantly, whether it didn't execute.

For this question, I've got a dummy script that runs for a period of 5 minutes and is then terminated.

I'll set up TaskManager to run this script every minute. I want a central df to which each run appends the datetime along with the name of the script. That way, on the 6th minute, when TaskManager fires but the script no longer executes, the missed run should be recorded as a null value.

import multiprocessing
import time
import pandas as pd
import datetime

# Dummy worker: rebuilds an empty DataFrame and prints it once a second
def foo(n):
    for i in range(10000 * n):

        # A new, empty DataFrame is created on every iteration
        test = pd.DataFrame(columns=['Script', 'DateTime', 'Processing'])

        # Assigning a scalar to a column of a zero-row DataFrame leaves it empty
        test["DateTime"] = datetime.datetime.now()

        print(test)

        time.sleep(1)

        #return test

if __name__ == '__main__':

    p = multiprocessing.Process(target=foo, name="Foo", args=(10,))
    p.start()

    time.sleep(300)

    p.terminate()

    p.join()
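
To make the goal concrete, the kind of per-script logging I have in mind would append one row to a central file each time a scheduled script starts. This is only an illustrative sketch; the CSV path and the log_run helper are placeholders, not part of my current code:

import datetime
import os
import pandas as pd

LOG_PATH = "script_monitor.csv"  # placeholder path for the central monitoring file

def log_run(script_name):
    # Append one row (script name + current datetime) to the central log;
    # write the header only the first time, when the file does not exist yet
    row = pd.DataFrame([{"Script": script_name, "DateTime": datetime.datetime.now()}])
    row.to_csv(LOG_PATH, mode="a", header=not os.path.exists(LOG_PATH), index=False)

log_run("dummy_script")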

1 Answer

Answered by Shavy

Try this:

import multiprocessing
import time
import pandas as pd
import datetime

# Worker: log one row per loop iteration into a list shared with the parent process
def foo(script_name, n, log_rows):
    for i in range(10000 * n):
        current_time = datetime.datetime.now()
        print(f"{script_name} is running at {current_time}")

        # Append information to the shared log
        log_rows.append({'Script': script_name, 'DateTime': current_time, 'Processing': True})

        time.sleep(1)

if __name__ == '__main__':
    # Specify the script name
    script_name = "TestScript"

    # A Manager list is visible to both the parent and the child process;
    # a plain module-level DataFrame would only be updated inside the child
    manager = multiprocessing.Manager()
    log_rows = manager.list()

    # Create a multiprocessing process
    p = multiprocessing.Process(target=foo, name="Foo", args=(script_name, 5, log_rows))
    p.start()

    # Wait for 5 minutes
    time.sleep(300)

    # Terminate the process
    p.terminate()
    p.join()

    # Build the execution log from the shared rows
    execution_log = pd.DataFrame(list(log_rows), columns=['Script', 'DateTime', 'Processing'])

    # After termination, mark the script's rows as no longer processing
    execution_log.loc[execution_log['Script'] == script_name, 'Processing'] = False

    # Print the final execution log
    print(execution_log)

I've made the following changes:

  1. Passed the script name as an argument to the foo function.
  2. Logged one row per iteration to a multiprocessing.Manager list, which is shared between the child process and the parent. A plain global DataFrame would only be updated inside the child process, and DataFrame.append has been removed in recent pandas versions.
  3. Built the execution_log DataFrame once in the parent, from the shared rows, instead of recreating it inside the loop.
  4. Marked the script's rows as Processing = False after the process is terminated.

This way, you can monitor the execution of your scripts over time, and the 'Processing' column will indicate whether a script is currently running or not.
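
Once the log exists, a missed run (the 6th minute with no entries, as described in the question) can be spotted by resampling the log on a minutely grid and looking for minutes with a count of zero. A minimal sketch, assuming execution_log is the DataFrame built above:

# Ensure a proper datetime index before resampling
execution_log['DateTime'] = pd.to_datetime(execution_log['DateTime'])
log = execution_log.set_index('DateTime').sort_index()

# Count logged rows per minute; minutes with zero rows are missed runs
runs_per_minute = log['Script'].resample('1min').count()
missed = runs_per_minute[runs_per_minute == 0]
print(missed)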