Version Control for analytic projects


I've found several resources on how to manage a single data science project with Git, but nothing on how to manage a set of projects.

In 90% of cases I work alone, and over the course of a month a lot of people ask me to look at things like:

  • the performance of our marketing operations
  • the impact on sales of special periods such as Christmas
  • clustering of our customers
  • simple predictive models (churn, ...)

Here is my typical workflow for a single project:

  1. Prepare the data in SQL
  2. Run descriptive and predictive analyses in R/Python. I often use my own code library, which I update over time
  3. Produce the results as Markdown documents or PowerPoint presentations.

Here is the folder organisation for each project:

  1. Data
    • base
    • processed
  2. R scripts
  3. Python scripts
  4. Outputs (figures, Markdown, PowerPoint, ...)

In addition, I have two code libraries, one in R and one in Python, that I use across all the projects.
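To make the per-project layout above concrete, here is a minimal sketch of a script that would create it. The directory spellings (`r_scripts`, `python_scripts`, and so on) and the function name `scaffold_project` are my own illustrative assumptions, not an established convention.

```python
from pathlib import Path

# Folder names mirror the layout described above; the exact spellings
# (e.g. "r_scripts") are illustrative assumptions.
PROJECT_DIRS = [
    "data/base",
    "data/processed",
    "r_scripts",
    "python_scripts",
    "outputs",
]


def scaffold_project(root, name):
    """Create the folder skeleton for one analysis project and return its path."""
    project = Path(root) / name
    for sub in PROJECT_DIRS:
        # parents=True creates "data" before "data/base"; exist_ok makes reruns safe.
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project
```

For example, `scaffold_project("analytics", "clustering_2015")` would create `analytics/clustering_2015/` with the five subfolders listed above.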

Question: in this case, what is the best strategy?

  1. A single repository containing all the projects, because the libraries are shared among several projects?

If yes, is it OK to have dozens of branches in the same repository, like:

  • R_library_prod
  • R_library_dev
  • Python_library_prod
  • Python_library_dev
  • clustering_2015_prod
  • clustering_2015_dev
  • christmas_sales_analysis_prod
  • christmas_sales_analysis_dev
  • and so on
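To show how quickly this grows: with one prod and one dev branch per library and per project, the number of branches is 2 × (libraries + projects). A small sketch, using only the names from the list above (the helper `branch_names` is hypothetical):

```python
def branch_names(units, stages=("prod", "dev")):
    """Return one branch name per (unit, stage) pair, e.g. 'R_library_prod'."""
    return [f"{unit}_{stage}" for unit in units for stage in stages]


# Two libraries plus two projects, as in the list above.
units = [
    "R_library",
    "Python_library",
    "clustering_2015",
    "christmas_sales_analysis",
]
branches = branch_names(units)
print(len(branches))  # 8 branches for 2 libraries + 2 projects
```

Every additional analysis adds two more long-lived branches to the same repository.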

  2. A repository for each project? (with potentially only 2 branches: prod and dev)

If yes, how should I manage updates to the R and Python libraries? Should I keep them in a separate repository and update the copies in each analytics project repository manually?


There are 0 answers