Version Control for analytic projects

174 views Asked by Vadi At 11 June 2015 at 13:05

I've found several resources on how to manage a data science project with Git, but nothing on how to manage a set of projects.

In 90% of cases I work alone, and over the course of a month many people ask me to look at:

the performance of our marketing operations
the impact on sales of special periods such as Christmas
clustering of our customers
simple predictive models (churn,...)

Here is my typical workflow for a single project:

Prepare the data in SQL
Run descriptive and predictive analyses in R/Python. I often use my own library of code, which I update over time.
Create output results in Markdown or as a PowerPoint presentation.

Here is the folder organisation for each project:

Data
- base
- processed
R scripts
Python scripts
Outputs (figures, Markdown, PowerPoint, ...)

Plus two code libraries, one in R and one in Python, that I use across all projects.
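
To make the per-project layout concrete, here is a minimal sketch of a script that scaffolds it. The folder names come from the list above; the function name `scaffold_project` and the constant `PROJECT_DIRS` are hypothetical, chosen only for illustration.

```python
from pathlib import Path

# Folder names taken from the per-project layout described above.
PROJECT_DIRS = [
    "Data/base",
    "Data/processed",
    "R scripts",
    "Python scripts",
    "Outputs",
]


def scaffold_project(root, name):
    """Create the folder layout for one project and return its path.

    `scaffold_project` is a hypothetical helper, not part of any library.
    """
    project = Path(root) / name
    for sub in PROJECT_DIRS:
        # parents=True creates "Data" before "Data/base";
        # exist_ok=True makes the call safe to repeat.
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project
```

For example, `scaffold_project(".", "clustering_2015")` would create the five folders under `./clustering_2015`. The shared R and Python libraries are deliberately left out of the sketch, since where they should live is the open question.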

Question: In this case, what is the best strategy?

A single repository with all the projects, because the libraries are shared among several projects?

If yes, is it OK to have dozens of branches in the same repository, like:

R_library_prod
R_library_dev
Python_library_prod
Python_library_dev
clustering_2015_prod
clustering_2015_dev
christmas_sales_analysis_prod
christmas_sales_analysis_dev
and so on

A repository for each project? (potentially with only two branches: prod and dev)

If yes, how should I manage updates to the R and Python libraries? Should I keep them in a separate repository and update them manually in each analytics project repository?

There are 0 answers