I have the following decision tree, created with the RWeka package using the command J48(NSP ~ ., data = training):
[[1]]
J48 pruned tree
------------------
MSTV <= 0.4
| MLTV <= 4.1: 3 (2.0)
| MLTV > 4.1
| | ASTV <= 79
| | | b <= 1383: 2 (18.0)
| | | b > 1383
| | | | UC <= 5: 1 (2.0)
| | | | UC > 5: 2 (2.0)
| | ASTV > 79: 3 (2.0)
MSTV > 0.4
| DP <= 0
| | ALTV <= 9: 1 (170.0/2.0)
| | ALTV > 9
| | | FM <= 7
| | | | LBE <= 142: 1 (27.0/1.0)
| | | | LBE > 142
| | | | | AC <= 2
| | | | | | e <= 1058: 1 (5.0)
| | | | | | e > 1058
| | | | | | | DL <= 4: 2 (9.0/1.0)
| | | | | | | DL > 4: 1 (2.0)
| | | | | AC > 2: 1 (3.0)
| | | FM > 7: 2 (2.0)
| DP > 0
| | DP <= 1
| | | UC <= 3: 2 (4.0/1.0)
| | | UC > 3
| | | | MLTV <= 0.4: 3 (2.0)
| | | | MLTV > 0.4: 1 (8.0)
| | DP > 1: 3 (8.0)
Number of Leaves : 16
Size of the tree : 31
I would like to extract the nodes in two formats. The first format contains only the attribute names (MSTV, MLTV, DP, etc.), with each node nested inside its parent and parentheses separating the levels, such as:
(MSTV (MLTV...) (DP...) )
In the second format I would like to get the nodes together with their split values, such as:
(MSTV 0.4 (MLTV 4.1 ....) (DP 0..... ) )
How can I extract the relevant information? I think that, to separate the node values, we could strip characters with gsub("[A-Z]:", "", string), but the last lines of the output need to be ignored.
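To make the goal concrete, here is a minimal sketch of the kind of parsing I have in mind. It is only an illustration under several assumptions: the printed tree is available as a character vector of lines (for example via capture.output(print(model)), which I have not verified for this model), every split is a binary numeric split printed as a "<=" line followed later by a ">" line, and tree_lines is a shortened, made-up tree in the same layout rather than my real output. The function names parse_j48 and to_sexpr are hypothetical helpers, not part of RWeka.

```r
# Shortened, made-up tree in the same layout as the J48 printout (assumption).
tree_lines <- c(
  "J48 pruned tree",
  "------------------",
  "MSTV <= 0.4",
  "|   MLTV <= 4.1: 3 (2.0)",
  "|   MLTV > 4.1: 2 (18.0)",
  "MSTV > 0.4",
  "|   DP <= 0: 1 (170.0/2.0)",
  "|   DP > 0",
  "|   |   UC <= 3: 2 (4.0/1.0)",
  "|   |   UC > 3: 3 (8.0)",
  "",
  "Number of Leaves : 5",
  "Size of the tree : 9"
)

# Hypothetical helper: keep only split lines and break them into fields.
# A split line looks like "|   |   ATTR <= 4.1" or "ATTR > 4.1: 3 (2.0)".
# Header and summary lines do not match the pattern and are dropped.
parse_j48 <- function(lines) {
  pat <- "^([| ]*)([^ |]+) (<=|>) ([^ :]+)(.*)$"
  lines <- lines[grepl(pat, lines)]
  data.frame(
    depth = nchar(gsub("[^|]", "", sub(pat, "\\1", lines))),  # number of "|"
    attr  = sub(pat, "\\2", lines),
    op    = sub(pat, "\\3", lines),
    value = sub(pat, "\\4", lines),
    leaf  = grepl("^:", sub(pat, "\\5", lines)),  # ": class (n)" marks a leaf
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: build the nested "(ATTR ...)" string recursively.
to_sexpr <- function(nodes, with_values = FALSE) {
  build <- function(i) {
    # i is the "<=" line of a split; its ">" partner is the next line
    # at the same depth.
    same <- which(nodes$depth == nodes$depth[i])
    j <- same[same > i][1]
    # A branch that is not a leaf has its child split on the following line.
    kids <- c(if (!nodes$leaf[i]) build(i + 1),
              if (!nodes$leaf[j]) build(j + 1))
    label <- if (with_values) paste(nodes$attr[i], nodes$value[i]) else nodes$attr[i]
    paste0("(", paste(c(label, kids), collapse = " "), ")")
  }
  build(1)
}

nodes <- parse_j48(tree_lines)
to_sexpr(nodes)                      # "(MSTV (MLTV) (DP (UC)))"
to_sexpr(nodes, with_values = TRUE)  # "(MSTV 0.4 (MLTV 4.1) (DP 0 (UC 3)))"
```

Matching whole split lines with one pattern, rather than deleting characters with gsub, also takes care of the header and the last lines, since they simply fail to match. Nominal splits (printed with "=") are not handled by this sketch.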
Thanks a lot for your help.