Create a decision tree from data


I'm trying to create a decision tree from data, for a guess-the-animal kind of game: the user answers yes/no questions and the program guesses the animal. This is for homework.

I don't know how to build the tree from the data. I have no way of knowing in advance what the root node should be, and the data will be different every time, so I can't build the tree by hand. My data looks like this:

Animal1: property1, property3, property5
Animal2: property2, property3, property5, property6
Animal3: property1, property6
etc.
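
In Python, for instance, this format maps naturally onto a dictionary from animal names to sets of properties (the names below are just the placeholders from the example):

```python
# Hypothetical encoding of the example data: each animal maps to the set
# of properties it has; a property that is absent counts as a "no" answer.
animals = {
    "Animal1": {"property1", "property3", "property5"},
    "Animal2": {"property2", "property3", "property5", "property6"},
    "Animal3": {"property1", "property6"},
}
```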

I searched Stack Overflow and found the ID3 and C4.5 algorithms, but I don't know if I should use them.

Can someone tell me which algorithm I should use to build a decision tree in this situation?


1 Answer

Answer by amit (accepted):

I searched Stack Overflow and found the ID3 and C4.5 algorithms, but I don't know if I should use them.

Yes, you should. They are very commonly used decision tree algorithms, and there are some nice open source implementations of them (Weka's J48, for example, is an implementation of C4.5).

If you need to implement something from scratch, a simple decision tree is fairly straightforward to build, and it is done recursively (a code sketch follows the steps below):

  1. Let the set of labeled samples be S, with the set of properties P = {p1, p2, ..., pk}.
  2. Choose a property pi.
  3. Split S into two sets, S1 and S2: S1 contains the samples that have pi, and S2 contains those that do not. Create two children for the current node, and move S1 and S2 to them, respectively.
  4. Repeat from step 2 with S' = S1 and S' = S2, for each subset that is not empty.
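
Here is a minimal sketch of those four steps in Python, assuming the dict-of-sets encoding of the data shown in the question. The property choice in step 2 is naive here (just the first unused property); a smarter choice is sketched after the pointers below.

```python
def build_tree(samples, properties):
    """samples: dict of animal name -> set of properties it has.
    properties: list of properties not yet asked about on this path."""
    # Step 1 is the input itself. Stop when one candidate remains
    # or there is nothing left to ask about.
    if len(samples) == 1 or not properties:
        return {"guess": sorted(samples)}            # leaf: remaining candidates
    prop, rest = properties[0], properties[1:]       # step 2: naive choice
    s1 = {a: p for a, p in samples.items() if prop in p}      # step 3: has prop
    s2 = {a: p for a, p in samples.items() if prop not in p}  # step 3: lacks prop
    if not s1 or not s2:                             # useless split: try next property
        return build_tree(samples, rest)
    return {"ask": prop,                             # internal node: a yes/no question
            "yes": build_tree(s1, rest),             # step 4: recurse on each
            "no": build_tree(s2, rest)}              # non-empty subset

def play(node):
    """Walk the tree, asking the user yes/no questions until a leaf."""
    while "ask" in node:
        answer = input("Does it have %s? (yes/no) " % node["ask"])
        node = node["yes"] if answer.strip().lower().startswith("y") else node["no"]
    print("My guess:", " or ".join(node["guess"]))
```

With the `animals` dict from the question, `play(build_tree(animals, sorted({p for ps in animals.values() for p in ps})))` would run one round of the game.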

Some pointers:

  • At each step you basically split the current data into two subsets: the samples that have pi and those that do not. You then create two new nodes, the current node's children, and repeat the process for each of them with its subset of the data.
  • A smart algorithm chooses the property pi (in step 2) so as to keep the tree as shallow as possible (finding the optimal tree is NP-hard, but there are greedy approaches, such as choosing the split that minimizes entropy; see the sketch after this list).
  • After the tree is built, it is usually pruned to avoid overfitting.
  • A simple extension of this idea trains multiple decision trees that work separately and combines their predictions. This is called a Random Forest, and it usually gets quite good results in practice.
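
To make the greedy, entropy-minimizing choice concrete, here is one way it could look. This is only a sketch of the core idea, not ID3/C4.5 itself; since every animal is effectively its own "class" in this game, minimizing entropy boils down to preferring the most even yes/no split:

```python
import math

def split_entropy(samples, prop):
    """Weighted entropy of the two subsets produced by splitting on prop,
    treating each animal as its own class (so the entropy of a subset of
    n distinct animals is log2(n))."""
    n = len(samples)
    n_yes = sum(1 for props in samples.values() if prop in props)
    n_no = n - n_yes

    def h(k):
        return (k / n) * math.log2(k) if k > 0 else 0.0

    return h(n_yes) + h(n_no)

def choose_property(samples, properties):
    """Greedy step 2: the property whose split leaves the least entropy,
    i.e. divides the remaining candidates most evenly."""
    return min(properties, key=lambda p: split_entropy(samples, p))
```

Swapping `choose_property` into `build_tree` in place of the naive `properties[0]` gives an ID3-flavored greedy learner that tends to halve the candidate set with every question.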