I want to segment an indoor scene and find the objects in it. Then I want to use stereo vision to find the Cartesian position of each object. The final goal is to have a robot pick objects off a table (and to control its trajectory).
Examples of objects: chair, table, pen, syringe, stapler, cup, screw, toy doll, ruler, small box, milk, fruit, ...
My first priority is running in real time (10 Hz).
I use a ZED stereo camera to capture images on Windows 10 (64-bit) with MATLAB 2016b (64-bit), on an Intel Core i7-3820 (3.6 GHz).
The camera output is a color image of 720x2560 pixels, which is a side-by-side combination of two 720x1280 images (the right and left views).
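As a minimal sketch of that layout, the combined frame can be cut down the middle into its two views. It is written in Python rather than MATLAB, assumes the left view occupies the left half of the frame, and uses a nested list as a stand-in for the real image; `split_side_by_side` is a hypothetical helper name.

```python
def split_side_by_side(frame):
    """Split a side-by-side stereo frame (a list of pixel rows) in half.

    Assumption: the left view is stored in the left half of each row.
    """
    width = len(frame[0])
    if width % 2 != 0:
        raise ValueError("side-by-side frame must have an even width")
    half = width // 2
    left = [row[:half] for row in frame]
    right = [row[half:] for row in frame]
    return left, right


# Dummy frame with 720 rows and 2560 columns standing in for the camera output.
frame = [[0] * 2560 for _ in range(720)]
left, right = split_side_by_side(frame)
print(len(left), len(left[0]))    # rows and columns of the left view
print(len(right), len(right[0]))  # rows and columns of the right view
```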
I would prefer to use unsupervised algorithms to find the position of unknown objects on the table. However, it has to be done in real time. If that is not possible in real time, I will lower my expectations and use supervised algorithms to find predefined objects.
I believe both problems you mention (segmentation and detection) are still considered open problems, so there isn't a final solution. However, in recent years a lot of work has been done on object detection and semantic segmentation using deep learning, with great performance and speed.
For object detection in real time, I recommend that you check the results of YOLO and SSD, and also take a look at Faster R-CNN, since your 10 Hz requirement can be achieved with it.
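Whether a given detector actually meets the 10 Hz target on your hardware is easiest to settle by timing it. The sketch below is a generic harness and is not tied to YOLO, SSD, or Faster R-CNN: `detect` is a placeholder for whatever inference call you end up using, and `mean_latency` and `meets_rate` are hypothetical helper names.

```python
import time


def mean_latency(detect, frames, warmup=2):
    """Return the mean seconds per frame of `detect` over `frames`.

    The first `warmup` frames are run but not timed, because the first
    calls of many detectors include one-off setup cost.
    """
    for frame in frames[:warmup]:
        detect(frame)
    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warmup iterations")
    start = time.perf_counter()
    for frame in timed:
        detect(frame)
    return (time.perf_counter() - start) / len(timed)


def meets_rate(latency_s, target_hz=10.0):
    """True if one frame fits in the time budget of the target rate."""
    return latency_s <= 1.0 / target_hz


# Placeholder detector: stands in for a real model's inference call.
def detect(frame):
    return []


frames = [None] * 20
latency = mean_latency(detect, frames)
print("mean latency: %.6f s, meets 10 Hz: %s" % (latency, meets_rate(latency)))
```

At 10 Hz the whole pipeline has 100 ms per frame, so the detector's latency has to leave room for the stereo matching step as well.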
For object segmentation, you can try DCNN, which claims 8 fps. There are others, such as DeepLab or FCN, but I am not sure how fast those systems/architectures run.
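Once a detector or segmenter gives you a pixel location for an object, the Cartesian position you asked about follows from the stereo disparity at that pixel. Here is a minimal sketch for a rectified pinhole stereo pair, using Z = f * B / d, X = (u - cx) * Z / f and Y = (v - cy) * Z / f. The function name is hypothetical, and the focal length, principal point, and baseline values are illustrative placeholders rather than ZED calibration data; the real values come from your camera calibration.

```python
def pixel_to_camera_xyz(u, v, disparity_px, focal_px, cx, cy, baseline_m):
    """Triangulate a rectified-stereo pixel into camera-frame coordinates.

    u, v         : pixel column and row in the left image
    disparity_px : horizontal shift (left u minus right u), in pixels
    focal_px     : focal length in pixels
    cx, cy       : principal point in pixels
    baseline_m   : distance between the two camera centers, in meters
    Returns (X, Y, Z) in meters, with Z pointing away from the camera.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    z = focal_px * baseline_m / disparity_px
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return x, y, z


# Illustrative numbers only (assumed, not real calibration values).
x, y, z = pixel_to_camera_xyz(u=740, v=360, disparity_px=70.0,
                              focal_px=700.0, cx=640.0, cy=360.0,
                              baseline_m=0.12)
print("X=%.3f m, Y=%.3f m, Z=%.3f m" % (x, y, z))
```

The result is expressed in the camera frame, so for picking you still need the camera-to-robot transform (hand-eye calibration) to get a target in the robot's base frame.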