Program produces no console output when the input size is 30 million or more


I'm trying to write a program that will eventually show the runtime difference between a binary search tree and a vector on large inputs. Before I get to that, I'm testing whether the insertion and search functions work properly. They seem fine, but whenever I set SIZE to 30 million or more, the program runs for about 10-20 seconds and then only displays "Press any key to continue..." with no other output. If I set SIZE to 20 million or less, it prints the search results as programmed. What could be causing this?

Some side notes:

I'm storing unique (no duplicates), randomly generated values in both the tree and the vector, so at the end they hold exactly the same values. When the program runs the search portion, any value found in the BST should be found in the vector as well. So far this has worked with no problems when using 20 million values or fewer.
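The no-duplicates invariant described above can be sketched independently of the tree. This is only an illustration under stated assumptions: makeUniqueValues is a hypothetical helper that is not part of the question's code, and it swaps in a std::unordered_set for duplicate detection instead of the BST's numDups counter. The set costs extra memory, so it is better suited to checking the logic at smaller sizes than to the 30 million case.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not from the question): returns `count` distinct
// random values in [0, maxValue], retrying whenever a duplicate is drawn.
std::vector<long long> makeUniqueValues(std::size_t count, long long maxValue,
                                        std::mt19937_64& engine)
{
    // The range must hold at least `count` distinct values, or the loop never ends.
    assert(maxValue >= 0 && static_cast<unsigned long long>(maxValue) + 1 >= count);

    std::uniform_int_distribution<long long> dist(0, maxValue);
    std::unordered_set<long long> seen;   // values accepted so far
    std::vector<long long> values;
    values.reserve(count);

    while (values.size() < count) {
        long long candidate = dist(engine);
        // insert() reports whether the value was new; duplicates are skipped.
        if (seen.insert(candidate).second)
            values.push_back(candidate);
    }
    return values;
}
```

Using a while loop on the number of accepted values avoids the i-- and continue bookkeeping in the insertion loop.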

Also, I'm using randValue = rand() * rand(); to generate the random values because I know the maximum value of rand() on my compiler is 32767. Multiplying two such values gives a range of 0 to 1,073,676,289 (32767 × 32767). I know the insertion and search methods I'm using are inefficient because I'm checking for duplicates, but that's not my concern right now. This is just for my own practice.
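One caveat with rand() * rand() is that the products are not spread evenly over that range; for instance, no prime larger than 32767 can ever be produced. A minimal sketch of an alternative, assuming C++11's <random> header is available, is shown here. uniformRandom is a hypothetical stand-in for the randLLNum() helper in functions.h, which is not shown in the question.

```cpp
#include <cassert>
#include <random>

// Hypothetical stand-in for randLLNum() (the real helper is not shown).
// Returns a uniformly distributed value in [0, maxValue].
long long uniformRandom(std::mt19937_64& engine, long long maxValue)
{
    assert(maxValue >= 0);
    std::uniform_int_distribution<long long> dist(0, maxValue);
    return dist(engine);
}
```

The engine would be created once, for example as std::mt19937_64 engine(std::random_device{}());, and passed to every call.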

I'm only posting my main.cpp for the sake of simplicity. If you think the problem lies in one of my other files, I'll post the rest.

Here's my main.cpp:

#include <iostream>
#include <cstdlib> // srand
#include <ctime>   // time
#include <vector>
#include "BSTTemplate.h"
#include "functions.h"

using namespace std;

int main()
{
    const long long SIZE = 30000000;
    vector<long long> vector1(SIZE);
    long long randNum;
    binarySearchTree<long long> bst1;
    srand(time(NULL));

    //inserts data into BST and into the vector AND makes sure there are no duplicates
    for(long long i = 0; i < SIZE; i++)
    {
        randNum = randLLNum();
        bst1.insert(randNum);
        if(bst1.numDups == 1)//if the random number generated is duplicated, don't count it and redo that iteration
        {
            i--;
            bst1.numDups = 0;
            continue;
        }
        vector1[i] = randNum;
    }

    //search for a random value in both the BST and the vector
    for(int i = 0; i < 5; i++)
    {
        randNum = randLLNum();
        cout << endl << "The random number chosen is: " << randNum << endl << endl;

        //searching with BST
        cout << "Searching for " << randNum << " in BST..." << endl;
        if(bst1.search(randNum))
            cout << randNum << " = found" << endl;
        else
            cout << randNum << " = not found" << endl;

        //searching with linear search using vectors
        cout << endl << "Searching for " << randNum << " in vector..." << endl;
        if(containsInVector(vector1, SIZE, randNum))
            cout << randNum << " = found" << endl;
        else
            cout << randNum << " = not found" << endl;
    }

    cout << endl;

    return 0;
}

There is 1 answer

Tony Delroy (best answer)

(Comments reposted as answer at OP's request)

The options below all target memory use, since the symptoms suggest the process is running out of RAM or address space:

- Compile as 64-bit if you're not already. This may make things better or worse, depending on whether RAM or address space is the issue.
- Buy more memory.
- Adjust your operating system's swap settings so it can use more disk.
- Design a more memory-efficient tree. At best you'll probably only get an order of magnitude improvement, maybe less, and it could affect other things such as performance characteristics.
- Redesign your tree so it manually saves data out to disk and reads it back (e.g. with an LRU cache).
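To judge which of these options is worth trying, a rough footprint estimate and an explicit check for allocation failure can help. The sketch below is not the asker's code: Node is a hypothetical layout (the real one lives in BSTTemplate.h, which is not shown), perAllocOverhead is an assumed allocator bookkeeping cost rather than a measured one, and runReportingAllocFailure is an illustrative wrapper name. On a 32-bit build this Node would typically be 16 bytes; with an assumed 16 bytes of overhead per node, 30 million nodes plus the vector come to roughly 1.2 GB, which is uncomfortably close to the 2 GB of user address space a 32-bit Windows process gets by default.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <limits>
#include <new>
#include <string>

// Hypothetical node layout; the actual node type is not shown in the question.
struct Node {
    long long value;
    Node* left;
    Node* right;
};

// Rough footprint of `count` separately allocated nodes plus a vector holding
// `count` long long values. `perAllocOverhead` is an assumption, not a measurement.
unsigned long long estimateBytes(unsigned long long count, std::size_t perAllocOverhead)
{
    const unsigned long long perItem =
        sizeof(Node) + perAllocOverhead + sizeof(long long);
    // Guard against overflow in the multiplication below.
    assert(count <= std::numeric_limits<unsigned long long>::max() / perItem);
    return count * perItem;
}

// Runs `work` and reports a failed allocation on stderr instead of letting
// the exception escape unreported.
template <typename Work>
std::string runReportingAllocFailure(Work work)
{
    try {
        work();
        return "ok";
    } catch (const std::bad_alloc& e) {
        std::cerr << "Out of memory: " << e.what() << std::endl;
        return "bad_alloc";
    }
}
```

Wrapping the insertion loop from main() in such a call would show whether a std::bad_alloc is what ends the program before any search output is printed.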

Here's a how-to for compiling as 64-bit in VC++: msdn.microsoft.com/en-us/library/9yb4317s.aspx
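After changing the project settings, a quick way to confirm which kind of binary was actually built is to look at the pointer size. This is a small sketch; pointerBits is a hypothetical helper name, not something from the question or the linked page.

```cpp
#include <climits>
#include <cstddef>

// Width of a pointer in bits for the current build:
// typically 32 for an x86 target and 64 for an x64 target.
constexpr std::size_t pointerBits()
{
    return sizeof(void*) * CHAR_BIT;
}
```

Printing pointerBits() at the start of main() makes it obvious whether the 64-bit configuration took effect.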