Reading a large file in C causes the error: OSError: exception: access violation writing 0x000001708DB6E000


I wrote C code that reads LIBSVM data set files and calls the function create_BatchKmeans to do clustering. When the data set is a large file, such as 777 MB (815,140,552 bytes), I get this error:

OSError: exception: access violation writing 0x000001708DB6E000

I am not good at C, so I need your help.

SfDataSet* NewDataSet(const string& file_name)
{
    //std::cerr << "Reading data from: " << file_name << std::endl;
    SfDataSet* data_set = new SfDataSet(file_name, 40, false);
    return data_set;
}

int create_BatchKmeans(char* s, int s_len, int default_k, int default_iterations, float default_L1_lambda, float default_L1_epsilon)
{
    std::string ps(s, s_len);
    //memcpy(ps, s, s_len);
    for (int i = 0; i < s_len; i++)
    {
        ps[i] = s[i * 2];
    }
    ps[s_len] = '\0';
    SfDataSet* training_data = NewDataSet(ps);
    //training_data->vectors_[0].features_.size();

    int default_dimensionality = training_data->NumExamples();// (2 << 16);
    SfClusterCenters* cluster_centers = new SfClusterCenters(default_dimensionality);

    training_data->Transpose();

    sofia_cluster::OptimizedKmeansPlusPlus(default_k, *training_data, cluster_centers);

    sofia_cluster::BatchKmeans(default_iterations, *training_data, cluster_centers, default_L1_lambda, default_L1_epsilon);

    std::fstream result_stream;
    string str_result = ps;
    str_result += ".result";
    result_stream.open(str_result.c_str(), std::fstream::out);
    if (!result_stream)
    {
        return 1;
    }
    for (int i = 0; i < cluster_centers->cluster_centers_.size(); i++)
    {
        for (int j = 0; j < cluster_centers->cluster_centers_[i].dimensions_; j++)
        {
            result_stream << cluster_centers->cluster_centers_[i].weights_[j];
            result_stream << " ";
        }
        result_stream << "\n";
    }
    result_stream.close();
    std::cerr << "Done." << std::endl;
    return 0;
}

There is 1 answer

Answered by Barmar

Assuming s_len is the length of s in characters, you're accessing outside the array when you read from s[i * 2].

Since NewDataSet() takes a string as its argument, ps should be a string rather than char*. You can initialize it from s.

std::string ps(s, s_len);
SfDataSet* training_data = NewDataSet(ps);

This already copies the contents of s into ps, so you don't need this loop:

for (int i = 0; i < s_len; i++)
{
    ps[i] = s[i * 2];
}