File transferred as binary via libcurl is corrupted

666 views Asked by At

There are many other threads talking about binary file corruption, yet they all seem unrelated to my problem somehow.

I have a C program that downloads/reads files.

Because I do not write all the files I get into a file, the function that I use with curl rather stores data into a string. I may be free to write this string into a file, or not, later.

I have a binary file. I put it on a FTP.

If I download it via a ftp client like filezilla, it contains the right things (that is, the same characters I get when I cat my compiled binary) If I use a curl command line to download the file, it contains the right things as well.

If I use my program to download such a file, it will only contain a string like "ELF" followed by 3 unwritable/unreadable characters.

It's important to note that this only happens with binary files. Text files were transferred/read just file. It's also important to know that it seems to be the data that is passed to my function from curl that's already false: if I put a printf of the data in my write function, I see the same ELF + 3 unreadable chars string, therefore the method I use later on to write it into a file isn't in question.

When I use verbose, curl says it's in binary mode yet the binary isn't properly transferred.

Here is what I have so far, works fine with any file that's not binary, will always be garbage otherwise. Thanks in advance:

struct string 
{
  char *ptr;
  size_t len;
};

char *usr_psswd(char *user, char *psswd)
{
    char *usrpsswd;

    usrpsswd = (char *)malloc(strlen(user) + strlen(psswd) + 2);
    int i = 0;
    int j = 0;

    while (user[i])
    {
        usrpsswd[i] = user[i];
        ++i;
    }
    usrpsswd[i++] = ':';
    while (psswd[j])
    {
        usrpsswd[i] = psswd[j];
        ++i;
        ++j;
    }
    usrpsswd[i] = 0;
    return usrpsswd;
}

void init_string(struct string *s) 
{
  s->len = 0;
  s->ptr = malloc(s->len+1);
  if (s->ptr == NULL) 
  {
    fprintf(stderr, "malloc() failed\n");
    exit(EXIT_FAILURE);
  }
  s->ptr[0] = '\0';
}

size_t writefunc(void *ptr, size_t size, size_t nmemb, struct string *s)
{
    size_t new_len = s->len + size*nmemb;
    s->ptr = realloc(s->ptr, new_len+1);
    if (s->ptr == NULL) 
    {
        fprintf(stderr, "realloc() failed\n");
        exit(EXIT_FAILURE);
    }
    memcpy(s->ptr+s->len, ptr, size*nmemb);
    s->ptr[new_len] = '\0';
    s->len = new_len;
    return size*nmemb;
}

char *curl_get(char *addr, t_data *data)
{
  CURL *curl;
  CURLcode res;
  char *rtrn;
  curl = curl_easy_init();
  if(curl) 
  {
    struct string s;
    init_string(&s);
    curl_easy_setopt(curl, CURLOPT_URL, addr);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, writefunc);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &s);
    curl_easy_setopt(curl, CURLOPT_PORT, 21);
    curl_easy_setopt(curl, CURLOPT_USERPWD, usr_psswd(data->login, data->password));
    res = curl_easy_perform(curl);
    if(res != CURLE_OK)
    {
        printf("curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
        free(s.ptr);
        curl_easy_cleanup(curl);
        return NULL;
    }
    rtrn = strdup(s.ptr);
    free(s.ptr);
    curl_easy_cleanup(curl);
  }
  return rtrn;
}
2

There are 2 answers

9
Some programmer dude On BEST ANSWER

Your problem is that you treat binary data as a string.

The strdup function works like any other string functions: It looks for the string terminator to find the end of the source string. And the string terminator '\0' is the byte value 0. So if the binary data contains any zero bytes (very very likely) then that will be seen as the end of the "string".

Easy solution? Just do return s.ptr; But do note that there's no way to find out the length of the data using the returned pointer. So a better solution might be to return s itself (as it contains a pointer to the data as well as the size of it).

2
ryyker On

Most of the issues you are seeing are due to using techniques designed for working with strings, but are being applied to binary files.

When writing code that must at some point be used with binary data and file content, it is best to follow a couple of rules

1) Variables used to contain binary data should prefer unsigned char over char. For example:

char *usr_psswd(char *user, char *psswd){...  

Should be written as

unsigned char *usr_psswd(unsigned char *user, size_t lenUser, unsigned char *psswd, size_t lenPsswd){...  

Note: Reason for including array lengths is covered below.

More on the rational of using unsigned char with binary data.

2) Avoid use of string functions such as strdup(), strlen(), etc. They are all written to look for the terminating null byte to indicated the end of a C string. For example: >

usrpsswd = (char *)malloc(strlen(user) + strlen(psswd) + 2);

should be written as:

 usrpsswd = malloc(lenUser + lenPasswd + 1);//No need for null terminator. (+1 for delimiter, per comments)
                                        //usrpasswrd should be unsigned char *
                                        //Casting return of malloc not recommended. in C.  

More on reliable ways to get array lengths in C.