Downloading content with range requests corrupts the file


I have set up a basic project on GitHub: https://github.com/kounelios13/range-download. Essentially, the project tries to download a file using HTTP range requests, assemble the pieces, and save the result back to disk. I am trying to follow this article (apart from the goroutines, for the time being). When I try to download the file using range requests, the final size after all the partial responses are combined is bigger than the original size I would get from a single request, and the final file is corrupted.

Here is the code responsible for downloading the file

type Manager struct {
    limit int
}

func NewManager(limit int) *Manager {
    return &Manager{
        limit: limit,
    }
}

func (m *Manager) DownloadBody(url string) ([]byte, error) {
    // First we need to determine the file size
    body := make([]byte, 0)
    response, err := http.Head(url) // We perform a HEAD request to get header information
    if err != nil {
        return nil, err
    }
    if response.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("received code %d", response.StatusCode)
    }

    maxConnections := m.limit // Maximum number of concurrent goroutines
    bodySize, _ := strconv.Atoi(response.Header.Get("Content-Length"))
    bufferSize := bodySize / maxConnections
    diff := bodySize % maxConnections
    read := 0
    for i := 0; i < maxConnections; i++ {
        min := bufferSize * i
        max := bufferSize * (i + 1)
        if i == maxConnections-1 {
            max += diff // Pick up any leftover bytes in the last request
        }
        req, _ := http.NewRequest("GET", url, nil)
        req.Header.Add("Range", fmt.Sprintf("bytes=%d-%d", min, max))
        res, e := http.DefaultClient.Do(req)
        if e != nil {
            return body, e
        }
        log.Printf("Index:%d . Range:bytes=%d-%d", i, min, max)
        data, e := ioutil.ReadAll(res.Body)
        res.Body.Close()
        if e != nil {
            return body, e
        }
        log.Println("Data for request:", len(data))
        read += len(data)
        body = append(body, data...)
    }
    log.Println("File size:", bodySize, "Downloaded size:", len(body), "Actual read:", read)
    return body, nil
}

I also noticed that the higher I set the limit, the bigger the difference between the original content length and the combined size of all the response bodies.
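To see where the extra bytes come from, one thing I can do is log what the server actually serves for each requested range. Here is a minimal sketch of that idea (the fetchRange helper is hypothetical, not part of the repo): it performs a single range request and prints the response's Content-Range header next to the number of bytes read.

func fetchRange(url string, min, max int) ([]byte, error) {
    // Ask for a single inclusive byte range.
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        return nil, err
    }
    req.Header.Add("Range", fmt.Sprintf("bytes=%d-%d", min, max))
    res, err := http.DefaultClient.Do(req)
    if err != nil {
        return nil, err
    }
    defer res.Body.Close()
    data, err := ioutil.ReadAll(res.Body)
    if err != nil {
        return nil, err
    }
    // Comparing the requested range with Content-Range makes
    // overlaps or gaps between consecutive chunks easy to spot.
    log.Printf("requested bytes=%d-%d, served %q, read %d bytes",
        min, max, res.Header.Get("Content-Range"), len(data))
    return data, nil
}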

Here is my main.go

func main() {
    imgUrl := "https://media.wired.com/photos/5a593a7ff11e325008172bc2/16:9/w_2400,h_1350,c_limit/pulsar-831502910.jpg"
    maxConnections := 4
    manager := lib.NewManager(maxConnections)
    data, e := manager.DownloadBody(imgUrl)
    if e != nil {
        log.Fatalln(e)
    }
    if e := ioutil.WriteFile("foo.jpg", data, 0777); e != nil {
        log.Fatalln(e)
    }
}

Note: for the time being I am not interested in making the code concurrent.

Any ideas what I could be missing?

Note: I have confirmed that the server returns a 206 Partial Content using the curl command below:

curl -I https://media.wired.com/photos/5a593a7ff11e325008172bc2/16:9/w_2400,h_1350,c_limit/pulsar-831502910.jpg
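Strictly speaking, curl -I issues a HEAD request without a Range header, so the server typically answers 200 together with an Accept-Ranges: bytes header; the 206 only shows up once an actual Range GET is sent. For completeness, here is a small, self-contained Go sketch of both checks, using the same URL as main.go:

package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
)

func main() {
    url := "https://media.wired.com/photos/5a593a7ff11e325008172bc2/16:9/w_2400,h_1350,c_limit/pulsar-831502910.jpg"

    // A server that supports range requests usually advertises it on a
    // HEAD response with "Accept-Ranges: bytes".
    head, err := http.Head(url)
    if err != nil {
        log.Fatalln(err)
    }
    fmt.Println("Accept-Ranges:", head.Header.Get("Accept-Ranges"))
    fmt.Println("Content-Length:", head.Header.Get("Content-Length"))

    // Ask for the first 100 bytes; a compliant server answers with
    // 206 Partial Content and a matching Content-Range header.
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        log.Fatalln(err)
    }
    req.Header.Add("Range", "bytes=0-99")
    res, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatalln(err)
    }
    defer res.Body.Close()
    data, err := ioutil.ReadAll(res.Body)
    if err != nil {
        log.Fatalln(err)
    }
    fmt.Println("Status:", res.StatusCode) // expect 206, not 200
    fmt.Println("Content-Range:", res.Header.Get("Content-Range"))
    fmt.Println("Bytes received:", len(data))
}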

1 Answer

Answered by Manos Kounelakis:

Thanks to @mh-cbon I managed to write a simple test that helped me find the solution. Here is the fixed code:

for i := 0; i < maxConnections; i++ {
    min := bufferSize * i
    if i != 0 {
        min++ // Skip the byte already covered by the previous range
    }
    max := bufferSize * (i + 1)
    if i == maxConnections-1 {
        max += diff // Pick up any leftover bytes in the last request
    }
    req, _ := http.NewRequest("GET", url, nil)
    req.Header.Add("Range", fmt.Sprintf("bytes=%d-%d", min, max))
    res, e := http.DefaultClient.Do(req)
    if e != nil {
        return body, e
    }
    log.Printf("Index:%d . Range:bytes=%d-%d", i, min, max)
    data, e := ioutil.ReadAll(res.Body)
    res.Body.Close()
    if e != nil {
        return body, e
    }
    log.Println("Data for request:", len(data))
    read += len(data)
    body = append(body, data...)
}

The problem was that I didn't have a correct min value to begin with. Let's say I have the following ranges to download:

  • 0-100
  • 101 - 200

My code would download bytes 0-100 and then download again from 100-200 instead of 101-200, so byte 100 was fetched twice at every boundary.

So I made sure, on every iteration except the first one, to increment min by 1 so that it does not overlap with the previous range, as the sketch below shows.
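Here is a minimal, self-contained sketch of the corrected range computation (the splitRange helper is my naming, not from the repo), which makes it easy to eyeball that consecutive ranges no longer share a boundary byte:

package main

import "fmt"

// splitRange mirrors the fixed loop: it produces inclusive byte ranges
// that cover 0..size-1 without overlapping at the boundaries.
func splitRange(size, parts int) [][2]int {
    bufferSize := size / parts
    diff := size % parts
    ranges := make([][2]int, 0, parts)
    for i := 0; i < parts; i++ {
        min := bufferSize * i
        if i != 0 {
            min++ // Skip the byte already covered by the previous range
        }
        max := bufferSize * (i + 1)
        if i == parts-1 {
            max += diff // Last range picks up the remainder
        }
        ranges = append(ranges, [2]int{min, max})
    }
    return ranges
}

func main() {
    // Prints [[0 100] [101 200] [201 300] [301 400]].
    // Note the last max (400) is one past the final byte index (399);
    // HTTP servers clamp a too-large last-byte-pos to the resource size,
    // so the final request still returns exactly the remaining bytes.
    fmt.Println(splitRange(400, 4))
}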

Here is a simple test I managed to put together from the docs provided in the comments:

func TestManager_DownloadBody(t *testing.T) {
    ts := httptest.NewServer(http.HandlerFunc(func(writer http.ResponseWriter, request *http.Request) {
        http.ServeContent(writer, request, "hey", time.Now(), bytes.NewReader([]byte(`hello world!!!!`)))
    }))
    defer ts.Close()

    m := NewManager(4)
    data, err := m.DownloadBody(ts.URL)
    if err != nil {
        t.Errorf("%s", err)
    }

    if string(data) != "hello world!!!!" {
        t.Errorf("expected hello world!!!!, received: [%s]", data)
    }
}

Sure, there are more tests to be written, but it is a good start. One example follows.
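For instance, since the original bug grew with the connection limit, a natural follow-up is a test that sweeps several limits and checks that the reassembled body always matches the original content. A sketch, assuming the same imports as the test above (bytes, net/http, net/http/httptest, testing, time):

func TestManager_DownloadBody_Limits(t *testing.T) {
    content := []byte("the quick brown fox jumps over the lazy dog")
    ts := httptest.NewServer(http.HandlerFunc(func(writer http.ResponseWriter, request *http.Request) {
        http.ServeContent(writer, request, "fox", time.Now(), bytes.NewReader(content))
    }))
    defer ts.Close()

    // The content length (43) is deliberately not divisible by most of
    // these limits, so the leftover-bytes branch gets exercised too.
    for _, limit := range []int{1, 2, 3, 4, 8} {
        m := NewManager(limit)
        data, err := m.DownloadBody(ts.URL)
        if err != nil {
            t.Errorf("limit %d: %s", limit, err)
            continue
        }
        if string(data) != string(content) {
            t.Errorf("limit %d: expected %q, received %q", limit, content, data)
        }
    }
}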