Recently, I have been working with the Requests library in Python. I wrote a simple function to pull down a file that took more than a minute to download. While waiting for the download to complete, I realized it would be nice to have some insight into the download’s progress. A quick search on Stack Overflow led to an excellent example. Below is a simple way to display a progress bar while downloading a file.

import sys

import requests


def download_file(url, name):
    '''
    Function takes a url and a filename, creates a request, opens a 
    file and streams the content in chunks to the file system.
    It then writes out an '=' symbol for every two percent of the total
    content length to the console.  
    '''
    filename = 'myfile_' + str(name) + '.ext'
    r = requests.get(url, stream=True)
    with open(filename, 'wb') as f:

        total_length = r.headers.get('Content-Length')

        if total_length is None:  # no content length header
            f.write(r.content)
        else:
            downloaded = 0
            total_length = int(total_length)
            for data in r.iter_content(chunk_size=4096):
                downloaded += len(data)
                f.write(data)
                # Number of '=' characters to draw in the 50-character bar
                done = int(50 * downloaded / total_length)
                sys.stdout.write("\r[%s%s]" % ('=' * done, ' ' * (50 - done)))
                sys.stdout.flush()

    return 1

What’s going on?

requests.get() takes a URL and sends an HTTP GET request. stream=True is an optional keyword argument to requests.get(). It tells Requests to defer downloading the response body until it is accessed, so the content can be read in chunks instead of being pulled down all at once.

The response headers are then checked for the ‘Content-Length’ header. We use its value to calculate how much has been downloaded and how much is left. If the header is missing, the function falls back to writing the whole body in one go, with no progress bar. Otherwise, the running total is kept in a variable and updated as each chunk is processed.
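To make the arithmetic concrete, here is a minimal sketch of the bar calculation pulled out into its own function. The name render_bar and the width parameter are made up for illustration and are not part of the function above. It assumes total_length is greater than zero.

```python
def render_bar(downloaded, total_length, width=50):
    """Return a text bar such as '[=====     ]' for the given byte counts."""
    # Clamp so a response longer than the advertised length cannot overflow the bar.
    done = min(width, int(width * downloaded / total_length))
    return "[%s%s]" % ("=" * done, " " * (width - done))


# A quarter of the way through, drawn on a 20-character bar
print(render_bar(1024, 4096, width=20))
```

With the default width of 50, each '=' stands for two percent of the total, which matches the docstring in the function above.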

The final piece to point out in this little function is the iter_content() method. iter_content():

Iterates over the response data. When stream=True is set on the request, this avoids reading the content at once into memory for large responses. The chunk size is the number of bytes it should read into memory.

This helps handle larger files and gives us a way to track progress. As chunks are processed, variables can be updated. If you do not need or want to roll your own, check out the tqdm library.
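The chunk loop can also be tried without a network connection. The sketch below is an illustration rather than the original function: copy_chunks is a hypothetical helper that accepts any iterable of bytes objects, which is the shape of what iter_content() yields, and an in-memory buffer stands in for the file on disk.

```python
import io
import sys


def copy_chunks(chunks, dest, total_length, out=sys.stdout, width=50):
    """Write each chunk to dest and redraw a progress bar on out.

    chunks is any iterable of bytes objects, for example
    r.iter_content(chunk_size=4096). Returns the number of bytes written.
    """
    downloaded = 0
    for data in chunks:
        dest.write(data)
        downloaded += len(data)
        done = min(width, int(width * downloaded / total_length))
        # '\r' returns the cursor to the start of the line so the bar redraws in place.
        out.write("\r[%s%s]" % ("=" * done, " " * (width - done)))
        out.flush()
    out.write("\n")  # leave the finished bar on its own line
    return downloaded


# Stand-in for a streamed response body: 10,000 bytes in 4,096-byte pieces.
payload = b"x" * 10000
fake_chunks = (payload[i:i + 4096] for i in range(0, len(payload), 4096))
buffer = io.BytesIO()
written = copy_chunks(fake_chunks, buffer, total_length=len(payload))
```

Keeping the loop separate from the HTTP call is only one possible design. It makes the progress logic easy to check with fake data before pointing it at a real download.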

Join the Conversation

1 Comment

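  1. This works by writing the “\r” character directly to the console, which moves the cursor back to the start of the line. “print” in Python appends a newline by default, so the bar would not redraw in place; hence we use ‘sys’ and sys.stdout.write. (In Python 3, print with end='' would also work.)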
