Python Gzip

GZIP is an important compression algorithm. Many web pages are transferred with GZIP compression, which reduces the time required to load them. In Python, we can use the gzip module.

To write a compressed file, we call gzip.open with the filename, the read/write mode, and the compression level, write the data, and close the file:

    import gzip

    s = b'a long string of characters'  # bytes, so this also runs on Python 3
    g = gzip.open('gzipfilename.gz', 'wb', 5)  # (filename, read/write mode, compression level)
    g.write(s)
    g.close()

To compress an existing file, we open the source file and then open an output file. We then apply the gzip open method to write the compressed file.

Tip: The with statement is helpful here. It ensures that system resources are properly freed.

    import gzip

    with gzip.open('example.txt.gz', 'rb') as inputfile:
        print('Entire file:')
        alldata = inputfile.read()
        print(alldata)

        expected = alldata[5:15]

        # rewind to beginning
        inputfile.seek(0)

        # move ahead 5 bytes
        inputfile.seek(5)
        print('Starting at position 5 for 10 bytes:')
        partial = inputfile.read(10)
        print(partial)

        print()
        print(expected == partial)

Python Gzip Compress File

I needed to gzip some data in memory that would eventually end up saved to disk as a .gz file. I thought, "That's easy, just use Python's built-in gzip module."

However, I needed to pass the data to pycurl as a file-like object. I didn't want to write the data to disk and then read it again just to pass it to pycurl. I thought, "That's easy too, just use Python's cStringIO module."

The solution did end up being simple, but figuring out the solution was a lot harder than I thought. Below is my roundabout process of finding the simple solution.

Here is my setup/test code. I am running Python 2.7.3 on Ubuntu 12.04.

Try 1: seek from the end fails

Here is my first attempt using cStringIO with the gzip module.

I got this exception:

It turns out the gzip object doesn't support seeking from the end. See this thread on the Python mailing list: http://mail.python.org/pipermail/python-list/2009-January/519398.html
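
A minimal sketch of this failure, assuming Python 3, where io.BytesIO stands in for cStringIO. The payload is hypothetical test data, not the data from my setup. A GzipFile opened for writing rejects a seek relative to the end:

```python
import gzip
import io

payload = b"a long string of characters " * 100  # hypothetical test data

buf = io.BytesIO()                      # in-memory stand-in for a file on disk
gz = gzip.GzipFile(fileobj=buf, mode="wb")
gz.write(payload)

try:
    gz.seek(0, io.SEEK_END)             # whence=2: seek relative to the end
    seek_error = None
except ValueError as exc:               # a write-mode GzipFile refuses this seek
    seek_error = exc

gz.close()
print("seek from end failed:", seek_error is not None)
```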

Try 2: data is not compressed

What if we don't seek() from the end and just tell() where we are? (It should be at the end after doing a write(), right?) Unfortunately, this gave me the uncompressed size.

Reading from the GzipFile object also gave me an error saying that I couldn't read from a writable object.
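
Both observations can be reproduced with a short sketch. It assumes Python 3 and io.BytesIO in place of cStringIO, and the payload is made up for illustration. tell() on the GzipFile counts the uncompressed bytes written so far, and read() is refused because the object was opened for writing:

```python
import gzip
import io

payload = b"a long string of characters " * 100  # hypothetical test data

buf = io.BytesIO()
gz = gzip.GzipFile(fileobj=buf, mode="wb")
gz.write(payload)

# Position in the uncompressed stream, not the compressed size held in buf.
position = gz.tell()
print("tell() equals uncompressed length:", position == len(payload))

try:
    gz.read()                           # the GzipFile is write-only
    read_error = None
except OSError as exc:
    read_error = exc

gz.close()
print("read() on the writer failed:", read_error is not None)
```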

Try 5: file much too small

I googled, then looked at the source code for gzip.py. I found that the compressed data was in the StringIO object. So I performed my file operations on it instead of the GzipFile object. Now I was able to write the data out to a file. However, the size of the file was much too small.
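
A sketch of why the file comes out too small, assuming Python 3 with io.BytesIO as the in-memory buffer and a hypothetical payload. While the GzipFile is still open, most of the compressed output has not yet reached the underlying buffer, so reading the buffer at that point captures only part of the stream:

```python
import gzip
import io

payload = b"a long string of characters " * 100  # hypothetical test data

buf = io.BytesIO()
gz = gzip.GzipFile(fileobj=buf, mode="wb")
gz.write(payload)

# Snapshot the underlying buffer while the GzipFile is still open.
premature = buf.getvalue()

gz.close()
complete = buf.getvalue()               # the same buffer after the stream is finished

print(len(premature), "bytes before close,", len(complete), "bytes after")
```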

Try 6: unexpected end of file

I saw there was a flush() method in the source code, so I added a call to flush(). This time I got a reasonable file size. However, when I tried to gunzip the file from the command line, I got the following error:
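
The same symptom can be shown without the command line. This is a sketch assuming Python 3, io.BytesIO, and a hypothetical payload. It uses gzip.decompress rather than gunzip, so the error is Python's EOFError, not the gunzip message:

```python
import gzip
import io

payload = b"a long string of characters " * 100  # hypothetical test data

buf = io.BytesIO()
gz = gzip.GzipFile(fileobj=buf, mode="wb")
gz.write(payload)
gz.flush()                              # push the compressor's pending output into buf

flushed_only = buf.getvalue()           # plausible size, but not a finished gzip stream

try:
    gzip.decompress(flushed_only)
    truncated = False
except EOFError:                        # data ends before the end-of-stream marker
    truncated = True

gz.close()
print("flushed-only data is truncated:", truncated)
```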

Try 7: got it working

I knew that GzipFile worked properly when writing files directly as opposed to reading from the StringIO object. It turns out the difference was that there was code in the close() method of GzipFile which wrote some extra required data. Now stuff was working.
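
The gzip format ends each stream with an 8-byte trailer: the CRC-32 of the uncompressed data followed by its length, both little-endian. The sketch below inspects that trailer after close(). It assumes Python 3, io.BytesIO, and a hypothetical payload:

```python
import gzip
import io
import struct
import zlib

payload = b"a long string of characters " * 100  # hypothetical test data

buf = io.BytesIO()
gz = gzip.GzipFile(fileobj=buf, mode="wb")
gz.write(payload)
gz.close()                              # finishes the deflate stream and appends the trailer

compressed = buf.getvalue()

# Last 8 bytes: CRC-32 and uncompressed size, each an unsigned 32-bit little-endian value.
crc, size = struct.unpack("<II", compressed[-8:])
print("CRC matches:", crc == zlib.crc32(payload))
print("size matches:", size == len(payload))
print("round trip ok:", gzip.decompress(compressed) == payload)
```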

Try 8: (not really) final version

Here's the (not really) final version using a subclass of GzipFile that adds a method to write the extra data at the end. It also overrides close() so that stuff isn't written twice in case you need to use close(). Also, the separate flush() call is not needed.

Try 9: didn't need to do that (final version)

It turns out I can close the GzipFile object and the StringIO object remains available. So that MemoryGzipFile class above is completely unnecessary. I am dumb. Here is the final iteration:
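
A sketch of this approach for Python 3, with io.BytesIO in place of cStringIO. The helper name gzip_to_memory and the payload are hypothetical, and the pycurl call is left out:

```python
import gzip
import io


def gzip_to_memory(data):
    """Compress data and return a BytesIO rewound to the start (hypothetical helper)."""
    buf = io.BytesIO()
    # Leaving the with block closes the GzipFile, which writes the trailer.
    # buf stays open because it was passed in as fileobj.
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(data)
    buf.seek(0)                         # rewind so a consumer reads from the beginning
    return buf


payload = b"a long string of characters " * 100  # hypothetical test data
fileobj = gzip_to_memory(payload)

# fileobj can go to any code expecting a file-like object, or be written to a .gz file.
compressed = fileobj.read()
print("round trip ok:", gzip.decompress(compressed) == payload)
```

The returned buffer supports read(), seek(), and tell() on the compressed bytes, which is what was missing when working with the GzipFile object directly.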

Comments

#1 Norman Harman commented:

Good article. I appreciate it, and it is nice to see your exploration / thought process.




