Zip files corrupt over 4 gigabytes – No warnings or errors – Did I lose my data?

Question or issue on macOS:

I created a bunch of zip files on my computer (Mac OS X) using a command like this:

zip -r bigdirectory.zip bigdirectory

Then, I saved these zip files somewhere and deleted the original directories.

Now, when I try to extract the zip files, I get this kind of error:

$ unzip -l bigdirectory.zip
Archive:  bigdirectory.zip
warning [bigdirectory.zip]:  5162376229 extra bytes at beginning or within zipfile
  (attempting to process anyway)
error [bigdirectory.zip]:  start of central directory not found;
  zipfile corrupt.
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)

I have since discovered that this could be because zip can’t handle files over a certain size, maybe 4 gigs. At least I read that somewhere.

But why would the zip command let me create these files? The zip file in question is 9457464293 bytes and it let me make many more like this with absolutely no errors.

So clearly it can create these files.

I really hope my files aren’t lost. I’ve learned my lesson and in the future I will check my archives before deleting the original files, and I’ll probably also use another file format like tar/gzip.

For now though, what can I do? I really need my files.

Some people have suggested that my unzip tool did not support big enough files (which is weird, because I used the builtin OS X zip and unzip). At any rate, I installed a new unzip from homebrew, and lo and behold, I do get a different error now:

$ unzip -t bigdirectory.zip
testing: bigdirectory/1.JPG   OK
testing: bigdirectory/2.JPG   OK
testing: bigdirectory/3.JPG   OK
testing: bigdirectory/4.JPG   OK
:
:
file #289:  bad zipfile offset (local header sig):  4294967295
  (attempting to re-compensate)
file #289:  bad zipfile offset (local header sig):  4294967295
file #290:  bad zipfile offset (local header sig):  9457343448
file #291:  bad zipfile offset (local header sig):  9457343448
file #292:  bad zipfile offset (local header sig):  9457343448
file #293:  bad zipfile offset (local header sig):  9457343448
:
:

This is really worrisome because I need these files back. And there were definitely no errors upon creation of this zip file using the system zip tool. In fact, I made several of these at the same time and now they are all exhibiting the same problem.

If the file really is corrupt, how do I fix it?

Or, if it is not corrupt, how do I extract it?

How to solve this problem?

Solution no. 1:

UnZip versions below 6.0 seemingly fail on archives this large. Before you write the file off, try

jar -xf 

on the zip file if you have Java installed, or try yet another unzip tool.

See: https://serverfault.com/questions/235139/how-to-unzip-files-bigger-than-4gb

Solution no. 2:

I had a similar problem backing up a 12GB directory before performing a hard disk format. Funnily enough I used the same command as you.

I read around and found suggestions to run:

zip -F    

and

zip -FF     

to try to fix the file.

Unfortunately these did not work and I still received errors.

After looking around some more, I found the ditto command and it worked perfectly against my original (untouched) zip file:

ditto -x -k original-file.zip dst-directory   

-x to extract an archive
-k Specifies it to be a PKZip archive instead of the default CPIO

After using this command, I successfully extracted all of the files.

Solution no. 3:

Try 7z x

I had the same issue with unzip %x on Linux for a .zip file larger than 4GB, compounded by an “only DEFLATED entries can have EXT descriptor” error.

The command 7z x resolved all my issues though.

Be careful though: 7z x extracts all files with paths rooted in the current directory. The -o option lets you specify an output directory.

Solution no. 4:

The built-in macOS Archive Utility (which is the default used when you select something in Finder and go to File -> Compress “<item>”) also creates “corrupt” archives when a file in the archive is over 4 gigabytes in size, when the archive itself is over 4 gigabytes, or when you compress more than 65535 files into a single zip. This happens because it doesn’t use the Zip64 extension format.
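One way to check whether a given archive carries Zip64 records is to look at its tail. The Python sketch below is only an illustration (Python and the function name inspect_zip_tail are my own choices, not something the tools above provide). It finds the end of central directory record and reports the 16-bit entry count stored there, plus whether a Zip64 locator sits directly in front of it. It assumes the last PK\x05\x06 signature in the tail is the real record, which an unusual archive comment could break.

```python
import struct

EOCD_SIG = b"PK\x05\x06"           # end of central directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # Zip64 end of central directory locator
EOCD_MIN_SIZE = 22                 # fixed part of the EOCD record
LOCATOR_SIZE = 20                  # the Zip64 locator has a fixed size
MAX_COMMENT = 0xFFFF               # an archive comment is at most 65535 bytes


def inspect_zip_tail(path):
    """Return (entry_count_field, has_zip64_locator) read from the archive tail."""
    with open(path, "rb") as f:
        f.seek(0, 2)
        file_size = f.tell()
        tail_len = min(file_size, EOCD_MIN_SIZE + MAX_COMMENT + LOCATOR_SIZE)
        f.seek(file_size - tail_len)
        tail = f.read(tail_len)

    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or pos + EOCD_MIN_SIZE > len(tail):
        raise ValueError("no end of central directory record found")

    # The total entry count is a 16-bit little-endian field at offset 10.
    (entry_count,) = struct.unpack_from("<H", tail, pos + 10)

    # In a Zip64 archive the locator immediately precedes the EOCD record.
    start = pos - LOCATOR_SIZE
    has_locator = start >= 0 and tail[start:start + 4] == ZIP64_LOCATOR_SIG
    return entry_count, has_locator
```

An archive that is larger than 4 GiB but for which this reports no Zip64 locator matches the situation described above.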

This is mentioned on https://apple.stackexchange.com/questions/221020/large-zip-files-created-in-os-x-cannot-be-opened-in-windows and is well covered in the “Apple Archive Utility (and ditto) and very large ZIP archives” 2009 blog post for the now defunct Springy utility. The 7-Zip folks are also aware of the Apple tools creating such corrupt zips.


But why would the zip command let me create these files?

Strictly speaking, the original zip format only supports archives smaller than 2^32 bytes (4GiB) that contain no files originally larger than 4GiB and no more than 65535 files. The Info-ZIP command line tools shipped with OS X up to and including 10.11 (El Capitan) were no newer than UnZip 5.52 (with Zip 2.32 alongside it), so they could only produce non-conformant archives if you forced them past the original zip format limits. Info-ZIP UnZip 6.0 and Zip 3.0 and above understand Zip64 archives, and that standard has much higher limits. These newer command line tools started shipping with macOS 10.12 (Sierra). In 2014, when the question was originally asked, the newest OS X was 10.10 (Yosemite).
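To make those limits concrete, here is a small Python sketch of the arithmetic (the names fits_classic_zip and truncate_to_32_bits are made up for this illustration). It treats the archive size as a stand-in for the largest offset, which is an approximation, and it only shows what a value looks like when squeezed into 32 bits. It does not claim to reproduce exactly what the old zip wrote into each header field.

```python
# Limits of the original (non-Zip64) ZIP format: sizes and offsets live in
# 32-bit fields, the number of entries in a 16-bit field.
MAX_32 = 0xFFFFFFFF  # largest value a 32-bit field can hold
MAX_16 = 0xFFFF      # largest entry count a 16-bit field can hold


def fits_classic_zip(archive_size, largest_member_size, entry_count):
    """Return True if these numbers fit the classic format without Zip64."""
    return (archive_size <= MAX_32
            and largest_member_size <= MAX_32
            and entry_count <= MAX_16)


def truncate_to_32_bits(value):
    """Return what is left of a value when only the low 32 bits are kept."""
    return value & MAX_32


if __name__ == "__main__":
    archive_size = 9457464293  # size of the archive in the question
    print(fits_classic_zip(archive_size, 0, 300))
    print(truncate_to_32_bits(archive_size))
    # The "bad zipfile offset" 4294967295 in the question is exactly MAX_32.
    print(MAX_32)
```

The 9457464293-byte archive from the question does not fit, and the offset 4294967295 that unzip complains about is the largest value a 32-bit field can hold.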

As stated above, even in macOS 10.15 (Catalina) the GUI Archive Utility still creates such “corrupt” zips.


If the file really is corrupt, how do I fix it?

It’s corrupt in the sense that it’s non-conformant and will cause a lot of conformant tools to choke. You could extract it (see below) and then compress it again with a tool that knows how to make Zip64 files…
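As one example of a Zip64-capable tool, Python’s standard zipfile module writes Zip64 records when an archive needs them (allowZip64 defaults to True). The sketch below is a minimal illustration, not a replacement for zip -r: the function names are hypothetical, empty directories are skipped, and the output file must live outside the directory being compressed. It also re-reads the archive afterwards, which is the check worth doing before deleting any originals.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Write every file under src_dir into zip_path, using Zip64 if needed."""
    parent = os.path.dirname(os.path.abspath(src_dir))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the parent directory so the
                # top-level directory name is kept inside the archive.
                zf.write(full, os.path.relpath(full, parent))


def archive_is_readable(zip_path):
    """Read every member back and check its CRC; True if nothing is bad."""
    with zipfile.ZipFile(zip_path) as zf:
        # testzip() returns the name of the first bad member, or None.
        return zf.testzip() is None
```

Calling zip_directory("bigdirectory", "bigdirectory.zip") and then checking that archive_is_readable("bigdirectory.zip") returns True gives at least some assurance that the new archive can be read back.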


Or, if it is not corrupt, how do I extract it?

Technically, all of the data from the files that have been compressed is still in the archive but the headers that allow fast listing of the zip’s content are broken. Such zips can be a struggle to work with when using other tools (even testing such a zip with the command line unzip tool on the same version of macOS can indicate issues like invalid compressed data to inflate / bad zipfile offset (local header sig)).

To get at the files of such zips you need to use a program that will quietly just extract whatever was compressed without checking for conformance or trying to check/list the files. Examples of tools that can do this are:

  • macOS Archive Utility GUI tool
  • macOS command line tool ditto
  • 7-zip
  • Java’s jar tool

Info-ZIP based tools won’t be able to work with or repair such a zip file once it has been created.
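To illustrate why the data is still recoverable, here is a rough Python sketch that ignores the central directory and simply walks the local file headers from the start of the archive. This is my own illustration of the idea, not a description of how ditto, 7-Zip or jar work internally, and the function name is hypothetical. It assumes entries are either deflated or stored without a data descriptor, that names are UTF-8, and that no Zip64 extra fields are involved. For stored entries it trusts the 32-bit size field, which may itself be wrong in a broken archive.

```python
import os
import struct
import zlib

LOCAL_SIG = b"PK\x03\x04"                # local file header signature
DESCRIPTOR_SIG = b"PK\x07\x08"           # optional data descriptor signature
HEADER = struct.Struct("<4sHHHHHIIIHH")  # fixed 30-byte local header
CHUNK = 1 << 20


def extract_by_local_headers(zip_path, dest_dir):
    """Walk local headers front to back; return the list of names written."""
    dest_root = os.path.abspath(dest_dir)
    written = []
    with open(zip_path, "rb") as f:
        while True:
            raw = f.read(HEADER.size)
            if len(raw) < HEADER.size or raw[:4] != LOCAL_SIG:
                break  # reached the central directory (or unknown data)
            (_sig, _ver, flags, method, _time, _date, _crc,
             comp_size, _size, name_len, extra_len) = HEADER.unpack(raw)
            name = f.read(name_len).decode("utf-8", "replace")
            f.seek(extra_len, 1)
            target = os.path.abspath(os.path.join(dest_root, name))
            if not target.startswith(dest_root + os.sep):
                raise ValueError("unsafe path in archive: %r" % name)
            if name.endswith("/"):
                os.makedirs(target, exist_ok=True)
                f.seek(comp_size, 1)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as out:
                if method == 8:
                    # A raw deflate stream marks its own end, so the
                    # (possibly wrapped) size fields are not needed.
                    inflater = zlib.decompressobj(-15)
                    while not inflater.eof:
                        chunk = f.read(CHUNK)
                        if not chunk:
                            raise ValueError("truncated data in %r" % name)
                        out.write(inflater.decompress(chunk))
                    # Step back over bytes read past the end of the stream.
                    f.seek(-len(inflater.unused_data), 1)
                elif method == 0 and not flags & 0x08:
                    remaining = comp_size
                    while remaining:
                        chunk = f.read(min(CHUNK, remaining))
                        if not chunk:
                            raise ValueError("truncated data in %r" % name)
                        out.write(chunk)
                        remaining -= len(chunk)
                else:
                    raise ValueError("unsupported entry: %r" % name)
            if flags & 0x08:
                # Skip the data descriptor: CRC plus two 32-bit sizes,
                # optionally preceded by a signature.
                first = f.read(4)
                f.seek(12 if first == DESCRIPTOR_SIG else 8, 1)
            written.append(name)
    return written
```

Because a deflate stream is self-terminating, entries compressed that way can be read back even when the recorded offsets and sizes have overflowed. Work on a copy of the archive and treat the result with suspicion until you have checked the extracted files.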

Solution no. 5:

I faced exactly the same issue when I tried to unzip huge zip files (~7GB). I was damn sure that there was no error while copying the zip files to the server. (I double-checked it with rsync).

Depending on your situation, the solution is:

1) If you’re doing this on a local machine, right click on the zip file and choose Extract Here. This will work for .zip files of any size.

2) If your zip files are on a remote server, first mount the server filesystem locally using sftp (sftp://[email protected]). Then navigate to the directory and do the same as in (1), i.e. right click on the zip file and extract it.

Might not be the best solution but that’s one way of doing it.

Solution no. 6:

You can use

zip -FF corrupted.zip --out fixed.zip 

Replace corrupted.zip with the name of your problem zip file and fixed.zip with the name of the new, fixed zip file.

Hope this helps!