BJWFlate



Introduction

This page assumes you know what BJWFlate and DeflOpt are and that you probably ended up here through ZipMax. If not, find out about ZipMax first at http://www.clrmame.com (click on the Download link there).

I frequently receive e-mails about BJWFlate, so I have decided to create this page to address some of those things. In particular, I will address speed, crashes, and compression sizes.

This page is a work in progress. It will grow and become more complete over time. Lines starting with "tbd:" contain hints for myself of things to be done, pieces of text yet to be written.

OK... here is a short description of BJWFlate anyway: BJWFlate creates zip files compatible with popular programs such as zip/unzip, PKZIP, and WinZip. Files are compressed in the Deflate format. I prefer calling it Deflate format over Zip format, since older versions of PKZIP create zip files using other, inferior, compression methods (Shrink, Reduce, Implode). So Zip format could mean several compression methods, only one of which is Deflate. BJWFlate does not shrink, reduce, or implode files. Yes, I know there are compression methods that are better than Deflate (and thus also better than Shrink/Reduce/Implode), but I am not interested in those as far as BJWFlate is concerned. My goal is to come up with the best possible compression using the Deflate format.

Do you know of any other third-party Deflate programs that I am not using yet within ZipMax (i.e. other than 7za, kzip, kzipondrugs, pacomp, PKZIP, pkzip25, pkzip25d, pkzipc, stuff, StuffIt, WinRAR, wzzip, and anything based on the zlib or BigSpeed Zip libraries)? Let me know!

Suggested settings to be put in zipmax.ini (replace XXX and YYY with the numbers of your choice):

  packer-exe-XXX = bjwflate.exe
  packer-cmd-XXX = -a -r -y %1 %2
  packer-exe-YYY = bjwflate.exe
  packer-cmd-YYY = -n -a -r -y %1 %2
  finalstep-exe = deflopt.exe
  finalstep-cmd = %1

Note that, starting with version 1.54 of BJWFlate, the "-5" switch has been renamed to "-n"!


Downloads

BJWFlate V1.54 (14-Sep-2003) BJWFlate154.zip (22,762 bytes)
DeflOpt V1.17 (07-Sep-2003) DeflOpt117.zip (14,659 bytes)



Speed

Almost everybody mentions or complains about how slow BJWFlate is. Let me say a few things about that.

BJWFlate is still a work in progress. I still have more ideas on how to make it compress even better and also on how to make it faster. Some of the speed optimisations would make changing the code a lot more complex, though, so those will have to wait until BJWFlate is more or less "finished" in the sense that I have implemented all the ideas on getting even better compression. Other speed improvements will be implemented when I get to them.

BJWFlate was designed for extremely good compression, while remaining compatible with programs like zip/unzip, PKZIP, and WinZip. If you want fast compression, then use PKZIP or WinZip. BJWFlate was designed to attempt to squeeze every possible byte out of the files it compresses. PKZIP and WinZip were not designed for that purpose. They were designed for speed, so that you can quickly compress something and send it, for instance. Squeezing something down with WinZip to, say, 1.05 MB and then sending that 1.05 MB file takes a lot less time than squeezing it down to 1 MB using BJWFlate and then sending that 1 MB file. Again, BJWFlate was not meant for situations like that. It is meant for those people who are happy when they can make their zip files a few percent smaller, even if it takes quite a while to achieve that. In 99.99% of cases, BJWFlate compresses files quite a bit better than PKZIP and WinZip, even when using the "maximum compression" setting in those programs.

It is a trade-off between better compression and faster compression, and I chose better compression. So that leaves BJWFlate and the other "slow" Deflate programs: the "extreme" zip programs that were designed not for speed but for a few percent better compression. And that brings us to the inevitable comparison with 7-Zip. Lots of times, BJWFlate compresses better than 7-Zip. Lots of times, 7-Zip compresses better than BJWFlate. I think that, in general, when looking at the performance of each on thousands of files, BJWFlate is slightly better than 7-Zip. Even keeping the best results (using ZipMax) of 100 or so different settings for 7-Zip still seems to result in slightly worse overall compression than BJWFlate. And those 100 or so different 7-Zip runs combined are a lot slower than BJWFlate.

Still, several people will say that 7-Zip is quite a bit faster than BJWFlate. It just depends on the settings used. The better the compression in general, the slower both become. Try 7-Zip, for instance, using 32 passes and 258 fast bytes (-mfb=258 -mpass=32) and then compare its speed to BJWFlate's. And there are other programs too that are slower than BJWFlate when you use them with "extreme" settings. As for 7-Zip and using 32 passes and 258 fast bytes: yes, it requires compiling your own version of 7-Zip. Instructions on which changes to make to the source code of 7-Zip can be found further down on this page. If you wonder about the 258: without getting into too much detail, more than 258 is useless, since 258 is the maximum length possible for a reference to an earlier block of data in the Deflate format.
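For readers who do want a little more detail on the 258: the Deflate specification (RFC 1951, section 3.2.5) encodes match lengths with the symbols 257 to 285, which together cover lengths 3 to 258 and nothing longer. The following is only an illustrative Python sketch of that table, not code from BJWFlate or 7-Zip; the function name length_symbol is made up for this example.

```python
# Length symbols 257..285 of the Deflate format (RFC 1951, section 3.2.5).
# Each symbol has a base length and a number of extra bits that follow it.
LENGTH_BASES = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
                35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
LENGTH_EXTRA_BITS = [0] * 8 + [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4 + [0]

MIN_MATCH = 3    # shortest reference Deflate can express
MAX_MATCH = 258  # longest reference Deflate can express


def length_symbol(length):
    """Return (symbol, extra_bit_count, extra_value) for a match length.

    Hypothetical helper for illustration only.
    """
    if not MIN_MATCH <= length <= MAX_MATCH:
        raise ValueError("Deflate cannot encode a match of length %d" % length)
    if length == MAX_MATCH:
        return 285, 0, 0  # dedicated symbol for the longest match
    # Find the last symbol (excluding 285) whose base does not exceed the length.
    index = max(i for i, base in enumerate(LENGTH_BASES[:-1]) if base <= length)
    return 257 + index, LENGTH_EXTRA_BITS[index], length - LENGTH_BASES[index]
```

Asking this sketch for a length of 259 raises an error, which is the point: a match finder that looks for more than 258 matching bytes cannot use what it finds.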

In addition to, or instead of, making 7-Zip slower by using more extreme settings, you can make BJWFlate faster by using less extreme settings. There is a setting in BJWFlate called "number of splits", and, by default, this number is 120. This number can be changed by using the "-s" parameter. So the default settings are the same as specifying "-s120" or "-s 120" (the space is optional). If you really want to sacrifice some compression performance for speed, then try a lower setting. Values between 3 and 8192 are allowed. Experiment a little to find a setting that does not drive you crazy. :-P
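One way to experiment is to time a few "-s" values on the same input. The Python sketch below is only a suggestion: the helper names are made up, it assumes bjwflate.exe can be found on the PATH, and it uses only the switches listed in the usage text further down this page.

```python
import subprocess
import time


def bjwflate_command(zipname, files, splits=120, athlon=False):
    """Build an argument list for bjwflate.exe (hypothetical helper)."""
    if not 3 <= splits <= 8192:
        raise ValueError("number of splits must be between 3 and 8192")
    args = ["bjwflate.exe", "-s%d" % splits, "-y"]  # -y forces overwrite
    if athlon:
        args.append("-a")  # Athlon optimised code
    return args + [zipname] + list(files)


def time_run(command):
    """Run a command and return the elapsed wall-clock time in seconds.

    Assumes the executable named in the command is available on the PATH.
    """
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


# Example (not executed here):
# for splits in (8, 30, 120):
#     print(splits, time_run(bjwflate_command("test.zip", ["data.bin"], splits)))
```

Comparing the elapsed times with the resulting zip sizes shows how much compression each lower setting actually costs on your own files.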

As for the "-n" setting (called "-5" in versions 1.53 and earlier) in BJWFlate... it makes BJWFlate quite a bit slower, especially for larger files. I myself do not even use "-n" if the file is larger than 512 k (524,288 bytes). The ZipMax-addicts who use BJWFlate both with and without "-n" may be happy to hear that I am planning on putting an option in BJWFlate to have BJWFlate itself use both methods and then choose the best results. Preliminary tests seem to indicate that this would make things about 10% faster than the combined times of running each method separately. This is because BJWFlate would have to do steps 1 and 2 only once then.
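The 512 k rule of thumb is easy to put into a script. The Python sketch below is an illustration only: the function name is made up, and the option lists are simply the ones from the suggested zipmax.ini settings above.

```python
NO_PREPARTITIONING_LIMIT = 512 * 1024  # 512 k = 524,288 bytes


def option_sets(file_size, limit=NO_PREPARTITIONING_LIMIT):
    """Return the BJWFlate option lists worth trying for a file of this size.

    Hypothetical helper: "-n" is only tried when the file is not larger
    than the limit, following the rule of thumb described above.
    """
    runs = [["-a", "-r", "-y"]]                 # default, with pre-partitioning
    if file_size <= limit:
        runs.append(["-n", "-a", "-r", "-y"])   # also try without pre-partitioning
    return runs
```

The limit is a parameter because 512 k is a personal preference, not something BJWFlate enforces.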


Errors and crashes

Some people have reported BJWFlate causing an error in ZipMax and a few even have had their computer reboot while running BJWFlate. I think these are two different issues. As for spontaneous reboots, especially if they happen on stable operating systems like Windows 2000/XP, I am pretty sure that this is a hardware problem, not a problem with BJWFlate. BJWFlate is very CPU-intensive and I know from experience that when the CPU gets too hot, the computer might reboot. Use some utility to monitor the CPU temperature, improve the cooling, and try running some of the utilities floating around that are designed to stress-test CPUs by running them at 100% load for long periods of time (a package called "cpuburn" is one that comes to mind). These things should help pinpoint the problem and see if it is really BJWFlate causing the spontaneous reboots or something else. Like I said, my money in this case is on the CPU overheating.

As for BJWFlate causing errors in ZipMax: yes, I have occasionally noticed those too, but very rarely. Lots of times, trying the same file again works fine. And on "problem files", it seems to happen less often when running BJWFlate from a command line than when running it from within ZipMax. Since this problem is only intermittent and, when it happens, does not always happen at the same place within the zip file, it is very hard to find out exactly why it happens. I have successfully compressed literally tens of thousands of files with BJWFlate on several different computers and operating systems and only a handful of times did I get an error in ZipMax. I am not saying this to make the problem seem minor (because I hate it that there is something wrong, no matter how rarely it happens); I am only saying this to indicate how hard it is to find out what is wrong. I have found out that it is some problem with memory allocation/deallocation. If it happens, it always seems to happen when deallocating memory, but I do not yet know exactly why. Based on this knowledge, I have made BJWFlate's usage of memory more efficient, and I have since compressed thousands of files with it without the problem happening even once. If you are having this problem and you are using BJWFlate 1.50 or earlier, then I suggest using a later version of BJWFlate. Please let me know if it ever happens with version 1.51 or higher.


Compression sizes

If you are familiar with BJWFlate and with ZipMax, then you probably are also familiar with my DeflOpt program. DeflOpt actually was born as a result of BJWFlate. At some point while improving BJWFlate, I realised that the technique I had just implemented could be used on its own as well: I could take a zip file created by any program, examine the structures inside it, optimise them, and recode and rewrite the data using those optimised structures. So I wrote DeflOpt and was surprised to notice that it could still squeeze bytes out of heavily optimised files resulting from 200-pass ZipMax runs including a lot of different "extreme" 7-Zip settings. I then realised that ZipMax needed an option to run DeflOpt on the intermediate results, because, in some cases, DeflOpting each zip file before ZipMax multiplexes them will yield better results than using DeflOpt on the final zip file created by ZipMax. For example:

Let's say program A compresses a file down to 100 bytes and that DeflOpt would be able to make it 98 bytes. And let's say program B compresses a file down to 99 bytes, but DeflOpt would not be able to gain any bytes on that. Without DeflOpt running on intermediate results, ZipMax would use the file created by program B and so end up with 99 bytes. The resulting file when ZipMax is finished would be 99 bytes and DeflOpt would not be able to improve on that. With DeflOpt running on each file created by each program within ZipMax, the file created by program A followed by DeflOpt would be 98 bytes and the one by program B followed by DeflOpt would be 99 bytes and, in this case, ZipMax would use the former and end up with 98 bytes.
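The same arithmetic can be written as a small Python sketch. It is a simplified model of a single file with made-up function names; it does not reflect how ZipMax is actually implemented.

```python
def deflopt_as_final_step(raw_sizes, deflopted_sizes):
    """Pick the smallest raw result first, then DeflOpt only that one."""
    winner = min(raw_sizes, key=raw_sizes.get)
    return winner, deflopted_sizes[winner]


def deflopt_on_intermediates(raw_sizes, deflopted_sizes):
    """DeflOpt every result first, then pick the smallest of those."""
    winner = min(deflopted_sizes, key=deflopted_sizes.get)
    return winner, deflopted_sizes[winner]


# Sizes in bytes, taken from the example above.
raw = {"A": 100, "B": 99}
deflopted = {"A": 98, "B": 99}

print(deflopt_as_final_step(raw, deflopted))     # ('B', 99)
print(deflopt_on_intermediates(raw, deflopted))  # ('A', 98)
```

Running DeflOpt on the intermediate results can never give a larger final size than running it only at the end, because the smallest raw result is always among the candidates that get DeflOpted.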

Like I said, DeflOpt uses a technique that I developed in BJWFlate. This also means that DeflOpt will not be able to decrease the size of any file created by BJWFlate. The only exceptions to this are old versions of BJWFlate (i.e. ones that did not use this technique yet), and version 1.51 of BJWFlate (in which I had started to make some improvements to the technique). None of these exception versions of BJWFlate have ever been made public, so, for all practical purposes, it can be said that DeflOpt will never be able to improve on the size of a file created by BJWFlate. As mentioned, I made improvements to the technique in version 1.51 of BJWFlate (and it was yet again improved in versions 1.52, 1.53, and 1.54), and this technique could, again, be applied to zip files created by other programs. So I am planning on improving DeflOpt soon. Note that the improved technique includes the old technique, so the old technique will never perform better than the new one. (This is information that is useful for people who are wondering whether they should keep using older versions of BJWFlate and/or DeflOpt in addition to the new ones.)

BJWFlate compresses text files significantly better than programs like PKZIP and WinZip, but, strangely enough, it seems to perform generally worse on text files than other "extreme" zip programs, and also even than the much less "extreme" StuffIt (StuffIt Standard 8.0.0.148 (Engine 7.0.0.38)). As for StuffIt, I am pretty sure that it is especially optimised for text files. But BJWFlate's generally worse performance on text files is something I cannot yet explain. I did not develop BJWFlate with a certain type of file in mind, nor did I optimise it for any specific types of files. I am planning on examining in detail the differences between compressed text files generated by those "extreme" zip programs and those compressed by BJWFlate. I want to know why BJWFlate performs this way on text files and then do something about it. In contrast, BJWFlate generally performs better than all others on .wav files.

tbd: Talk about size -> spreadsheet... see my zipmax.ini settings... have to see if BJWstuff is redundant, because stuff.exe is not very good, unlike the GUI version (StuffIt).

As for the programs named in the spreadsheet, do not ask me to send you anything that I am not the copyright holder of. Several of those are commercial products, I have paid for them, and I am not allowed to distribute them. Even the ones that are free I am not allowed to distribute because I do not have the permission of the copyright holders to do so. The following programs named in the spreadsheet were written by me and are my copyright:

All these programs were written entirely by me; the source code (no, do not ask me for it) does not contain anything written by anyone else.


Usage

OK... I am lazy... here is the output that is shown by BJWFlate and DeflOpt, respectively, when running them from a command line without any parameters:

BJWFlate

***              BJWFlate V1.54              ***
***    Built on Sun Sep 14 12:12:29 2003     ***
***  Copyright (C) 2003 by Ben Jos Walbeehm  ***
Parameters:
  [options] <zipname> [<files to compress> [<files to compress> ...]]
Notes:
- Option -a: Use Athlon optimised code. May also be best on Pentium III and up.
- Option -n: No pre-partitioning. This is slower, but sometimes better.
- Option -r: Recursively go through all subdirectories.
- Option -s<numsplits>: Number of splits. Default: 120. Allowed: 3-8192.
- Option -v: Verbose output.
- Option -y: Answer "yes" to every question (forces overwrite).
- If <zipname> has no extension, ".zip" is used.
- Wildcards * and ? for filenames are allowed.
- If <files to compress> is not specified, * is assumed.
- If <files to compress> is a directory, all files in that directory are added.
- This program is MEMORY HUNGRY; do not even attempt to zip without having
  more RAM than 64 MB plus 20 times the size of the largest file to compress!
- The zip files created by this program are fully compatible with programs such
  as PKZIP 2.04g and WinZip, but usually smaller.

Note that, starting with version 1.54 of BJWFlate, the "-5" switch has been renamed to "-n"!
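The memory note in the usage text can be turned into a quick check before starting a long run. This Python sketch is illustrative only: the function name is made up, and it assumes "MB" means 1,048,576 bytes.

```python
MIB = 1024 * 1024  # assumption: "MB" in the usage text means 1,048,576 bytes


def ram_floor_bytes(file_sizes):
    """Amount of RAM that must be exceeded: 64 MB plus 20 times the largest file.

    Hypothetical helper; file_sizes is a non-empty list of sizes in bytes.
    """
    return 64 * MIB + 20 * max(file_sizes)


# A 10 MB file and a 1 MB file: 64 MB + 20 * 10 MB = 264 MB.
print(ram_floor_bytes([10 * MIB, 1 * MIB]) // MIB)  # 264
```

Only the largest file matters in this estimate, not the total size of all the files being zipped.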

DeflOpt

***              DeflOpt V1.17               ***
***    Built on Sun Sep  7 18:22:50 2003     ***
***  Copyright (C) 2003 by Ben Jos Walbeehm  ***
Parameters:
  [options] <zipfile> [<zipfile> [<zipfile> ...]]
Notes:
- Option -b: Replace zip also when zero bytes, but more than zero bits saved.
- Option -n: No writing of zip files.
- Option -r: Recursively go through all subdirectories.
- Option -v: Verbose output.
- Wildcards * and ? for filenames are allowed.
- If <zipfile> is a directory, all files matching "*.zip" in that directory are
  processed.
- If <zipfile> has no extension, ".zip" is used, EXCEPT when <zipfile> contains
  wildcards or is a directory.
- By default, DeflOpt replaces the zip file only when it can reduce the number
  of bytes of the zip file. It is possible that DeflOpt makes the deflated data
  one or more bits shorter but that this does not make the size in bytes less.
  Specify the -b option to have DeflOpt replace files also when it can only
  save one or more bits but no actual bytes.
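The difference between saving bits and saving bytes comes down to rounding up to whole bytes. The Python sketch below models the replacement rule for a single stream of deflated data; the function names are made up and this is not DeflOpt's actual code.

```python
def bytes_needed(bits):
    """Number of whole bytes needed to store the given number of bits."""
    return (bits + 7) // 8


def would_replace(old_bits, new_bits, replace_on_bit_savings=False):
    """Model of the replacement rule described in the usage text.

    replace_on_bit_savings plays the role of the -b option.
    """
    if new_bits >= old_bits:
        return False                      # nothing saved at all
    if bytes_needed(new_bits) < bytes_needed(old_bits):
        return True                       # at least one byte saved
    return replace_on_bit_savings         # bits saved, but no bytes


print(would_replace(807, 804))        # False: 101 bytes either way
print(would_replace(807, 804, True))  # True: 3 bits saved and -b given
print(would_replace(807, 800))        # True: 101 bytes down to 100
```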


Version history

BJWFlate

Development was started on 16-Mar-2003. Whenever I changed the version number, that meant I had a new, more or less stable version in the ongoing development of this program, but I have not kept track of all the changes I made in all the versions up to about 1.50. I do not even have the source code or the executables anymore of versions before 1.16.

DeflOpt

Development was started on 04-Apr-2003. I have not kept track of all the changes I made in all the versions up to 1.16, but quite often, they reflected changes made to BJWFlate. I do not even have the source code or the executables anymore of most of the versions before 1.16.


Enabling very extreme settings for 7-Zip

This section assumes you are familiar with Microsoft Visual C++. I do not intend to write a compilation guide for 7-Zip; I will just show which changes to make to allow 7za.exe to use up to 258 fast bytes and up to 255 passes. As mentioned before, more than 258 fast bytes is useless since the maximum length for runs as defined by the Deflate standard is 258. The line numbers mentioned below refer to version 3.08.04 beta of 7-Zip. They may or may not need to be changed for other versions. And since I am not the author of 7-Zip, I can give no guarantees that the changes will work in other versions. Use at your own risk. The following modifications should be made to the file 7zip\Archive\Zip\ZipHandlerOut.cpp:

Line 263:

    if (m_Method.NumPasses < 1 || m_Method.NumPasses > 4)
      return E_INVALIDARG;

Change to:

    if (m_Method.NumPasses < 1 || m_Method.NumPasses > 255)
      return E_INVALIDARG;

Line 271:

    if (m_Method.NumFastBytes < 3 || m_Method.NumFastBytes > 255)
      return E_INVALIDARG;

Change to:

    if (m_Method.NumFastBytes < 3 || m_Method.NumFastBytes > 258)
      return E_INVALIDARG;

To make a new 7za.exe, load and compile the workspace 7zip\Bundles\Alone\Alone.dsw. Note that the project settings are such that it will try to put the executable in the c:\UTIL directory.


tbd: Talk about future plan for BJWFlate: Blockwise multiplexing! (like a blockwise ZipMax! (zip level --> file level --> block level)).

tbd: Talk about future plan for DeflOpt: The new technique introduced in versions 1.51-1.53 of BJWFlate.

tbd: Talk about ultimate goal: Always at least as good compression as any of the others --> who cares then if it is slow... as long as it is faster than ZipMaxing with 200 different settings.

tbd: Mention bug in unzip.

tbd: Mention other zip-related utilities by me, such as cmz (CreateMinimalZip) and ZipRen (rename files within zip files without recompressing).


Copyright © 2003 Ben Jos Walbeehm.
All rights reserved.