All files are 0s and 1s, but a 0 or a 1 by itself contains very little information, so we tend to group them eight at a time. We call that unit a byte; a byte holds values from 0 to 255.

Now let's say you had a file that contained 100 bytes and you wanted to compress it. One thing you could do, for example, is find repeating bytes. Say your bytes went like this at some point in the file:

3 3 3 3 3 3 3 3 3 3

You could replace that with some code that says "there are supposed to be ten 3's here". The result would look something like:

10x3

And hey, we divided the size of that section by three! That's a very basic compression algorithm, but it gives you an idea of how they work: compression basically records how the data is spread through a file, and to decompress it you rebuild the file from that description. Typically, though, it is hard to achieve better than a 2:1 compression ratio without losing any data. It's quite normal for such a tool to have long decompression times, and I somewhat doubt that it is lossless compression, as in I don't know that the resulting file will still function. How does it do this? Well, that's software-dependent; there is no one answer to compression.

The size of a file has little to do with the actual amount of information it conveys. Imagine a BMP image file, which is an uncompressed image format, and assume the same bits per pixel and the same resolution throughout. If an image is completely white, it conveys little information. If an image is a photo, it conveys much more, because the color of each pixel varies a lot. In the same way, you can expect an image of a few simple circles and rectangles to convey an amount of information somewhere between the white image and the photo. Since BMP is not compressed, the sizes of these three images would be identical. The density of information a file conveys is called entropy. If you use a compressed image format such as PNG, you would find that the white image comes out small, the circles-and-rectangles image medium, and the photo large.

EDIT: The largest achievable compression ratio depends on the entropy. Compressing 650 MB down to 65 MB is only possible because the information those 650 MB convey amounts to 65 MB or less. If you have a text file comprised solely of millions of spaces, the compression ratio will be very high, because the file conveys almost no information. On the contrary, if you have a text file of English text with real meaning, the compression ratio will be much lower than for the all-spaces file, because it conveys much more information.

You have a few methods of compressing data. For instance, you could make a list of recurring patterns and number them. To give you an example with text: replace every "this" with 0, every "that" with 1, and so on; the more those words are used, the more space you save. Also, if something repeats multiple times, you can simply state that instead of explicitly writing every occurrence. To stick with the above example, instead of "that that that that that that" you would write 111111, and that you could shorten to something like 6x1. With actual text this doesn't seem to make a lot of sense, of course, but when you look at binary data in same-sized groups (usually one byte at a time), you come across repeating patterns quite regularly.
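To make the "10x3" and "6x1" idea concrete, here is a minimal sketch of run-length encoding in Python. The (count, value) byte layout and the function names are my own illustration, not the format of any real tool:

def rle_encode(data: bytes) -> bytes:
    # Store each run of repeated bytes as a (count, value) pair.
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1  # a single count byte can only go up to 255
        out += bytes([run, data[i]])  # ten 3's become the pair (10, 3)
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    # Rebuild the original bytes from the (count, value) description.
    out = bytearray()
    for count, value in zip(data[::2], data[1::2]):
        out += bytes([value]) * count
    return bytes(out)

sample = bytes([3] * 10 + [7, 7, 200])
packed = rle_encode(sample)
assert rle_decode(packed) == sample
print(len(sample), "->", len(packed))  # 13 -> 6

Note the catch: on data with no repeats, every byte becomes a two-byte pair, so this scheme can double the size. That is one reason real compressors are considerably more elaborate.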
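The "replace every 'this' with 0, every 'that' with 1" idea is a dictionary coder in miniature. Again a purely illustrative Python sketch; real dictionary coders (the LZ77/LZ78 families) work on raw bytes rather than words:

# Number each distinct word, then store only the numbers.
text = "that that that this that that this"
words = text.split()
table = {w: i for i, w in enumerate(dict.fromkeys(words))}  # {'that': 0, 'this': 1}
encoded = [table[w] for w in words]                         # [0, 0, 0, 1, 0, 0, 1]

# Decoding inverts the table and rebuilds the text.
inverse = {i: w for w, i in table.items()}
assert " ".join(inverse[i] for i in encoded) == text

The more often a word occurs, the more space its short number saves, exactly as described above.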
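The entropy point is easy to check for yourself with Python's standard zlib module. The inputs here are made up for illustration; real English text typically compresses somewhere around 2:1 to 3:1:

import os
import random
import zlib

n = 1_000_000
spaces = b" " * n                                      # conveys almost no information
vocab = b"the quick brown fox jumps over a lazy dog and then some more words".split()
text_ish = b" ".join(random.choices(vocab, k=n // 5))  # a rough stand-in for real text
noise = os.urandom(n)                                  # maximum entropy, barely compressible

for label, blob in [("spaces", spaces), ("text", text_ish), ("noise", noise)]:
    ratio = len(blob) / len(zlib.compress(blob, 9))
    print(f"{label:7s} compresses about {ratio:7.1f}:1")

The million spaces collapse to almost nothing, the word soup lands in the middle, and the random bytes barely shrink at all: the same story the BMP-versus-PNG comparison tells in image form above.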