The MASM Forum Archive 2004 to 2012

Miscellaneous Forums => Miscellaneous Projects => Topic started by: Sergiu FUNIERU on February 23, 2010, 11:15:06 PM

Title: 100 to 1 Compressor
Post by: Sergiu FUNIERU on February 23, 2010, 11:15:06 PM
I am working on an algorithm that will compress every file at least 100 to 1, no matter what OTHER algorithm was used to compress it before.

I contacted the creator of a very well-known commercial archive manager, and he told me that he's not interested in better algorithms. That got me thinking: is this a dead end? Is there no interest in compression algorithms? I know that I received a "No, thanks" answer from only one person, but I think highly of that person and his work.
Title: Re: 100 to 1 Compressor
Post by: dedndave on February 23, 2010, 11:55:37 PM
100 to 1 isn't very likely - not without data loss
of course, it depends on what the file has in it
but i seriously doubt any routine can maintain that on average

long ago, i wrote a program that would usually compress to about 65% or so
it took forever - about 3 minutes for 64 KB on a 200 MHz Pentium MMX
it took so long because it scanned forward, looking for the most efficient method to use (about a dozen different methods)
it decompressed very fast, though   :P

i could probably write it faster, now - i may dig it out when i get caught up (like, the end of summer)
Title: Re: 100 to 1 Compressor
Post by: Sergiu FUNIERU on February 24, 2010, 12:00:06 AM
Quote from: dedndave on February 23, 2010, 11:55:37 PM
100 to 1 isn't very likely - not without data loss
of course, it depends on what the file has in it
but i seriously doubt any routine can maintain that on average
Not on average - EVERY time. Without any data loss. The algorithm doesn't contradict Shannon entropy.
Title: Re: 100 to 1 Compressor
Post by: dedndave on February 24, 2010, 12:02:16 AM
well, i am sure we'd all like to see it in action   :bg
Title: Re: 100 to 1 Compressor
Post by: Sergiu FUNIERU on February 24, 2010, 12:05:54 AM
Quote from: dedndave on February 24, 2010, 12:02:16 AMwell, i am sure we'd all like to see it in action   :bg
So would I.

Right now, my biggest problem is that I don't know how to get a patent for it.
Title: Re: 100 to 1 Compressor
Post by: dedndave on February 24, 2010, 12:10:18 AM
patents are usually devices that tell the competitor how you did it
they can quite often be overcome by minor improvements
but, i know a little about math and i can tell you that, unless the file is filled with repeat bytes, it isn't possible
Title: Re: 100 to 1 Compressor
Post by: Sergiu FUNIERU on February 24, 2010, 12:21:08 AM
Quote from: dedndave on February 24, 2010, 12:10:18 AM
patents are usually devices that tell the competitor how you did it
they can quite often be overcome by minor improvements
A patent is the best protection I know of. Do you know a better method?



Quote from: dedndave on February 24, 2010, 12:10:18 AMbut, i know a little about math and i can tell you that, unless the file is filled with repeat bytes, it isn't possible
I can't prove my point without revealing my method.
Title: Re: 100 to 1 Compressor
Post by: BlackVortex on February 24, 2010, 06:57:43 AM
You should patent that idea of yours as soon as possible, this is groundbreaking !
Title: Re: 100 to 1 Compressor
Post by: Ficko on February 24, 2010, 08:45:16 AM
Quote
"Not on average - EVERY time. Without any data loss...."

That's a very bold statement. :lol

That would mean I give you 100 bytes and you compress it to 1 without losing the information? :dazzled:

Quote
... and he told me that he's not interested in better algorithms

Don't take it the wrong way, but I would turn you down as well for offering me such a program, just as I would turn down anybody else trying to sell me a perpetuum mobile. :toothy
Title: Re: 100 to 1 Compressor
Post by: ecube on February 24, 2010, 11:44:09 AM
heh, this guy's funny, be gentle with him. :bg
Title: Re: 100 to 1 Compressor
Post by: dedndave on February 24, 2010, 12:17:24 PM
it's cool - we all have our theories that go sour
i was raised on a farm
so, i learned early that if you try to put 10 pounds of shit in a 5 pound sack, you wind up with a bit of a mess   :P
i think it is pretty cool that you can squeeze the air out of it and sometimes get 10 pounds into a 6.5 pound sack
nowadays they call it "information theory"
as simple farmers, we just called it a "shitty mess"
Title: Re: 100 to 1 Compressor
Post by: oex on February 24, 2010, 12:39:45 PM
You could win the remaining Hutter Prize with that http://prize.hutter1.net/
Title: Re: 100 to 1 Compressor
Post by: Sergiu FUNIERU on February 24, 2010, 12:42:36 PM
Quote from: Ficko on February 24, 2010, 08:45:16 AMThat would mean I give you 100 bytes and you compress it to 1 without losing the information?
No, I can't do that. But I can compress 100 MB to 1 MB, or 100 KB to 1 KB. I might go under 100 KB for the lowest limit, but I'm not sure it's worth the effort.
My algorithm would be best for large data compression, like storing 2 Blu-ray discs on one regular CD.

Many people try to perfect the existing algorithms to squeeze 1 more bit. My method is a totally different approach.

Quote from: Ficko on February 24, 2010, 08:45:16 AMDon't take it the wrong way, but I would turn you down as well for offering me such a program, just as I would turn down anybody else trying to sell me a perpetuum mobile. :toothy
It's not a perpetuum mobile. It's like trying to sell someone nuclear energy technology. If that person strongly believes that classic energy is the best way to go, I don't have any chance to persuade them.

At first, I was upset that the person didn't even want to hear the idea. Maybe it's better this way. I would have given him my idea for free at that time - to a person who doesn't value its potential.

I know some people who still don't believe that lossless compression is possible. Their question is "How do I get back the information I removed when I compressed?" They simply don't believe that some information is redundant and that the compressor simply uses that fact.
Title: Re: 100 to 1 Compressor
Post by: oex on February 24, 2010, 12:50:34 PM
Compression algorithms find real limitations when put into practice. You should write your algorithm in asm and test it on various random sample data before you make too many wild claims.... What might seem obvious now will soon find boundaries after you put pen to paper.

If you do indeed get it to work only then is it worth worrying about patents.
Title: Re: 100 to 1 Compressor
Post by: Sergiu FUNIERU on February 24, 2010, 12:50:59 PM
Quote from: oex on February 24, 2010, 12:39:45 PMYou could win the remaining Hutter Prize with that http://prize.hutter1.net/
Wow!

Thank you so much for telling me! I didn't know of such a contest, but it looks very interesting. I will read the conditions, to see if I have to give them the algorithm in exchange for the prize.
Title: Re: 100 to 1 Compressor
Post by: dedndave on February 24, 2010, 12:53:29 PM
how much is a "€" ? EDIT - nevermind - it is a euro   :P  i guess if you put 50,000 of them together, you have a few bux

Sergiu - you have to give them a working EXE
they do not claim ownership of the code
Title: Re: 100 to 1 Compressor
Post by: Sergiu FUNIERU on February 24, 2010, 12:53:37 PM
Quote from: oex on February 24, 2010, 12:50:34 PMCompression algorithms find real limitations when put into practice. You should write your algorithm in asm and test it on various random sample data before you make too many wild claims.... What might seem obvious now will soon find boundaries after you put pen to paper.
It's all about numbers. It's like saying that I found an algorithm to solve the second-degree equation (I know there is one - it's just an example). If it works from the mathematical point of view, implementing the algorithm is just a matter of asm skills.

Quote from: dedndave on February 24, 2010, 12:53:29 PMhow much is a "€" ?
It depends on the conversion rate in each country. In mine (Romania), 1 euro is around $1.355, so 50,000 euros means around $67,780.
Title: Re: 100 to 1 Compressor
Post by: dedndave on February 24, 2010, 01:05:34 PM
same here - a Romanian dollar is about equal to a US dollar, at the moment
that should be plenty to pay for a patent attorney   :bg
Title: Re: 100 to 1 Compressor
Post by: Sergiu FUNIERU on February 24, 2010, 01:12:34 PM
Quote from: dedndave on February 24, 2010, 01:05:34 PMsame here - a Romanian dollar is about equal to a US dollar, at the moment
We don't have a Romanian dollar (unfortunately). I was talking about US dollars.

By the way, it looks like the prize is not 50,000 euros, but 50,000 euros multiplied by a number less than one:
Quote
If we can verify your claim, you are eligible for a prize of 50'000€×(1-S/L).

Anyway, this looks like a good incentive. I'll have to improve my programming skills to get this job done before they change their minds.
Title: Re: 100 to 1 Compressor
Post by: BlackVortex on February 24, 2010, 01:16:21 PM
Quote from: dedndave on February 24, 2010, 01:05:34 PM
same here - a Romanian dollar is about equal to a US dollar, at the moment
that should be plenty to pay for a patent attorney   :bg
lmao, do you think every country has a dollar ? We're talking euros here. As in European Union currency ...
Title: Re: 100 to 1 Compressor
Post by: oex on February 24, 2010, 01:32:52 PM
Probably the biggest problem with compression is compression time.... The Hutter Prize algo takes 10 hours to compress 100 MB, so compressing the data on my 250 GB HD would take my computer about 3.5 years, by which time hard drives will be about 3 times larger lol. Also bandwidth is increasing fast, so this isn't a major bottleneck either.... There are many cases where you could use better compression, especially 100 to 1 compression, but more likely on smaller devices than normal desktops.
Title: Re: 100 to 1 Compressor
Post by: Sergiu FUNIERU on February 24, 2010, 01:40:08 PM
Quote from: oex on February 24, 2010, 01:32:52 PMProbably the biggest problem with compression is compression time.... The Hutter Prize algo takes 10 hours to compress 100 MB, so compressing the data on my 250 GB HD would take my computer about 3.5 years, by which time hard drives will be about 3 times larger lol. Also bandwidth is increasing fast, so this isn't a major bottleneck either.... There are many cases where you could use better compression, especially 100 to 1 compression, but more likely on smaller devices than normal desktops.
What people did before Euclid, when trying to calculate the gcd (greatest common divisor) of x and y:
- they factorized both x and y
- they took the common divisors, each at its lowest power
Factorizing a 1000-digit number takes billions of billions of ... billions of years with our current technology.

Imagine when Euclid came up with his algorithm and said:
"Hey, I can calculate the gcd of 2 numbers with 1000 digits each in 10 seconds, on a P4 computer!"
People said "You're nuts! Our best algorithms would take centuries, so yours will take forever."
By the way, computing the gcd of 2 numbers with 1000 digits each takes less than 10 seconds on a P4 with the GMP library. I tried it myself.

Euclid's algorithm didn't factorize x and y - simply because it didn't need to. So Euclid didn't optimize the classic algorithm; he replaced it with one better adapted to the task.

Of course, Euclid didn't have a P4, but you got the idea.
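For anyone curious, here is a minimal C sketch of the GMP gcd experiment described above (an illustration, not Sergiu's actual test; the 10-second figure is his own report). It assumes libgmp is installed; build with: gcc gcdtest.c -lgmp

/* generate two ~1000-digit random integers and take their gcd */
#include <stdio.h>
#include <gmp.h>

int main(void)
{
    mpz_t a, b, g;
    gmp_randstate_t st;

    mpz_init(a); mpz_init(b); mpz_init(g);
    gmp_randinit_default(st);
    gmp_randseed_ui(st, 12345);

    mpz_urandomb(a, st, 3322);   /* ~1000 decimal digits is ~3322 bits */
    mpz_urandomb(b, st, 3322);

    mpz_gcd(g, a, b);            /* Euclid-style gcd - no factoring needed */

    printf("gcd has %lu decimal digits\n",
           (unsigned long)mpz_sizeinbase(g, 10));

    mpz_clear(a); mpz_clear(b); mpz_clear(g);
    gmp_randclear(st);
    return 0;
}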
Title: Re: 100 to 1 Compressor
Post by: oex on February 24, 2010, 02:30:31 PM
Euclid did however live 2300 years ago; I expect in 2300 years' time we will have quantum computers capable of similarly dramatic leaps, maybe even ultra fast 100 to 1 compression :lol
Title: Re: 100 to 1 Compressor
Post by: Sergiu FUNIERU on February 24, 2010, 02:36:46 PM
Quote from: oex on February 24, 2010, 02:30:31 PMmaybe even ultra fast 100 to 1 compression :lol
I strongly believe that we already are capable of compressing 100 to 1, almost instantly, with the equipment we have nowadays. Maybe all we need is a fresh point of view regarding the algorithms that we currently use for compression.

Euclid didn't have better tools. He just tried to reduce the problem to its basics. By the way, do you know how he came up with the idea? It's fascinating to see what a clear mind can do.
Title: Re: 100 to 1 Compressor
Post by: oex on February 24, 2010, 02:46:25 PM
Well, compression is considered a sure sign of intelligence; for my part, I'm still struggling with 2 to 1 compression, and it's not very instant, but I'm working on it :)

I'm sure many of the limitations have more to do with hardware capabilities and the sheer masses of data we consider small and easy to manipulate these days. In Euclid's time not much had been discovered yet; there are 2300 years of discoveries between then and now.
Title: Re: 100 to 1 Compressor
Post by: dedndave on February 24, 2010, 03:09:50 PM
if you look at the file they want you to compress, it is part of the html from wikipedia main page
mostly plain text characters - which compresses rather easily
the zip file it comes in is ~3:1 already   :bg
Title: Re: 100 to 1 Compressor
Post by: Sergiu FUNIERU on February 24, 2010, 03:15:24 PM
Quote from: dedndave on February 24, 2010, 03:09:50 PMwhich compresses rather easily
Yes, but when I look at the prizes, all anyone has won so far is a couple of thousand. And their programs are available on the site for download. I can't do that. I have to think of a method to prove that my algorithm works before releasing it to the public.

Do you think that a demonstration posted on YouTube will do it?
Title: Re: 100 to 1 Compressor
Post by: dedndave on February 24, 2010, 03:17:55 PM
either way, it would be interesting to see
although, i have seen some smoke and mirror stuff on youtube, too
like some guy measuring 120 VAC from a battery - lol
but, you couldn't see the meter leads as they left and came back onto the visible screen   :bg

use their file as an example - that will give you some data to compare it with
Title: Re: 100 to 1 Compressor
Post by: oex on February 24, 2010, 04:09:51 PM
Quote from: dedndave on February 24, 2010, 03:09:50 PM
if you look at the file they want you to compress, it is part of the html from wikipedia main page
mostly plain text characters - which compresses rather easily
the zip file it comes in is ~3:1 already   :bg

:P I'm struggling to get past 2:1 on generic data not that wiki thing :lol.... Photo data only
Title: Re: 100 to 1 Compressor
Post by: oex on February 24, 2010, 04:12:45 PM
Quote from: Sergiu FUNIERU on February 24, 2010, 03:15:24 PM
Quote from: dedndave on February 24, 2010, 03:09:50 PMwhich compresses rather easily
Yes, but when I look at the prizes, all anyone has won so far is a couple of thousand. And their programs are available on the site for download. I can't do that. I have to think of a method to prove that my algorithm works before releasing it to the public.

Do you think that a demonstration posted on YouTube will do it?

YouTube is public. If you post an exe to this site that compresses 100:1 pretty quickly, you would be able to live off the publicity :lol. There would be intelligence agencies all over the world wanting to enlist you.
Title: Re: 100 to 1 Compressor
Post by: clive on February 24, 2010, 04:34:19 PM
Quote from: E^cube on February 24, 2010, 11:44:09 AM
heh, this guy's funny, be gentle with him.

Indeed, I've got some data with pretty much maximal entropy; any attempt to compress it will make it bigger, and possibly annoy it.

-Clive
Title: Re: 100 to 1 Compressor
Post by: oex on February 24, 2010, 05:09:59 PM
I wonder how much more efficient the web could be
Title: Re: 100 to 1 Compressor
Post by: dedndave on February 24, 2010, 05:37:37 PM
with that pic, his full name must be Clive A. Anderson   :P
Title: Re: 100 to 1 Compressor
Post by: oex on February 24, 2010, 05:42:18 PM
You hear that Mr. Anderson?... That is the sound of inevitability entropy...  :lol

I believe white noise is entropic? but why is it white?
Title: Re: 100 to 1 Compressor
Post by: Gunner on February 24, 2010, 11:03:19 PM
If you can do what you say you can do, I am sure I can get some backers and buy it from you for a couple of mil!  Would make that back in no time!
Title: Re: 100 to 1 Compressor
Post by: hutch-- on February 25, 2010, 07:40:35 AM
Sergiu,

This claim sounds like something out of fantasyland. You are talking to experienced programmers here. You can in fact come close if you have a gigabyte of identical characters and you run a variant of RLE, but for random binary or similar data forget it.
Title: Re: 100 to 1 Compressor
Post by: Eddy on February 25, 2010, 08:48:47 AM
Data, encrypted with AES (or any other good encryption algorithm) does not compress at all ....

Kind regards
Title: Re: 100 to 1 Compressor
Post by: Sergiu FUNIERU on February 25, 2010, 10:32:26 AM
Quote from: hutch-- on February 25, 2010, 07:40:35 AMYou are talking to experienced programmers here.
I've never said otherwise. I have no intention to insult anyone with what I say.

Quote from: hutch-- on February 25, 2010, 07:40:35 AMfor random binary or similar data forget it.
In my opinion, we have grown too used to the idea that the algorithms we use today compress data to the lowest possible limit.

I will take a pause from posting on this particular thread until my program is ready, so I can offer a live demo. I've done things before that others told me couldn't be done. When they saw the demo, they said "Oh, I never thought of that".

My ex-boss is a very experienced professional, with more than 20 years of experience in his particular field. Working for so long, he had optimized the algorithms he learned in school to the limit. What I brought was a fresh point of view. I'm sure that if the situation were reversed, he would have come up with the fresh point of view. As a real-life comparison: he was close to the world record for running from point A to point B. I simply took the bus from point A to point B. Does that mean that I was smarter than him? No. It means that I didn't want to accept that there were no better ways to do a certain task, and I looked around for alternative solutions. He had ceased to do that long ago, because he was confident that the algorithms he used couldn't be optimized any more. And he was right - THOSE algorithms couldn't be optimized any more.
Title: Re: 100 to 1 Compressor
Post by: sinsi on February 25, 2010, 10:40:51 AM
So you can compress a zip/rar/jpg/gif/7z at 100:1 eh?
Title: Re: 100 to 1 Compressor
Post by: Sergiu FUNIERU on February 25, 2010, 10:52:11 AM
Quote from: Eddy on February 25, 2010, 08:48:47 AMData, encrypted with AES (or any other good encryption algorithm) does not compress at all
May I ask you for a favor? Could you please send me a PM with a sample of data that can't be compressed at all? I'm not being ironic. I don't know how to make this sound the way it should - I'm really interested in such a sample.
Title: Re: 100 to 1 Compressor
Post by: oex on February 25, 2010, 10:55:07 AM
You don't need encrypted data to prove the point - try losslessly compressing any full-color photo beyond its JPEG compression

eg http://www.masterspring.com/Images/compression2_6110(kn2862).jpg
Title: Re: 100 to 1 Compressor
Post by: dedndave on February 25, 2010, 10:59:11 AM
Sergiu
practically any RAR or 7z file is pretty well compressed already
usually, attempting to compress an already compressed file again results in a larger file or, at least, no further compression
the same is true of the other formats, but these two are likely the most "tightly" compressed
Title: Re: 100 to 1 Compressor
Post by: Eddy on February 25, 2010, 10:59:26 AM
Quote from: Sergiu FUNIERU on February 25, 2010, 10:52:11 AM
Could you please send me a PM with a sample of data that can't be compressed at all? I'm really interested in such a sample.
Sure. How long would you like the file to be? I am not sure what the max file length is that can be sent via PM. Otherwise, if you PM me your e-mail, I will send it via e-mail.
(or I can also use a file sharing service. I'll see)

Kind regards
Title: Re: 100 to 1 Compressor
Post by: Eddy on February 25, 2010, 11:45:21 AM
Quote from: Sergiu FUNIERU on February 25, 2010, 10:52:11 AM
Could you please send me a PM with a sample of data that can't be compressed at all?
Sergiu,
I have generated a file about 100 kB long. I used a Blum-Blum-Shub PRNG rather than AES, because I have software readily available to do that. This data should be incompressible. Winzip and 7zip can't compress it, for example.
The file (BBS.txt) can be downloaded here (max 100 downloads over the next 7 days):
https://www.yousendit.com/download/RmNDTG0rUzczMWxFQlE9PQ

Kind regards
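For illustration, here is a toy C sketch of the Blum-Blum-Shub construction Eddy mentions (not his actual generator; the parameters are the tiny textbook ones, far too small for real use - a real generator uses large primes p, q ≡ 3 mod 4):

/* toy BBS: x_{n+1} = x_n^2 mod M, emit the low bit of each state */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const uint64_t M = 11 * 23;  /* p*q, both primes congruent to 3 mod 4 */
    uint64_t x = 3;              /* seed, must be coprime to M */
    int i;

    for (i = 0; i < 32; i++) {
        x = (x * x) % M;
        printf("%d", (int)(x & 1));
    }
    printf("\n");
    return 0;
}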
Title: Re: 100 to 1 Compressor
Post by: Slugsnack on February 25, 2010, 08:50:03 PM
Just to confirm: when you say 'any file', do you mean you could continually compress a given file that had already been previously compressed with your algo?
Title: Re: 100 to 1 Compressor
Post by: Sergiu FUNIERU on February 25, 2010, 09:16:39 PM
Quote from: Slugsnack on February 25, 2010, 08:50:03 PMJust to confirm: when you say 'any file', do you mean you could continually compress a given file that had already been previously compressed with your algo?
No, that's why I mentioned in my first post "no matter what OTHER algorithm was used to compress it before."

It's like trying to zip a zip file. I was able to 7z a particular rar file (with negligible compression), but I wasn't able to 7z a 7z file or, for that matter, to recompress any compressed file using the same algorithm.
Title: Re: 100 to 1 Compressor
Post by: MichaelW on February 25, 2010, 09:27:43 PM
The attachment uses the cryptographic service provider to generate a 100000-byte sequence of random numbers, and zlib to test the compressibility (internally). Per the Microsoft documentation for the CryptGenRandom function:
Quote
The data produced by this function is cryptographically random. It is far more random than the data generated by the typical random number generator such as the one shipped with your C compiler.

Two sequences are generated, first the one generated by CryptGenRandom, and then a modified version of that sequence with every eighth byte set to zero. The zlib compressibility results are determined internally, and each of the sequences is stored in a file so it can be tested externally. In my tests of zlib, winzip, 7-zip, and winrar, for goodrand.bin they all produced an archive that was larger than goodrand.bin.
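(The attachment itself is not preserved in this archive; below is a minimal C sketch of the same kind of test - an approximation, not MichaelW's code - pairing CryptGenRandom with zlib's compress(). Link against zlib and advapi32.)

#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <wincrypt.h>
#include <zlib.h>

#define LEN 100000

int main(void)
{
    HCRYPTPROV hProv;
    static unsigned char src[LEN];
    unsigned char *dst;
    uLongf dlen;

    /* fill the buffer from the cryptographic service provider */
    if (!CryptAcquireContext(&hProv, NULL, NULL, PROV_RSA_FULL,
                             CRYPT_VERIFYCONTEXT))
        return 1;
    CryptGenRandom(hProv, LEN, src);
    CryptReleaseContext(hProv, 0);

    /* ask zlib how small the buffer will go */
    dlen = compressBound(LEN);
    dst  = malloc(dlen);
    if (compress(dst, &dlen, src, LEN) != Z_OK)
        return 1;

    /* for random input, expect the output slightly LARGER than the input */
    printf("%d bytes -> %lu bytes\n", LEN, (unsigned long)dlen);
    free(dst);
    return 0;
}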
Title: Re: 100 to 1 Compressor
Post by: dedndave on February 26, 2010, 12:15:43 AM
QuoteIn my tests of zlib, winzip, 7-zip, and winrar, for goodrand.bin they all produced an archive that was larger than goodrand.bin.

probably as good a test for randomness as there is   :P
i would take that over the chi-squared value, any day
although - every 8th byte a zero - if the compression algo knew that was there, it might be able to take advantage of it
Title: Re: 100 to 1 Compressor
Post by: Sergiu FUNIERU on March 01, 2010, 04:23:56 PM
Quote from: Gunner on February 24, 2010, 11:03:19 PMIf you can do what you say you can do, I am sure I can get some backers and buy it from you for a couple of mil!  Would make that back in no time!
Okay, how do we do that?

I ran my first test on some files. My algorithm was able to compress them up to 120 times better than 7z (which was better than zip). Some files were compressed only 20 times better than 7z, but there is a lot of room for improvement. I simply lack the skills to write my program to that extent.

I contacted Microsoft and they told me, and I quote: "We are not interested in new ideas at this time, but you can call another time if you wish".

For years, the idea that something heavier than air might fly made people suffocate with indignation. Please put the skepticism aside for a moment and tell me if you can help me present my idea to the right investors. Needless to say, you'll have your cut when all this is over.

I'm also very interested in publishing a paper that explains the algorithm, at least the basics of it. I don't know how I can get the investor to accept this if (s)he's going to patent my idea.

I need an investor with an open mind. Someone who will not reject my idea without listening first, just because all the books he has read told him it's not possible. It is definitely possible, and the reason why is very logical.

I also need a programmer to implement the full algorithm. It might take me a while to do it myself.

In my opinion, the idea of how to compress an already compressed file is worth all the money. The implementation part is just a detail.
Title: Re: 100 to 1 Compressor
Post by: oex on March 01, 2010, 04:42:21 PM
Did you test it on the random data in this thread, or indeed compress a photo to less than its JPEG size?

I can compress a row of 65535 0's to 3 bytes (21845 to 1), but that is rather irrelevant. It is very easy to create data sets to test on that compress well but are unreflective of real-world scenarios.... Other good examples are counting numbers 1,2,3,4,5,6,7,8,9,10 etc, even numbers, odd numbers, numbers generated from formulae. The problem is that by the time you have put all this information in the header, either the data does not reflect real-world scenarios or the header is too big.
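For illustration, a minimal run-length encoder in C showing the 65535-zeros case (the 3-byte record format - a value byte plus a 16-bit count - is made up for this example):

#include <stdio.h>

/* encode as (byte value, 16-bit run length) records */
size_t rle(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        unsigned char v = in[i];
        unsigned int  run = 0;
        while (i < n && in[i] == v && run < 65535) { i++; run++; }
        out[o++] = v;
        out[o++] = (unsigned char)(run >> 8);
        out[o++] = (unsigned char)(run & 0xFF);
    }
    return o;   /* worst case: 3 output bytes per input byte */
}

int main(void)
{
    static unsigned char in[65535];           /* all zeros */
    static unsigned char out[3 * sizeof in];
    printf("%u -> %u bytes\n", (unsigned)sizeof in,
           (unsigned)rle(in, sizeof in, out));
    return 0;
}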

As far as I'm aware, people publish papers on proof of concept, not just concept.

I don't wish to deride your idea, but implementation is the only detail that matters. Bill Gates didn't say "I have an idea called DOS", he shipped it. If he hadn't, someone else would have made his billions. YouTube was bought by Google after it had been running for less than two years. If you don't know how to write it, the algorithm must not be complete, in which case you have no proof of concept. ASM is basically math, so if you have a completed algorithm it shouldn't take more than a few weeks or months to implement it, whatever your current skill level in ASM.

Anyways, I wish you the best with your project Sergiu or in finding a backer.
Title: Re: 100 to 1 Compressor
Post by: dedndave on March 01, 2010, 08:08:50 PM
you compressed it, but did you expand it and compare the result to the original ?

As for working with information prior to obtaining a patent, there is something called a "non-disclosure agreement".
It protects the idea or invention during discussion and development.
You would want any potential investor to sign such an agreement before divulging the details.
You would also want one signed by a programmer that works on it.

You can also get a certain amount of protection by describing the idea in detail on paper.
Get someone who is technically knowledgeable to read it.
When they are done, get them to sign and date the paper with the words "Read and Understood by...(signature/date)"
Title: Re: 100 to 1 Compressor
Post by: MichaelW on March 02, 2010, 02:31:27 AM
Sergiu,

I think you need to spend some time researching the science behind data compression. Over the last 60-70 years a great deal of effort has been put into developing, and applying, this science. For your claim to be possible there would have to be some very large errors in this science, errors that no one else has uncovered.

And one detail that has bothered me from the start, why 100:1? Why not a more believable 20:1, or 1000:1, or ??  How did you arrive at this value?
Title: Re: 100 to 1 Compressor
Post by: hutch-- on March 02, 2010, 08:37:11 AM
Sergiu,

There is nothing wrong with good ideas, but they must be good ideas. Except in extremely rare cases, like massively long runs of identical data where you can run algorithms like RLE, nothing comes close, and it is always data dependent. Simply generate an encryption-grade random string that passes the range of tests in a tool like ENT and you are missing the redundancy necessary to compress the data. Without some form of repeatable byte sequences it simply cannot be done. Even at the bit level, where some compression algorithms use byte frequency priorities to write the byte data in less than 8 bits, you can only handle a very limited number of cases before the mapping overhead to put everything back into place is larger than the data it is compressing.
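For reference, a small C sketch of the kind of per-byte entropy measurement a tool like ENT performs (an illustration, not ENT itself; build with -lm). A result near 8 bits per byte means there is essentially no redundancy left to squeeze, at least at the single-byte level:

#include <stdio.h>
#include <math.h>

int main(int argc, char **argv)
{
    unsigned long count[256] = {0}, total = 0;
    double h = 0.0;
    int c;
    FILE *f;

    if (argc < 2 || !(f = fopen(argv[1], "rb"))) return 1;
    while ((c = fgetc(f)) != EOF) { count[c]++; total++; }
    fclose(f);

    for (c = 0; c < 256; c++)
        if (count[c]) {
            double p = (double)count[c] / (double)total;
            h -= p * log2(p);        /* H = -sum p*log2(p) */
        }
    printf("%.4f bits per byte\n", h);
    return 0;
}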
Title: Re: 100 to 1 Compressor
Post by: Sergiu FUNIERU on March 02, 2010, 03:13:32 PM
Quote from: MichaelW on March 02, 2010, 02:31:27 AM
I think you need to spend some time researching the science behind data compression. Over the last 60-70 years a great deal of effort has been put into developing, and applying, this science.
I agree that studying what others have done is always the first thing to do. And it was the first thing I did.


Quote from: MichaelW on March 02, 2010, 02:31:27 AM
For your claim to be possible there would have to be some very large errors in this science, errors that no one else has uncovered.
I think that many people took for granted the algorithms they learned in school and tried to optimize them further, without considering that there might be other, better algorithms. I'm sure some people have thought of other unconventional methods to compress the data. Who knows, maybe while we speak, someone else is writing a new algorithm, totally different from the ones we use today? Until that person reveals his algorithm, all we have are the classic ones.

I studied compression algorithms myself, when I was in college. We had one year dedicated to this subject. But not even once did I hear the teacher say there might be better ways to compress. He presented the facts as if those were the best ways to do compression. I agree, they were the best ways known at that time.

It seems to me that you trust some opinions just because they were accepted by a large majority. However, history shows that it's possible even for large majorities to be wrong. Before the theory of relativity, everyone believed in Newtonian physics, even very educated people. Why? Maybe because they had no reason to doubt it until someone proved otherwise.

It might take me months, or even years, to write my program the way I want. I agree with everyone here that a proof is more spectacular, and I'm working on it.

The worst thing that happened to me in school was that I started to defend the classic theories. For a while, I ceased to think "What if?". Nobody proved that a zip-compressed file can't be compressed any more. I'm curious whether, when my program is ready, someone will actually be willing to try it, or whether everybody will say "It's not worth trying, because what the program claims to do is not possible." And I'm not talking here about the idea "Want more space on your hard drive? Delete Windows!". I'm talking about a different way to compress.

"It's a known fact!", "Other people, smarter than you, didn't get better results", "If were possible, somebody else would have came with the idea until now" - I wish I got $1 each time I heard these.

I don't want to sound like Yoda, but I strongly believe that a person must let go of the idea "it's not possible" before actually writing a better compression algorithm.

Quote from: MichaelW on March 02, 2010, 02:31:27 AM
And one detail that has bothered me from the start, why 100:1? Why not a more believable 20:1, or 1000:1, or ??  How did you arrive at this value?
100:1 was a compromise between size and speed. I put 100 as a nice number. If you bought a compression program that promised you 100:1 compression, and delivered, for some files, even 115:1 compression, you wouldn't be upset, would you?
Title: Re: 100 to 1 Compressor
Post by: oex on March 02, 2010, 04:50:23 PM
Quote from: Sergiu Funieru on March 02, 2010, 03:13:32 PM
someone will actually be willing to try it, or whether everybody will say "It's not worth trying, because what the program claims to do is not possible."

Sergiu,

You keep making claim after claim, like the boy who cried wolf, with absolutely no evidence. People will fast become uninterested in your opinion if you claim to be able to do something or to know something but never deliver. Clever people with something to offer are generally far more reserved about their work until they have hard proof of concept or backing for their idea. At this stage your idea consists of a completely unsubstantiated brag concerning a problem that some of the best minds in the world would be unable to crack.

My advice would be not to brag but to actually *do the work*, then find a backer; then you won't need to brag, because you will be able to rub shoulders with people far nearer your intellect level. Sorry for my outburst, I have a short temper when people go on and on about something hypothetical with no substance :lol but we are rather straying from the purpose of this forum in this thread, down to the level of a childish blog.

Proof of concept is all; people don't pay for concepts since the .com bubble, and professionals on this forum get paid little enough for *real* work :lol

:'( And it's only Tuesday
Title: Re: 100 to 1 Compressor
Post by: Sergiu FUNIERU on March 02, 2010, 04:59:19 PM
Quote from: oex on March 02, 2010, 04:50:23 PMMy advice would be not to brag
If you go back to the original question, I simply wanted to know if people would be interested in such a compressor. I asked that question because of the answer I received from the creator of a certain commercial compressor.

I'm sorry if you believe that I brag. I only tried to explain why I'm so confident about my algorithm.

Maybe I need a childish point of view to actually believe that I can do what others refuse to believe is possible.

To stop offending more people with my claims, I'll answer again in this thread when my program is done.
Title: Re: 100 to 1 Compressor
Post by: MichaelW on March 02, 2010, 06:39:38 PM
Quote100:1 was a compromise between size and speed. I put 100 as a nice number. If you bought a compression program that promised you 100:1 compression, and delivered, for some files, even 115:1 compression, you wouldn't be upset, would you?

So there was no science involved in your selection? You simply selected a ratio that you thought would interest or impress people? And you're surprised that no one seems to be interested in your idea?
Title: Re: 100 to 1 Compressor
Post by: clive on March 02, 2010, 09:52:53 PM
Quote from: Sergiu Funieru on March 01, 2010, 04:23:56 PM
For years, the idea that something heavier than air might fly made people suffocate with indignation

Last I checked, birds, arrows and cannon balls were all heavier than air; there were even dinosaurs which could fly and/or glide. So I think this is a poor example of a situation where evidence of the thing being possible was unavailable.

What you are proposing looks more like Alchemy, or frankly snake-oil, to those of us who have actually looked at the science and experimented with such things.

It is unsurprising that people selling existing technologies are uninterested, for they view your claims with significant skepticism. I'm sure they've heard from dozens of people claiming similar magic, who have been equally unable to deliver. If you could even demonstrate 2:1 compression and decompression on some of the example data provided on this thread, it would be outstanding.

No doubt there is some data which can be expressed mathematically as an equation or function, but to be able to derive one from a random stream of data would appear to be a herculean task.

Heck, generating an MD5 sum of a 4K block would get you an apparent 256:1 compression, although it would take you a while to find all the possible collisions, and you'd want to cache all your examples.

-Clive
Title: Re: 100 to 1 Compressor
Post by: Neo on April 12, 2010, 04:27:22 AM
Quote from: Sergiu Funieru on February 24, 2010, 12:00:06 AM
Quote from: dedndave on February 23, 2010, 11:55:37 PM
100 to 1 isn't very likely - not without data loss
of course, it depends on what the file has in it
but i seriously doubt any routine can maintain that on average
Not on average - EVERY time. Without any data loss. The algorithm doesn't contradict Shannon entropy.
I can't believe I'm even responding to this, as this is like responding to someone who's claimed to have created an algorithm that can always correctly decide whether a program halts in finite time, or someone who claims to have a perpetual motion machine, but I sincerely hope you're just trolling.  To humour you, in case you're not trolling, please read the following simple explanation of why this is impossible.

Suppose you want to encode bit sequences of at most n bits (for simplicity, but n can be chosen arbitrarily large), and suppose that the indication of the length doesn't take up any extra space (also for simplicity, but it can be done in log n extra space if you really want).  There are 2^(n+1) - 1 unique bit sequences of length n or less.  In order to decompress all of those sequences correctly, each of them must have a unique compressed sequence, meaning that there must be exactly 2^(n+1) - 1 unique compressed sequences.  Encoding that many unique sequences requires at least n bits.  Therefore, no algorithm, no matter how complicated, even given infinite time and space, can compress uniform random data on average and get back the original sequence correctly every time.

If you do compress uniform random data on average, you will not always be able to correctly decompress the data, and compressing it by more than 1 bit on average means that on average you will get back wrong data.

That doesn't mean your algorithm isn't useful, it just isn't what you claim it is.
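Neo's counting step, written out (a standard pigeonhole argument):

\[
\left|\{0,1\}^{\le n}\right| = \sum_{k=0}^{n} 2^k = 2^{n+1}-1,
\qquad
\left|\{0,1\}^{\le n-1}\right| = \sum_{k=0}^{n-1} 2^k = 2^{n}-1 .
\]

A lossless compressor must be injective (decompression has to recover the input), so it cannot map all \(2^{n+1}-1\) sequences of length at most \(n\) into the \(2^{n}-1\) strictly shorter ones - let alone into sequences 100 times shorter.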
Title: Re: 100 to 1 Compressor
Post by: Arnold Archibald on May 24, 2010, 08:05:54 AM
I'm on Sergiu's side on this but above all he needs to get off his ass and write even a basic comp/decomp EXE.

All the current algos are too busy looking for patterns in some wild attempt to simulate neural networks.
Most usable data is restricted to patterns indeed, but most could be handled by optimizing known files and their structure.
To speed up image file rendering, a lot of the header data is wasted space that could be better stored, so we have an issue of balance between speed and size.
BMP palette size, for example, only needs one byte (or even 5 bits) but it is stored on a word boundary, taking up two bytes.
Even something as simple as a subtraction filter would be better optimized, in the case of a 24- or 32-bit photo BMP, by rearranging the colour channels first.
So a level of preparation can be applied to a file before compression actually takes place.
But the more times the data is passed over, the longer it takes, whilst hopefully achieving better compression.

As for the tests of an algo using random data: they are useless, given the current contextual usage of "random" to mean data that contains nothing but unusable noise, without usage context.
We are missing a perspective here: depending upon how any given program decodes and uses its native files, those files will, when interpreted by another program, appear either the same (or similar) or effectively as noise.
It is relativity on so many levels that changes the way we see the world/data.
Is it a knife or a sword? Isn't a knife a small sword, or a sword a large knife? Aren't both simply blades with handles? (Mmmm, classes.) Why store both when you can store a template and the differences?
Have you had trouble recalling a whole bunch of similar memories but ease at recalling different memories?

What of a header growing larger?
If the algo is fit enough to perform, it should be able to compress that as well, since all headers must have some structured quality.
Then write out a new smaller header containing instructions on decomp of the larger one.

Clear your head of rules, and start to play around with the relationships between things.

Is this a dead-end field? By no means.
I too am interested in reinventing the wheel, compression wise (and for many more fields for that matter), so naturally I was attracted to this thread.
Those who truly understand random data are aware of it as THE universal set.
Inside this set are useless and useful data and all the grey levels inbetween.

A true random generator CAN create a file that contains RLE-encodable data (the term "clustering" is used for this data type).
It can also create data that is tricky to shrink, so we understand that, minus a header, any file being compressed has about a 50/50 chance of actually compressing.
Whether it compresses to a worthy level is down to the algo, so inevitably we must accept this chance of not compressing at least 50% of any set of files.

And so we come to just that: clustering. I am also aware of another programmer in my city who is working on high-powered compression.
He has investors and previous patents, and apparently has caught the interest of a telecommunications company.
So this field is by no means dead.

This might help.

1. Make your plan for the project.
    Keep accurate records, diaries, plans, diagrams and dated backups of all iterations of your work.
    Keep it simple for now so you can actually finish the prototype.

2. Test it out yourself on as many files as you can.
    The internet is full of test files for just this purpose. (Another sign this field isn't dead)

3. Write a kick arse EULA. 'nuff said.
    (There are some weird and wacky ones out there! I believe it was tuneXP that had the most succinct one)

4. This was mentioned earlier and it is very important: Non Disclosure Statements.
    These are essentially EULAs with a contractual twist.
    If you are paranoid about leaking of the prototype, install countermeasures in the EXE so you can track who has distributed it, or simply give the EXE an expiry date - get creative.
    The NDS should include a statement forbidding the testers from sharing copies; any EXE must come directly from you.

5. Get your friends and peers to test the prototype out.
    Get as detailed feedback as possible. Perhaps build an activity logger into the EXE.

6. If they are impressed get a whole bunch of them to become investors.
    Best that none of them have connections to organised crime.
    Write up a sweet contract that all parties agree to, especially you.

7. Patent that sucker.
    How else are you gonna make money from such a great piece of intellectual property?
    This is where the records come in handy, make sure to never let the originals out of your sight.

8. Publicity.
    The Hutter Prize is a great start, and because you kept the software simple it won't require much alteration to fit their rules.
    Plus you only need to provide the decomp EXE and the compressed file to win the cash.

9. Send out some very polite letters to whom you may consider interested parties.
   If you still have some money left over from the investors, run a simple viral campaign extolling your wondrous invention.

10. Wait.
Title: Re: 100 to 1 Compressor
Post by: BlackVortex on May 27, 2010, 12:22:11 AM
tl;dr
but good compressors already use precompression filters and file format recognition. FreeArc is a good example. Also, it's open source:
http://freearc.org/

P.S.: This thread has some golden stuff  :green2
Title: Re: 100 to 1 Compressor
Post by: oex on May 27, 2010, 01:31:06 AM
:lol I recommend that before anyone else posts in favour of a 100 to 1 compressor they try compressing something and think long and hard about why it won't work.... Some of the posts in favour do have ideas for compression which are already in use in many compressors, using some very sophisticated mathematics, but some things really are impossible, even for us ASMers :bg
Title: Re: 100 to 1 Compressor
Post by: joemc on May 27, 2010, 03:26:39 AM
Oh no not this thread again !   :bg
I was just starting to forget Sergiu's face too.
Title: Re: 100 to 1 Compressor
Post by: MichaelW on May 27, 2010, 04:52:06 AM
Quote from: Arnold Archibald on May 24, 2010, 08:05:54 AM
As for the tests of an algo using random data: they are useless, given the current contextual usage of "random" to mean data that contains nothing but unusable noise, without usage context.
We are missing a perspective here: depending upon how any given program decodes and uses its native files, those files will, when interpreted by another program, appear either the same (or similar) or effectively as noise.

These tests were aimed at the "any file" claim.

QuoteA true random generator CAN create a file that contains RLE-encodable data (the term "clustering" is used for this data type).

Yes, but only with a low probability, and the longer the run the lower the probability.



Title: Re: 100 to 1 Compressor
Post by: Rockoon on June 08, 2010, 07:32:12 PM
Sigh....

You folks are being trolled by someone that can't do what they claim. Period. The claim is impossible.

Also, I am very surprised that nobody on this of all forums knows of the pigeonhole principle.

http://en.wikipedia.org/wiki/Pigeonhole_principle

Taking his claim of 100 KB or larger into 1 KB or less, he is claiming that he can compress each of the 2^800000 possible input files into one of only 2^8000 possible output files.

Impossible. Period.
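The arithmetic behind that claim, for anyone who wants to check it (using 1 KB = 1000 bytes = 8000 bits, as above):

\[
\frac{\#\,\text{possible outputs}}{\#\,\text{possible inputs}}
= \frac{2^{8000}}{2^{800000}} = 2^{-792000},
\]

so at most a \(2^{-792000}\) fraction of the 100 KB inputs could even be assigned a distinct 1 KB output.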
Title: Re: 100 to 1 Compressor
Post by: oex on June 08, 2010, 08:15:21 PM
I have thought about this long and hard and have realised it is possible and so obvious....

Step 1: Take 100 floppy disks of your favorite OAP software
Step 2: Copy them to a CD

And hey presto.... 100 times space compression.... You just have to think outside the box :bdg

* Remember you heard it here first :bg
Title: Re: 100 to 1 Compressor
Post by: frktons on June 08, 2010, 10:08:20 PM
Quote from: Sergiu Funieru on March 02, 2010, 04:59:19 PM
To stop offending more people with my claims, I'll answer again to this thread when my program is done.

I hope sooner or later Sergiu is going to post something concrete.
If he succeeds, I'll be happy for him and for the new ideas he can spread.
If he doesn't, I'll be happy for him as well, because he will learn something
new about the difficulty of realizing working stuff vs planning it.

I'm open to the dream, not only to science  :P
Title: Re: 100 to 1 Compressor
Post by: brethren on June 11, 2010, 06:16:35 PM
Quote from: joemc on May 27, 2010, 03:26:39 AM
I was just starting to forget Sergiu's face too.
(http://funieru.com/sergiu/Sergiu.jpg)

you can't forget!!!!
Title: Re: 100 to 1 Compressor
Post by: dedndave on June 11, 2010, 06:21:59 PM
he seems like a nice enough guy, even if a bit misguided
i am not particularly impressed with the compression on that image, though   :P
Title: Re: 100 to 1 Compressor
Post by: MichaelW on June 11, 2010, 10:55:08 PM
No offence to Sergiu, but images of him are like images of me, the smaller the better.
Title: Re: 100 to 1 Compressor
Post by: dedndave on June 12, 2010, 04:04:58 AM
yah - i'm nothing to look at, either
i hadda put wifies pic up so people would like me   :lol
Title: Re: 100 to 1 Compressor
Post by: BlackVortex on June 19, 2010, 04:02:18 PM
Haha, great thread, would click again !
Title: Re: 100 to 1 Compressor
Post by: Twister on August 05, 2010, 06:51:31 PM
This would be very complicated for media files. Those are known to get only around 1-10% compression from the best commercial archivers.
Title: Re: 100 to 1 Compressor
Post by: Mirage on August 31, 2010, 03:59:59 PM
Eh, couldn't a ratio like that be achieved by the compressor itself having a list of chunks of byte sequences that it would map to a smaller byte sequence while still keeping them unique? Mind you, it would have to take pretty big chunks and the list itself would be pretty large, but doing that seems feasible to me.  :lol
Title: Re: 100 to 1 Compressor
Post by: frktons on August 31, 2010, 04:18:03 PM
Quote from: Mirage on August 31, 2010, 03:59:59 PM
Eh, couldn't a ratio like that be achieved by the compressor itself having a list of chunks of byte sequences that it would map to a smaller byte sequence while still keeping them unique? Mind you, it would have to take pretty big chunks and the list itself would be pretty large, but doing that seems feasible to me.  :lol

If you have a compressor program of 4 GB, you can have a 100K file compressed with a good compression ratio.
:lol
Title: Re: 100 to 1 Compressor
Post by: Rockoon on September 01, 2010, 03:55:52 PM
Quote from: Mirage on August 31, 2010, 03:59:59 PM
Eh, couldn't a ratio like that be achieved by the compressor itself having a list of chunks of byte sequences that it would map to a smaller byte sequence while still keeping them unique? Mind you, it would have to take pretty big chunks and the list itself would be pretty large, but doing that seems feasible to me.  :lol

There are far more long sequences than short ones, and the disparity is dramatic when talking about 100:1.

Think about it for a moment.

How many 100 byte sequences can you compress into 1 byte?  .. only 255 of them, right? (the 256th byte value would indicate that the 100 byte sequence is not in the dictionary)

Now, how many English sentences are 100 bytes or less? Almost all of them, right? Can you even come close to reducing all of English sentences to only 255 sentences?

http://en.wikipedia.org/wiki/Pigeonhole_principle
Title: Re: 100 to 1 Compressor
Post by: Mirage on September 01, 2010, 07:58:03 PM
Not exactly what I was shooting at but okay :P
Title: Re: 100 to 1 Compressor
Post by: Rockoon on September 01, 2010, 09:45:18 PM
Quote from: Mirage on September 01, 2010, 07:58:03 PM
Not exactly what I was shooting at but okay :P

It is what you were shooting at. You just don't realize it yet  :bg

The confusion is in the thought that if instead of (100 bytes : 1 byte) it were (400 bytes : 4 bytes), it would somehow be different. It's not different at all.

The counting is just as lopsided for 400 bytes into 4 bytes as it is for 100 bytes into 1 byte.

There are more 400 byte sequences (a 241 digit number) than there are atoms in the visible universe (an 80 digit number.) A dictionary of 4294967295 pre-stored 400 byte sequences would take 1.7 terabytes of disk space, and still would not be able to attain 100:1 compression on average.

A simple experiment you can perform is to take a largish mp3 file and process it 2 bytes at a time, simply keeping track of whether or not each 16-bit possibility occurs. You will find that all 65536 possibilities occur in the file. Pigeonhole principle in action: each of the 65536 values occurs, and therefore they cannot be encoded with fewer than 65536 codes. You will also find that the frequency of each 16-bit value is about the same, so you can't even get away with using shorter codes for more frequent ones.
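Rockoon's experiment is easy to reproduce; here is a minimal C sketch of it (an illustration, reading non-overlapping 2-byte pairs):

#include <stdio.h>

int main(int argc, char **argv)
{
    static unsigned long freq[65536];   /* zero-initialized */
    unsigned int v, seen = 0;
    int hi, lo;
    FILE *f;

    if (argc < 2 || !(f = fopen(argv[1], "rb"))) return 1;
    while ((hi = fgetc(f)) != EOF && (lo = fgetc(f)) != EOF) {
        v = ((unsigned)hi << 8) | (unsigned)lo;
        if (freq[v]++ == 0) seen++;     /* count first occurrences */
    }
    fclose(f);
    printf("%u of 65536 possible 16-bit values occur\n", seen);
    return 0;
}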
Title: Re: 100 to 1 Compressor
Post by: xandaz on September 04, 2010, 07:43:52 PM
    Well.... i just saw this thread. It kinda got me laughing. Thanks Sergiu.