Hello, world!!!

(All zip archives mentioned are actually .rars %)
(Latest news is at the bottom ;)

08.12.2k - DC-kit#1 published. A toolkit for the construction of distance coders (as far as I understand it :). A description in Russian is included.

22.12.2k - sh 0.02 & sh 0.02s (sources included). Kinda my replacement for base64 and all the other uuencoders :). It uses a rangecoder generalisation, so it's able to work with configurable charsets. In contrast to bitcode solutions, it allocates almost equal codespace intervals to all symbols, so the encoded file is at least 20% smaller than the corresponding base64. Alas, that's fully offset by the loss in usage convenience :(.
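
A back-of-the-envelope sketch (mine, not from the sh package) of the codespace argument: an ideal coder spends log2(n) bits per output char, so the charset size n need not be a power of two, and any charset larger than 64 symbols beats base64's fixed 4/3 expansion.

```cpp
#include <cmath>

// Ideal expansion factor (output chars per input byte) for a range
// coder over a charset of n equally likely symbols: each output char
// carries log2(n) bits, and each input byte needs 8 bits.
double expansion(int n) {
    return 8.0 / std::log2(static_cast<double>(n));
}
// expansion(64) == 8/6 == 4/3, matching classic base64;
// an 85-char printable charset gives ~1.25, i.e. less overhead,
// and no power-of-two alignment is required.
```

The concrete savings obviously depend on which charset is configured; the formula only shows why bigger (and odd-sized) charsets help.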

27.12.2k - lng-dc1 sources uploaded.

21.02.2l - huffcomp uploaded. Szymon Grabowski thought that encode.com - the standalone huffman encoder from the ESP archiver - is the best, and I tried to prove him wrong :). Here are the sources & executables for my own static huffman encoder - 30% faster than the ESPian one, and it can still be optimized further...
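
For reference, the core job of any static huffman encoder - deriving code lengths from symbol frequencies - can be sketched as follows (a generic textbook construction, not the huffcomp code):

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Compute Huffman code lengths by repeatedly merging the two
// least-frequent nodes (zero-frequency symbols get length 0).
// A real encoder would then emit canonical codes from the lengths.
std::vector<int> huffman_lengths(const std::vector<long>& freq) {
    int n = (int)freq.size();
    struct Node { long f; int left, right; };
    std::vector<Node> nodes;
    typedef std::pair<long, int> QE;              // (frequency, node index)
    std::priority_queue<QE, std::vector<QE>, std::greater<QE> > pq;
    for (int i = 0; i < n; i++) {
        nodes.push_back(Node{freq[i], -1, -1});
        if (freq[i] > 0) pq.push(QE(freq[i], i));
    }
    std::vector<int> len(n, 0);
    if (pq.empty()) return len;
    if (pq.size() == 1) { len[pq.top().second] = 1; return len; }
    while (pq.size() > 1) {
        QE a = pq.top(); pq.pop();
        QE b = pq.top(); pq.pop();
        nodes.push_back(Node{a.first + b.first, a.second, b.second});
        pq.push(QE(a.first + b.first, (int)nodes.size() - 1));
    }
    // leaf depth == code length
    std::function<void(int, int)> walk = [&](int idx, int depth) {
        if (nodes[idx].left < 0) { len[idx] = depth; return; }
        walk(nodes[idx].left, depth + 1);
        walk(nodes[idx].right, depth + 1);
    };
    walk(pq.top().second, 0);
    return len;
}
```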

22.02.2l - CL-D 1.08+ demo published. An order0 adaptive ari coder (a rangecoder, more precisely) based on the latest of my ari perversions - an FPU-based dword-oriented carryless rangecoder - very simple, therefore fast, and having MaxFreq=2^31. There's an idea (inspired by Eugene Roshal :) to put it into the public domain, but I can hardly imagine anyone understanding my style of asm programming. Nevertheless, if somebody wants to help me translate CL-D into C - you're welcome to write me a letter (my e-mail is linked in the page header :).
      ARK2 "Arithmetical Coding Heuristics" - an article (in Russian :) describing the process of ari coder construction (and the underlying theory) as I see it.

24.02.2l - Got Shkarin's permission and uploaded his C++ sources demonstrating arithmetical coding (a rangecoder, actually - Subbotin's carryless one). Here are aridemo v0, aridemo v1 and my utility for time measuring.
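
Not the actual aridemo sources, but a compact sketch of the same Subbotin-style carryless scheme, wrapped in a trivial static order-0 model. Class names and the byte-buffer I/O are mine; the renormalisation loop follows the well-known public-domain code.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct RangeEncoder {
    enum : uint32_t { TOP = 1u << 24, BOT = 1u << 16 };
    uint32_t low = 0, range = 0xFFFFFFFFu;
    std::vector<uint8_t> out;
    void encode(uint32_t cumFreq, uint32_t freq, uint32_t totFreq) {
        low += cumFreq * (range /= totFreq);
        range *= freq;
        // shift out a byte while the top byte is settled, or force
        // range alignment when it gets too small (the carryless trick)
        while ((low ^ (low + range)) < TOP ||
               (range < BOT && ((range = -low & (BOT - 1)), true))) {
            out.push_back(uint8_t(low >> 24)); low <<= 8; range <<= 8;
        }
    }
    void flush() {
        for (int i = 0; i < 4; i++) { out.push_back(uint8_t(low >> 24)); low <<= 8; }
    }
};

struct RangeDecoder {
    enum : uint32_t { TOP = 1u << 24, BOT = 1u << 16 };
    uint32_t low = 0, range = 0xFFFFFFFFu, code = 0;
    const std::vector<uint8_t>& in; size_t pos = 0;
    explicit RangeDecoder(const std::vector<uint8_t>& buf) : in(buf) {
        for (int i = 0; i < 4; i++) code = (code << 8) | next();
    }
    uint8_t next() { return pos < in.size() ? in[pos++] : 0; }
    uint32_t getFreq(uint32_t totFreq) {
        uint32_t f = (code - low) / (range /= totFreq);
        return f >= totFreq ? totFreq - 1 : f;
    }
    void decode(uint32_t cumFreq, uint32_t freq) {  // mirrors encode()
        low += cumFreq * range;
        range *= freq;
        while ((low ^ (low + range)) < TOP ||
               (range < BOT && ((range = -low & (BOT - 1)), true))) {
            code = (code << 8) | next(); low <<= 8; range <<= 8;
        }
    }
};

// Build a +1-smoothed cumulative table over the message (in real use
// the table would have to be stored/transmitted too) and code with it.
std::vector<uint8_t> compress(const std::string& s, uint32_t cum[257]) {
    uint32_t freq[256] = {0};
    for (unsigned char c : s) freq[c]++;
    cum[0] = 0;
    for (int i = 0; i < 256; i++) cum[i + 1] = cum[i] + freq[i] + 1;
    RangeEncoder rc;
    for (unsigned char c : s) rc.encode(cum[c], cum[c + 1] - cum[c], cum[256]);
    rc.flush();
    return rc.out;
}

std::string decompress(const std::vector<uint8_t>& buf,
                       const uint32_t cum[257], size_t n) {
    RangeDecoder rc(buf);
    std::string s;
    for (size_t i = 0; i < n; i++) {
        uint32_t f = rc.getFreq(cum[256]);
        int c = 0;
        while (cum[c + 1] <= f) c++;            // linear symbol search
        rc.decode(cum[c], cum[c + 1] - cum[c]);
        s += char(c);
    }
    return s;
}
```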

25.02.2l - Added javascript to show the bottom of this page on loading.
             Also, one more source package has appeared - another version of Shkarin's order0, unnamed, but let's call it aridemo v2 ;).

27.02.2l - All of Shkarin's order0 models and three coder versions (Subbotin's, Schindler's & mine) have been rewritten to work with the same interface. Now you can #include the model and coder you like, compile (VC/IntelC), and get a working coder/decoder. Btw, I did all this to prove that CL-D (my coder) is faster than the others ;). But to my great amazement, it turned out to be no faster at all :). Its efficiency _is_ better anyway, though - on the Calgary Corpus, Schindler's coder beats Subbotin's by 1179 bytes, and mine beats Schindler's by another 148 bytes :). But my plans to present CL-D as the best in all senses have surely failed ;)... Maybe I can optimize it to match the speed of its integer competitors, but I certainly won't be able to make it twice as fast :). ...Ah! Almost forgot to give a link to its location... Here it is - aridemo 1.00.

01.03.2l - It's spring :). So the numbers of aridemos continue to increase ;). Here's aridemo 1.01, containing some new comparative pictures and a lot of changes, the main one being a fix for that long-awaited CLD bug, found at last - great thanks to A.Ratushnyak for testing. CLD is optimized slightly more; graphs are now built not from full program execution times, but only from the encoding/decoding loops; some options were added, etc.
           And that's not all for today, 'cause I decided to publish the program (RKGraph 1.06) which I use to draw all these graphs. It's raw enough and has almost no fool-proofing at all, but it works if you know how to drive it :). Feel free to ask questions...

04.03.2l - The process continues. CLD has been fully rewritten to avoid bugs - so it's not actually "CL" anymore :). A.Ratushnyak discovered some hidden glitches in my "schindler" rangecoder, but I haven't fought them yet :). V.Semenyuk decided to participate in our competition and donated his model - the first one in this project to use a binary tree :). But he thinks that aridemo as a whole contains too many unnecessary things, so here's his coder alone - together with a text describing the model. And aridemo 1.02, of course :).

08.03.2l - Aridemo 1.03 appeared. CL-D proved that it's now correct (thanks to A.Ratushnyak), and the "Schindler" rangecoder has at last been fixed. There are now two completely different CL-D versions - plain C++, and mostly-asm. Imho, it's interesting that the int64 C++ version's encoding is 20% faster than the asm version - which uses the FPU :).

13.03.2l - Since my first successful distance coding implementation I've wanted to write an algorithm of the same kind, but stream-oriented. And here it is - Distance Mixing. C++ sources included. But alas, it appeared that this straightforward approach works somewhat worse than even my DC from DC-Kit (235k vs 229k on book1.bwt).
Does anybody have any ideas about how to improve it? :)
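
For context, the basic distance-coding transform (as I understand the idea; this is neither the DC-Kit nor the Distance Mixing code, and the names are mine) can be sketched as: for each position, emit the distance to the next occurrence of the same byte. Runs of equal bytes become runs of 1s, which models well after BWT.

```cpp
#include <string>
#include <vector>

// For each position i, out[i] = distance to the next occurrence of
// s[i], or 0 if there is none. One right-to-left pass suffices.
std::vector<size_t> next_occurrence_distances(const std::string& s) {
    size_t n = s.size();
    std::vector<size_t> next(256, 0);   // next[c] = 1-based pos of next c, 0 = none
    std::vector<size_t> out(n);
    for (size_t i = n; i-- > 0; ) {
        unsigned char c = s[i];
        out[i] = next[c] ? next[c] - (i + 1) : 0;
        next[c] = i + 1;
    }
    return out;
}
```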

16.03.2l - Got an idea and wrote an alphabet sorter. It's mainly targeted at BWT applications - for alphabet reordering before blocksorting. It takes a char-pair frequency table, sorts the columns by their entropy, and outputs the table with sorted columns plus - in a separate file - the permutation used.
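
A minimal sketch of that column-entropy sort, assuming the table format described above - freq[a][b] counts pairs where `a` precedes `b` - with all names my own:

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// Entropy (in bits) of the predecessor distribution of column `col`.
double column_entropy(const std::vector<std::vector<long> >& freq, int col) {
    long total = 0;
    for (size_t r = 0; r < freq.size(); r++) total += freq[r][col];
    if (total == 0) return 0.0;
    double h = 0.0;
    for (size_t r = 0; r < freq.size(); r++)
        if (freq[r][col] > 0) {
            double p = (double)freq[r][col] / total;
            h -= p * std::log2(p);
        }
    return h;
}

// Permutation of column indices, ordered by increasing column entropy.
std::vector<int> sort_columns_by_entropy(const std::vector<std::vector<long> >& freq) {
    int n = freq.empty() ? 0 : (int)freq[0].size();
    std::vector<int> perm(n);
    std::iota(perm.begin(), perm.end(), 0);
    std::vector<double> h(n);
    for (int c = 0; c < n; c++) h[c] = column_entropy(freq, c);
    std::stable_sort(perm.begin(), perm.end(),
                     [&](int a, int b) { return h[a] < h[b]; });
    return perm;
}
```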

22.03.2l - After some tests and discussions it became clear that the algorithm used in cnksort1 is quite useless :). Applying it to data before BWT worsens compression... and now I think that could have been predicted - there's no great sense in sorting just by the entropy of the char suffix distributions, since there's almost no correlation between the similarity of two suffix blocks and their entropy :(. So I rewrote things - it still doesn't help with blocksorting alphabet reordering, but it may be used in another way :). The idea (inspired by Vadim Yoockin) was to reorder the char blocks in a file to put similar fragments together and allow better statistical compression. This one - works! :) (AridemoV6 compresses book1.bwt into 326,972 bytes without block reordering and into 318,160 with it (by 2k blocks), including the order information!)
I think this idea is useful enough... although maybe not with my current algorithm of optimal order detection :).
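
The actual order-detection algorithm isn't shown here; purely as a hypothetical illustration of "put similar fragments together", here's a greedy chain by byte-histogram distance (approach and names are mine, not the package's):

```cpp
#include <array>
#include <cstdlib>
#include <string>
#include <vector>

using Hist = std::array<long, 256>;

Hist histogram(const std::string& block) {
    Hist h{};
    for (unsigned char c : block) h[c]++;
    return h;
}

long l1(const Hist& a, const Hist& b) {   // L1 distance between histograms
    long d = 0;
    for (int i = 0; i < 256; i++) d += std::labs(a[i] - b[i]);
    return d;
}

// Greedy chain: start at block 0, then always append the remaining
// block whose histogram is closest to the current one.
std::vector<int> greedy_order(const std::vector<std::string>& blocks) {
    int n = (int)blocks.size();
    std::vector<int> order;
    if (n == 0) return order;
    std::vector<Hist> h(n);
    for (int i = 0; i < n; i++) h[i] = histogram(blocks[i]);
    std::vector<bool> used(n, false);
    int cur = 0; used[0] = true; order.push_back(0);
    for (int step = 1; step < n; step++) {
        int best = -1; long bestD = 0;
        for (int j = 0; j < n; j++)
            if (!used[j] && (best < 0 || l1(h[cur], h[j]) < bestD)) {
                best = j; bestD = l1(h[cur], h[j]);
            }
        used[best] = true; order.push_back(best);
        cur = best;
    }
    return order;
}
```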

23.03.2l - Here's an improved version of file block reordering - slow too, but now optimization needs more like O(N^3) time - which is comparatively tolerable :). It also now generates frequency distribution graphs for the data before and after reordering - the block number increases along the horizontal axis and the char code along the vertical.

16.04.2l - Spent a lot of time trying to find a weighting function. It seems that evaluating the actual probability of a zero bit when _two_ (possibly different) estimations p1 and p2 are given by two models is quite simple. There is a common assumption that in such a case a value given by the formula pm = w1*p1 + w2*p2 must be used, where the w's are the models' weights. If both models are considered equally precise, this is just the average probability. But we can instead use the p's as a context and collect the real bit frequencies corresponding to each {p1;p2} pair. And it turned out that the mixing function is quite far from a simple average. It's even smooth enough, but I failed to find a closed-form function giving similar results :(. Maybe you'll succeed ;). Here's a toolkit for doing it - a program to collect statistics, a coder which shows that the average is not the best, etc.
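
The statistics-collection part can be sketched like this (the quantisation granularity, fallback rule, and all names are my own assumptions, not the toolkit's):

```cpp
#include <cstdint>

// Treat the pair of model estimates {p1, p2} as a context: quantise
// each probability to Q buckets and count the real bits seen in each
// cell. The empirical zero-probability per cell can then be compared
// with the naive average (p1 + p2) / 2.
struct MixStats {
    static const int Q = 32;                 // buckets per axis (assumption)
    uint32_t zeros[Q][Q] = {};
    uint32_t totals[Q][Q] = {};
    static int bucket(double p) {
        int b = (int)(p * Q);
        return b < 0 ? 0 : (b >= Q ? Q - 1 : b);
    }
    void update(double p1, double p2, int bit) {
        int i = bucket(p1), j = bucket(p2);
        totals[i][j]++;
        if (bit == 0) zeros[i][j]++;
    }
    // Empirical probability of a zero bit in cell {p1, p2};
    // falls back to the simple average while the cell is empty.
    double estimate(double p1, double p2) const {
        int i = bucket(p1), j = bucket(p2);
        if (totals[i][j] == 0) return 0.5 * (p1 + p2);
        return (double)zeros[i][j] / totals[i][j];
    }
};
```

Fitting a smooth function to the resulting table is exactly the open problem described above.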

04.05.2l - Just reimplemented that context mixing scheme of my own (PPM211) using C. It's slow, but very simple, especially compared to PPMD :). It's not that good for binaries, but it can compress texts slightly better than PPMD v.F did/does. It encodes book1 into ~210.9k. ...And decodes it :).

14.05.2l - PPM211 suxx 'cause it uses some heuristic magic. So I developed another coder, much more logical in nature than the previous one. Well, it suddenly turned out to be more effective too, though I never expected such behavior :). So here's PPM210 - my new model, which currently reaches 209,890 on book1.

22.05.2l - Hope this was the last update; the model is full of magic again :). Though 207,879 on book1 is already not so bad... Also made a workaround to avoid the too-slow code compiled by IntelC/VC in some cases - it's now 20% or more faster.

31.05.2l - Hopes never come true :) ... So here's yet another update. It's a kind of improved version, more stable on binaries than the previous one. Also, I've at last added some options allowing to restrict the maximum order and memory usage. It compresses book1 into 207,379.

02.06.2l - Yet another version, slightly tuned. It works somewhat faster, and most files are compressed better than previously - though not all, alas :). 206,991 on book1.

06.09.2008 - Here's my new site: CtxModel.Net.

