CHM files

Started by donkey, February 24, 2012, 07:34:40 AM

donkey

CHM files have to be the most complex files I have ever had the misfortune to get interested in. They are compound files that expose the IITStorage interface and, through it, the IStorage interface. The IITStorage interface is essentially undocumented except at a few reverse-engineering sites around the net. However, even with the interface definitions it is extremely difficult to worm through a CHM file. The file consists of binary trees, tables and arrays of structures that eventually point to an HTML file stored in its internal file system. Just getting a list of topics and displaying the related page requires fairly extensive knowledge of the file format. There are a few good sources for the format (Google will find them for you); this one is probably the best. I wish I had found it before the project was almost complete; it would have saved me a lot of work.
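
As a small taste of those structures, here is a minimal Python sketch that reads the first few fields of the ITSF header found at the start of a CHM file. The offsets (signature at 0, version at 4, total header length at 8, Windows language ID at 0x14) are taken from community reverse-engineering notes, not from any Microsoft document, so treat them as assumptions; the function name is hypothetical.

```python
import struct

def read_itsf_header(data: bytes) -> dict:
    """Parse the leading fields of a CHM (ITSF) file header.

    Offsets follow community reverse-engineering notes (assumed, not an
    official Microsoft specification).
    """
    if len(data) < 0x18:
        raise ValueError("buffer too short for an ITSF header")
    if data[:4] != b"ITSF":
        raise ValueError("missing 'ITSF' signature; not a CHM file")
    # Little-endian DWORDs: format version, then total header length
    version, header_len = struct.unpack_from("<II", data, 4)
    # Windows language ID (e.g. 0x409 = US English), assumed to sit at 0x14
    (lang_id,) = struct.unpack_from("<I", data, 0x14)
    return {"version": version, "header_len": header_len, "lang_id": lang_id}
```

Typical use would be to open the .chm in binary mode, read the first 0x60 bytes and pass them in; per the same notes, version 3 files are expected to report a header length of 0x60.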

Here's a quick demo that displays the topic list of a CHM file and, when you double-click a topic, shows the appropriate help page. It is a quick hack to test the feasibility of a project I have in mind that uses CHM files. It is COM-intensive (as are most of my projects) and not very well documented, but it should be pretty easy to follow. I have included Thomas's win32 asm help file as a test file.

As always, you need my headers to build the project; the executable is included.
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

donkey

I was playing around with the enumeration in order to examine the substorage items in the example CHM file and thought I would upload the results. A treeview control has been added to display them; the enumeration fills it recursively with the stream and storage items. For a simple CHM such as the one attached, this demonstrates how complex the file structure actually is, and I haven't even begun to RE the binary tree yet. Also, the enumeration routine in the first example (above) is wrong: I thought the enumerator allocated the STATSTG structure, when I was supposed to allocate it myself. That has been corrected in this version.
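
For readers following along without the Win32 headers, the shape of that recursive walk can be sketched in Python. This is a hypothetical in-memory stand-in, not the demo's code: a dict plays the role of a storage and a bytes object the role of a stream. In the real thing you would call IStorage::EnumElements, pull entries with IEnumSTATSTG::Next into a STATSTG you allocate yourself, check its type field (STGTY_STORAGE or STGTY_STREAM), free pwcsName with CoTaskMemFree, and open each child with IStorage::OpenStorage before recursing.

```python
from typing import Dict, List, Optional, Union

# Hypothetical stand-in for an IStorage tree: a dict plays the role of a
# storage (STGTY_STORAGE) and a bytes object the role of a stream (STGTY_STREAM).
Node = Union[bytes, Dict[str, "Node"]]

def walk_storage(storage: Dict[str, Node], depth: int = 0,
                 out: Optional[List[str]] = None) -> List[str]:
    """Recursively list storages and streams, indented like a tree view."""
    if out is None:
        out = []
    for name in sorted(storage):  # sorted only to make the output stable
        node = storage[name]
        indent = "  " * depth
        if isinstance(node, dict):
            out.append(f"{indent}[storage] {name}")
            walk_storage(node, depth + 1, out)  # recurse into the substorage
        else:
            out.append(f"{indent}[stream] {name} ({len(node)} bytes)")
    return out
```

Each output line corresponds to one treeview item, with the indent standing in for the parent/child relationship.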

Next up, I have to figure out how to parse the keyword list and how to search the file. Once those two are done, I should be able to write a custom CHM viewer, which is the ultimate goal: I want it as a module in a project and might also add it to Help2Viewer, expanding that program's functionality.
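
Once the keywords have been pulled out of the file, the search side can be as simple as an index-tab style prefix lookup. The sketch below is a swapped-in technique (a case-folded sorted list plus binary search), not how the CHM's own keyword tree is traversed, and it assumes the keywords are already available as Python strings; the class name is hypothetical.

```python
import bisect
from typing import Iterable, List

class KeywordIndex:
    """Case-insensitive prefix lookup over a keyword list (hypothetical helper)."""

    def __init__(self, keywords: Iterable[str]):
        # Sort once by case-folded key so every lookup is a binary search
        pairs = sorted((k.casefold(), k) for k in keywords)
        self._folded = [folded for folded, _ in pairs]
        self._original = [original for _, original in pairs]

    def prefix(self, text: str) -> List[str]:
        """Return every keyword that starts with `text`, ignoring case."""
        key = text.casefold()
        i = bisect.bisect_left(self._folded, key)
        matches = []
        # Matching keys are contiguous in sorted order, starting at i
        while i < len(self._folded) and self._folded[i].startswith(key):
            matches.append(self._original[i])
            i += 1
        return matches
```

Sorting once up front keeps each keystroke in a search box to a binary search plus a short scan.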

anunitu

I really prefer CHM files for help. The only thing I don't like is that, as far as I understand it, you must have IE on your system to read them. BUT, I did find a program quite a while back that allowed you to read them on a Linux box. I don't remember what the program was named, but it worked very well.

Found a link about it.

http://linuxtuts.blogspot.com/2008/09/how-to-view-chm-files-in-linux.html

donkey

Hi anunitu,

Well, the only reason you are "required" to have IE on your system is the mk:@MSITStore protocol; however, it is not too difficult to bypass this protocol completely. For example, in the demo I could easily get the name of the HTML stream in the file, read that stream and display it in any browser. I only chose IE (IWebBrowser2) because it's easy to use, and interfacing with another browser is beyond the scope of the project. Once I have figured out how to walk the $WWKeywordLinks storage object, there is no further need for the mk:@MSITStore protocol, since I can extract anything I want directly from the file.
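
To make the bypass concrete: an mk:@MSITStore URL is essentially the path of the CHM file plus the name of a stream inside it, separated by "::". A minimal Python sketch that splits such a URL, so the stream can be read straight from the file instead of going through the protocol handler, is below. The ms-its: alias is included, the helper name is hypothetical, and fragment anchors (#...) are deliberately not handled.

```python
from typing import Tuple

# URL prefixes handled by the InfoTech (ITSS) protocol handler
_ITSS_PREFIXES = ("mk:@msitstore:", "ms-its:")

def split_itss_url(url: str) -> Tuple[str, str]:
    """Split an mk:@MSITStore URL into (CHM file path, internal stream path)."""
    lowered = url.lower()
    for prefix in _ITSS_PREFIXES:
        if lowered.startswith(prefix):
            rest = url[len(prefix):]
            break
    else:
        raise ValueError(f"not an ITSS URL: {url!r}")
    chm_path, sep, topic = rest.partition("::")
    if not sep:
        raise ValueError("expected '::' between the CHM path and the topic")
    # Stream names inside the CHM file system are rooted at '/'
    if not topic.startswith("/"):
        topic = "/" + topic
    return chm_path, topic
```

The second element is the stream name you would open inside the CHM's storage; the HTML it contains can then be handed to whatever browser control you like.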

I'm currently working on a stream viewer for the items in the tree; it should help me better understand how each component of $WWKeywordLinks works and how I can use them in my application. The real pain with sparsely documented files is that most of the work goes into writing utilities to break them down into components, while the final implementation is usually about a tenth the size.
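
The core of a stream viewer like that is usually a classic hex dump: offset, bytes in hex, printable ASCII. A minimal Python sketch follows; the function name and the 16-byte row width are illustrative choices, not taken from the demo.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format raw stream bytes as offset / hex / ASCII rows."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and everything else as '.'
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short rows
        rows.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(rows)
```

Feeding it the bytes of a stream selected in the tree gives a quick way to eyeball record boundaries and embedded strings.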

donkey

I have decoded the $WWKeywordLinks section of the file, or at least as far as I can with the CHM files I have around here. I can't seem to figure out the See Also entries, and when a keyword opens a child window it tends to throw everything off. I've done some quick workarounds for those issues (I hope).
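
A recurring low-level step when walking the keyword binary tree is pulling keyword strings out of raw blocks. The sketch below assumes, as the community format notes describe for the BTree stream under $WWKeywordLinks, that each keyword is stored as a NUL-terminated UTF-16LE string. The block header and the fields following each keyword are deliberately left out, since their layout (including whatever the See Also entries change) is not covered here; the helper name is hypothetical.

```python
from typing import Tuple

def read_utf16z(buf: bytes, offset: int) -> Tuple[str, int]:
    """Read a NUL-terminated UTF-16LE string at `offset`.

    Returns the decoded text and the offset just past the terminator.
    """
    end = offset
    # Scan two bytes at a time for the 0x0000 terminator
    while True:
        if end + 1 >= len(buf):
            raise ValueError("unterminated UTF-16 string")
        if buf[end] == 0 and buf[end + 1] == 0:
            break
        end += 2
    return buf[offset:end].decode("utf-16-le"), end + 2
```

In real data the returned offset would then be advanced past the entry's remaining fields before reading the next keyword.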

donkey

Tried the CHM viewer on Ray's FPU lib help file and it fails when reading the keyword binary tree. There are 3 "See Also" references in the file and I believe that is where the problem lies, so I am trying to track it down based on that assumption. Hopefully I can manage to RE the See Also entries and then get on to the index section soon; that will hopefully get the tree looking the way it should.