That's all. Yes, I don't think that's much: probably not more than 30 lines of Perl, I'd guess. (Quite a bit more than that in C, of course, but compared to the complexity of a whole browser, that's peanuts, IMHO.)
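Just to sketch what those "30 lines" would roughly do (in Python rather than Perl, and with a made-up `localize` helper name), the core job is finding the asset references in the page and rewriting them to local filenames while remembering which URLs still need to be fetched:

```python
import re
from urllib.parse import urljoin, urlparse

def localize(html, base_url):
    """Rewrite src/href references to bare local filenames and
    collect the absolute URLs of the assets that must be fetched.
    A real tool would also handle CSS url(...) refs, quoting
    variants, filename collisions, etc."""
    assets = []

    def repl(m):
        attr, url = m.group(1), m.group(2)
        absolute = urljoin(base_url, url)          # resolve relative refs
        local = urlparse(absolute).path.rsplit("/", 1)[-1] or "index.html"
        assets.append(absolute)
        return f'{attr}="{local}"'

    rewritten = re.sub(r'(src|href)="([^"]+)"', repl, html)
    return rewritten, assets
```

So something like `<img src="/img/logo.png">` fetched from `http://example.com/dir/` becomes `<img src="logo.png">`, and `http://example.com/img/logo.png` ends up on the fetch list.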

But OTOH, yes, I haven't used wget for years, and back then even CSS wasn't very widespread. I don't know whether wget can handle JavaScript or CSS stuff. And there's no way it can handle Flash. ;-)

And about .webarchives: it really depends on how they work. Maybe they actually put the browser back into the state it was in when you viewed the page, so relative links would still point to the correct destination without having to be recoded. That would make things much easier, of course. If the format is some kind of on-disk version of the internal model the browser uses, nothing would have to be changed at all. If not, at least the tags pointing to the resources used in the page would have to be recoded in some way. Given how well it works, though, I suspect it's really more a saved browser state than a set of recoded pages.
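The "recoding" I mean is just resolving every relative reference against the page's original URL, so the saved copy keeps pointing at the right place. A minimal sketch (the `absolutize` name and the example URLs are my own, not anything the webarchive format actually does):

```python
import re
from urllib.parse import urljoin

def absolutize(html, page_url):
    """Recode relative src/href references into absolute URLs,
    resolved against the URL the page was originally loaded from."""
    return re.sub(
        r'(src|href)="([^"]+)"',
        lambda m: f'{m.group(1)}="{urljoin(page_url, m.group(2))}"',
        html,
    )
```

A format that stores the browser's internal state can skip this step entirely, because the base URL is preserved along with the document.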

Regarding the file tree: I thought you were maybe thinking of something like sucking down a whole forum for offline viewing, which is much more difficult, of course: deciding what crawl depth to use to be sure that everything you need is there, and so on.

But I agree that with all the new techniques it has become quite difficult to save a representation of what you're viewing as some kind of source tree. As long as no JavaScript or other dynamic stuff is involved, it's not really a big problem. With AJAX stuff it's probably impossible, though, since the content you see was fetched after the page loaded and never existed as a single static document.

Last edited by zottel; 2006-11-30 at 04:16 PM..