Quote:
Originally Posted by zottel
Why should it be so difficult? You only have to recode all tags that point to stuff used in the page so that they reflect the relative path to the file on the hd. And, in order to be sure that links will still work, recode all links that point away from the page from relative to absolute. That's all—or am I missing something?
That's all? Sure, but that's a lot to ask.
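To put that "all" in perspective, here is a minimal sketch of just the rewriting step being described, assuming the page and its resources have already been downloaded. The URL, filenames, and the saved_resources map are made up for illustration; a real implementation needs a proper HTML parser rather than a regex, and on top of that has to chase references hidden inside CSS, JavaScript, and Flash, which is exactly where things fall apart.

import re
from urllib.parse import urljoin

PAGE_URL = "http://example.com/articles/page.html"  # hypothetical source URL

# Absolute resource URLs mapped to the files they were saved as on disk.
saved_resources = {
    "http://example.com/styles/main.css": "page_files/main.css",
    "http://example.com/images/logo.png": "page_files/logo.png",
}

def rewrite(html):
    # Rewrite every src/href attribute found in the markup.
    def fix(match):
        attr, url = match.group(1), match.group(2)
        absolute = urljoin(PAGE_URL, url)  # resolve relative URLs against the page
        if absolute in saved_resources:
            # The tag points at something we saved: use the local relative path.
            return f'{attr}="{saved_resources[absolute]}"'
        # Everything else becomes an absolute link so it still works offline.
        return f'{attr}="{absolute}"'
    return re.sub(r'\b(src|href)="([^"]+)"', fix, html)

print(rewrite('<img src="/images/logo.png"> <a href="about.html">About</a>'))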

Quote:
The only difficulties I can imagine are to decide what to call the html file if the source was dynamically generated, links might not work anymore if they contained session information, and problems with flash or other content that will run in the browser that might load other stuff when it's started.
And a lot of sites, certainly most of the popular ones, suffer from those problems. If the site has Flash that references or links to any other files, you can pretty much bet it will break, since those references are buried inside the SWF where the save feature can't rewrite them. A lot of JavaScript and CSS also broke in my tests when saving from Firefox 2 and IE7, typically leaving the saved page badly mangled.

Quote:
But these are problems relevant to the generation of .webarchives, too.
In my tests, that's not the case. Sites that broke when saved as complete HTML from IE7 and FF2 did not break when saved as a webarchive.

Quote:
Remember, we're talking about a single web page here, not about some part of the file tree of an entire website.
I must be completely missing what you're trying to say with that. Saving the source would be talking about a single web page, but saving it as a "complete" page is most certainly trying to save a part of the file tree of an entire site.

Quote:
wget (commandline websucking tool) has been able to do this since I used it for the first time, which was more than 5 years ago.
I haven't used that, but I would be seriously surprised if it didn't suffer from the same issues that IE7 and FF2 do.
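For what it's worth, the wget documentation describes the relevant switches as -p (--page-requisites), which pulls in the images, stylesheets and scripts a page needs, and -k (--convert-links), which rewrites the links in the saved copy for local viewing, so a typical invocation (with a made-up URL) would be roughly:

wget -p -k http://example.com/some/page.html

But wget doesn't execute JavaScript or look inside Flash either, so references generated or embedded there would still be missed, which is exactly the failure mode above.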