Saving complete web pages (with images) as files, not archive

timb · timb

Please add the possibility to save the complete page (with images) as separate files (html file + images etc., maybe with a subfolder) like Firefox, Opera and others do it (by adjusting the HTML code that points to the images etc.). No PDF, no web archive, just something that's compatible with every browser.

See the thread http://forums.omnigroup.com/showthread.php?t=2275.

Tim B.

Forrest · Forrest

I just tried it in Firefox, and it doesn't work. If the page is very simple, it probably works. I rarely speak in absolutes, but I would say it's impossible for such a feature to be reliable.

zottel · zottel

Why should it be so difficult? You only have to recode all tags that point to stuff used in the page so that they reflect the relative path to the file on the hd. And, in order to be sure that links will still work, recode all links that point away from the page from relative to absolute. That's all—or am I missing something?
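The rewriting zottel describes can be sketched roughly as follows. This is only an illustration using Python's standard library, not how OmniWeb, Firefox, or wget actually do it. The names `PageRewriter`, `rewrite_page`, and the `page_files` subfolder are made up for the example, and the choice of which tag/attribute pairs count as "stuff used in the page" is a simplifying assumption.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
import posixpath

# Assumption: these tag/attribute pairs embed resources that get saved locally.
RESOURCE_ATTRS = {("img", "src"), ("script", "src"), ("link", "href")}
# Assumption: these pairs are links that point away from the page.
LINK_ATTRS = {("a", "href")}


class PageRewriter(HTMLParser):
    """Re-emits HTML with resource URLs made local and links made absolute."""

    def __init__(self, base_url, files_dir):
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.files_dir = files_dir
        self.out = []
        self.downloads = {}  # absolute URL -> relative path on disk

    def _local_path(self, abs_url):
        # Naive naming: two resources with the same basename would collide.
        name = posixpath.basename(urlsplit(abs_url).path) or "resource"
        return f"{self.files_dir}/{name}"

    def _emit(self, tag, attrs, closing=""):
        parts = [tag]
        for name, value in attrs:
            if value is not None and (tag, name) in RESOURCE_ATTRS:
                abs_url = urljoin(self.base_url, value)
                value = self.downloads.setdefault(abs_url, self._local_path(abs_url))
            elif value is not None and (tag, name) in LINK_ATTRS:
                value = urljoin(self.base_url, value)
            if value is None:
                parts.append(name)
            else:
                parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + closing + ">")

    def handle_starttag(self, tag, attrs):
        self._emit(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite_page(html, base_url, files_dir="page_files"):
    """Return (rewritten_html, {absolute_url: local_path}) for one page."""
    parser = PageRewriter(base_url, files_dir)
    parser.feed(html)
    parser.close()
    return "".join(parser.out), parser.downloads
```

Calling `rewrite_page(html, "http://example.com/dir/page.html")` returns the adjusted HTML plus a mapping of what still has to be downloaded into the subfolder. Resources pulled in by CSS, JavaScript, or Flash are not touched by this sketch.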

The only difficulties I can imagine are to decide what to call the html file if the source was dynamically generated, links might not work anymore if they contained session information, and problems with flash or other content that will run in the browser that might load other stuff when it's started.
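For the first of these difficulties, naming the file when the page was dynamically generated, one possible approach is sketched below. The function name `local_html_name`, the list of stripped extensions, and the idea of appending a short hash of the query string are all assumptions made for illustration, not something any of the browsers mentioned here is known to do.

```python
import hashlib
import re
from urllib.parse import urlsplit


def local_html_name(url, title=None):
    """Derive a filesystem-safe .html filename from a (possibly dynamic) URL."""
    parts = urlsplit(url)
    # Prefer an explicit title, then the last path segment, then the host name.
    stem = title or parts.path.rstrip("/").rsplit("/", 1)[-1] or parts.netloc
    # Drop common script/page extensions such as .php or .html.
    stem = re.sub(r"\.(php|asp|aspx|cgi|jsp|html?)$", "", stem, flags=re.IGNORECASE)
    # Replace anything that is awkward in a filename.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", stem).strip("._") or "page"
    if parts.query:
        # Different query strings usually mean different pages, so keep them apart.
        digest = hashlib.sha1(parts.query.encode("utf-8")).hexdigest()[:8]
        stem = f"{stem}-{digest}"
    return stem + ".html"
```

For a URL like the thread link above, this yields a name of the form `showthread-` followed by eight hex digits and `.html`. A session ID in the query string would still change the name on every visit, which is the second difficulty zottel mentions.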

But these are problems relevant to the generation of .webarchives, too.

Remember, we're talking about a single web page here, not about some part of the file tree of an entire website.

wget (commandline websucking tool) has been able to do this since I used it for the first time, which was more than 5 years ago.

Forrest · Forrest

Quote:

Originally Posted by zottel

Why should it be so difficult? You only have to recode all tags that point to stuff used in the page so that they reflect the relative path to the file on the hd. And, in order to be sure that links will still work, recode all links that point away from the page from relative to absolute. That's all—or am I missing something?

That's all? Sure, but that's a lot to ask.

Quote:

The only difficulties I can imagine are to decide what to call the html file if the source was dynamically generated, links might not work anymore if they contained session information, and problems with flash or other content that will run in the browser that might load other stuff when it's started.

And a lot of sites, certainly most of the popular ones, will suffer from those problems. If the site has Flash and it references any files or links to any files, you can pretty much bet it will break. A lot of JavaScript and CSS also breaks in my tests saving from Firefox 2 and IE7. This typically results in the page being badly broken.

Quote:

But these are problems relevant to the generation of .webarchives, too.

In my tests, that's not the case. Sites that broke when saved as complete HTML from IE7 and FF2 did not break when saved as a webarchive.

Quote:

Remember, we're talking about a single web page here, not about some part of the file tree of an entire website.

I must be completely missing what you're trying to say with that. Saving the source would be talking about a single web page, but saving it as a "complete" page is most certainly trying to save a part of the file tree of an entire site.

Quote:

wget (commandline websucking tool) has been able to do this since I used it for the first time, which was more than 5 years ago.

I haven't used that, but I would be seriously surprised if it didn't suffer from the same issues that IE7 and FF2 do.

timb · timb

I can't say that I had many problems with saving pages that way (but, well, this has been on Windows for some years now). Some occasional glitches (very rare), but one browser or the other would always save that specific page completely. The only thing slightly damaged might have been the layout of the page, but I can live with that. I religiously keep my notes in plain-text files and my huge archive of web pages in highly compatible single html files together with their adjacent files.

And cf. archives: There's not so much difference between a folder structure and the internal structure of an archive, or am I wrong with that?

TB

Forrest · Forrest

This has probably worked well in the past, but as newer techniques get used with sites, it's going to become less and less reliable.

zottel · zottel

Quote:

Originally Posted by timb

And cf. archives: There's not so much difference between a folder structure and the internal structure of an archive, or am I wrong with that?

As I said in the posting above, I guess that .webarchives are in fact some representation of the internal model of the browser. That means that when a .webarchive is loaded, the browser will be put into exactly the same state it was in when you were actually viewing the page. This way, several problems can be avoided. Above all, the browser is practically in the same server directory. So all relative links, be it in images or links or Javascripts or Flash animations, will still point to the correct destination without changing anything. Additionally, any dynamic content, even if it's ajaxly dynamic, ;-) will still have just the same representation as it had when you were actually viewing the page. It would be extremely difficult, if not impossible, to get this by translating that stuff to actual files and still be able to interact with it when you view it again (like moving a map on maps.google.com).

Edit: Interactivity will also be broken with .webarchives, if the page has changed meanwhile, of course. If Google decides to use some other Javascript model for moving maps, your old .webarchive will still show the same as before, but you won't be able to move the map anymore.
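The point about relative links can be illustrated with a small sketch. It assumes, as zottel guesses, that the archive keeps the page's original URL as its base; the example URLs and the helper name `resolve_all` are invented for the illustration. The same relative references resolve to very different places depending on whether the base is the original server location or a file on disk.

```python
from urllib.parse import urljoin


def resolve_all(base_url, refs):
    """Resolve each relative reference against the given base URL."""
    return [urljoin(base_url, ref) for ref in refs]


# Relative references as they might appear in a page's HTML, scripts, or CSS.
refs = ["img/map.png", "../js/app.js", "/style.css"]

# Base is the original location: everything still points at the server.
original = resolve_all("http://example.com/maps/view.html", refs)

# Base is a saved copy on disk: the same references now point into the
# local file system, where those files do not exist unless they were
# downloaded and the references rewritten.
saved = resolve_all("file:///Users/me/Saved/view.html", refs)

for ref, on_server, on_disk in zip(refs, original, saved):
    print(f"{ref!r}: {on_server} vs. {on_disk}")
```

This is why saving as separate files requires rewriting every reference, while a format that remembers the original base URL can leave them alone.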
