Saving complete web pages (with images) as files, not archive
Please add the possibility to save the complete page (with images) as separate files (HTML file + images etc., maybe with a subfolder), like Firefox, Opera and others do it (by adjusting the HTML code that points to the images etc.). No PDF, no web archive, just something that's compatible with every browser.

See the thread http://forums.omnigroup.com/showthread.php?t=2275.

Tim B.
 
I just tried it in Firefox, and it doesn't work. If the page is very simple, I'm sure it probably works. I rarely speak in absolutes, but I would say it's impossible for such a feature to be reliable.
 
Why should it be so difficult? You only have to recode all tags that point to stuff used in the page so that they reflect the relative path to the file on the hd. And, in order to be sure that links will still work, recode all links that point away from the page from relative to absolute. That's all—or am I missing something?

The only difficulties I can imagine are deciding what to call the HTML file if the source was dynamically generated, links that might not work anymore if they contained session information, and problems with Flash or other content that runs in the browser and might load other stuff when it's started.

But these are problems relevant to the generation of .webarchives, too.

Remember, we're talking about a single web page here, not about some part of the file tree of an entire website.

wget (a command-line website-downloading tool) has been able to do this since I first used it, more than five years ago.
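
Just to give an idea of the shape of the work involved, the recoding described above could be sketched roughly like this in Python. This is purely illustrative, not anything OmniWeb does: the URL and folder names are made up, and the plain-text substitution is far too naive for real pages (no CSS or JavaScript handling, no care about duplicate strings or name collisions).

Code:
#!/usr/bin/env python3
# Hypothetical sketch of "save the page and recode its references".
# PAGE_URL and the folder names are made up; the plain-text substitution
# below is far too naive for real-world pages.
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

PAGE_URL = "http://www.example.com/article/page.html"  # made-up example
OUT_DIR = "saved_page"
FILES_DIR = os.path.join(OUT_DIR, "files")

# Tag/attribute pairs that reference resources the page needs locally.
RESOURCE_ATTRS = {("img", "src"), ("script", "src"), ("link", "href")}

class RefCollector(HTMLParser):
    """Collect resource references and ordinary <a href> links."""
    def __init__(self):
        super().__init__()
        self.resources = set()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if (tag, name) in RESOURCE_ATTRS:
                self.resources.add(value)
            elif tag == "a" and name == "href":
                self.links.add(value)

def main():
    os.makedirs(FILES_DIR, exist_ok=True)
    html = urllib.request.urlopen(PAGE_URL).read().decode("utf-8", "replace")

    collector = RefCollector()
    collector.feed(html)

    # Download each referenced resource and point the page at the local copy.
    for ref in collector.resources:
        absolute = urljoin(PAGE_URL, ref)
        name = os.path.basename(urlparse(absolute).path) or "resource"
        try:
            data = urllib.request.urlopen(absolute).read()
        except OSError:
            continue  # skip resources that fail to download
        with open(os.path.join(FILES_DIR, name), "wb") as out:
            out.write(data)
        html = html.replace(ref, "files/" + name)

    # Recode ordinary links from relative to absolute so they keep working.
    for ref in collector.links:
        html = html.replace('href="%s"' % ref, 'href="%s"' % urljoin(PAGE_URL, ref))

    with open(os.path.join(OUT_DIR, "index.html"), "w", encoding="utf-8") as out:
        out.write(html)

if __name__ == "__main__":
    main()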
 
Quote:
Originally Posted by zottel
Why should it be so difficult? You only have to recode all tags that point to stuff used in the page so that they reflect the relative path to the file on the hd. And, in order to be sure that links will still work, recode all links that point away from the page from relative to absolute. That's all—or am I missing something?
That's all? Sure, but that's a lot to ask.

Quote:
The only difficulties I can imagine are deciding what to call the HTML file if the source was dynamically generated, links that might not work anymore if they contained session information, and problems with Flash or other content that runs in the browser and might load other stuff when it's started.
And a lot of sites, certainly most of the popular ones, will suffer from those problems. If the site has Flash and it references or links to any files, you can pretty much bet it will break. A lot of JavaScript and CSS also breaks in my tests saving from Firefox 2 and IE7, which typically leaves the page badly broken.

Quote:
But these are problems relevant to the generation of .webarchives, too.
In my tests, that's not the case. Sites that broke when saved as complete HTML from IE7 and FF2 did not break when saved as a webarchive.

Quote:
Remember, we're talking about a single web page here, not about some part of the file tree of an entire website.
I must be completely missing what you're trying to say with that. Saving the source would be talking about a single web page, but saving it as a "complete" page is most certainly trying to save a part of the file tree of an entire site.

Quote:
wget (a command-line website-downloading tool) has been able to do this since I first used it, more than five years ago.
I haven't used that, but I would be seriously surprised if it didn't suffer from the same issues that IE7 and FF2 do.
 
I can't say that I had many problems with saving pages that way (but, well, this has been on Windows for some years now). Some occasional glitches (very rare), but one browser or the other would always save that specific page completely. The only thing slightly damaged might have been the layout of the page, but I can live with that. I religiously keep my notes in plain-text files and my huge archive of web pages in highly compatible single HTML files together with their adjacent files.

And regarding archives: there's not that much difference between a folder structure and the internal structure of an archive, or am I wrong about that?

TB
 
This has probably worked well in the past, but as newer techniques get used with sites, it's going to become less and less reliable.
 
That's all—yes, I don't think that's so much. Not much more than 30 lines in Perl, I'd guess. (Quite a bit more than that in C, of course, but compared to the complexity of a whole browser that's peanuts, IMHO.)

But, OTOH—yes, I haven't used wget for years, and at that time even CSS wasn't very widespread. I don't know if wget is able to handle JavaScript or CSS stuff. And no way it can handle Flash. ;-)

And about .webarchives—it really depends on how they work. Maybe they really put the browser into the state it was in when you actually viewed the page, so relative links would still point to the correct destination without having to be recoded. That would really make things easier, of course. If that format is some kind of file version of the browser's internal model, nothing would have to be changed; if not, at least the tags pointing to the resources used in the page would have to be recoded in some way. Given how well it works, though, I guess it really is more a matter of saving the browser state than of saving recoded pages.
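
If that guess is right, the key point is simply that the original URL of the page is kept with the saved data, so relative references can still be resolved against it instead of having to be rewritten. A tiny illustration (the URLs are made up):

Code:
from urllib.parse import urljoin

# Made-up page URL, standing in for whatever the archive remembers.
page_url = "http://www.example.com/forum/showthread.php?t=2275"

print(urljoin(page_url, "images/icon.gif"))
# -> http://www.example.com/forum/images/icon.gif
print(urljoin(page_url, "/style/main.css"))
# -> http://www.example.com/style/main.css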

Regarding the file tree: I thought you were maybe thinking of something like sucking down a whole forum for offline viewing, which is much more difficult, of course—deciding what depth to use to be sure that everything you need is there etc.

But I agree that with all the new techniques it has become quite difficult to save a representation of what you're viewing as some kind of source tree. As long as there's no JavaScript or other dynamic stuff included, it's not really a big problem. It's probably impossible with AJAX stuff, though.

Last edited by zottel; 2006-11-30 at 04:16 PM..
 
So I'm curious why the results need to be in HTML/CSS rather than, for example, a PDF. The only difference I can think of is the ability to copy or edit the code.
 
Quote:
Originally Posted by timb
And cf. archives: There's not so much difference between a folder structure and the internal structure of an archive, or am I wrong with that?
As I said in the posting above, I guess that .webarchives are in fact some representation of the internal model of the browser. That means that when a .webarchive is loaded, the browser is put into exactly the same state it was in when you were actually viewing the page. This way, several problems can be avoided. Above all, the browser is practically in the same server directory, so all relative links, whether in images, links, JavaScripts or Flash animations, will still point to the correct destination without changing anything. Additionally, any dynamic content, even if it's ajaxly dynamic, ;-) will still have just the same representation it had when you were actually viewing the page. It would be extremely difficult, if not impossible, to achieve this by translating that stuff into actual files and still be able to interact with it when you view it again (like moving a map on maps.google.com).
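
For what it's worth, a .webarchive can be opened with an ordinary property-list reader, and what's inside seems to match that guess: the main resource and every subresource are stored together with the URL they were originally loaded from, so nothing in the markup needs to be rewritten. A small sketch (the file name is made up, and the key names are the ones WebKit archives are generally understood to use):

Code:
import plistlib

# Made-up file name; treat this as an exploratory sketch of the format.
with open("saved_page.webarchive", "rb") as f:
    archive = plistlib.load(f)

main = archive["WebMainResource"]
print("main resource:", main["WebResourceURL"], main["WebResourceMIMEType"])
print("HTML bytes:", len(main["WebResourceData"]))

# Every subresource keeps the URL it was originally loaded from.
for res in archive.get("WebSubresources", []):
    print("subresource:", res["WebResourceURL"], res.get("WebResourceMIMEType"))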

Edit: Interactivity will also be broken with .webarchives if the page has changed in the meantime, of course. If Google decides to use some other JavaScript model for moving maps, your old .webarchive will still show the same thing as before, but you won't be able to move the map anymore.

Last edited by zottel; 2006-11-30 at 05:07 PM..
 
Quote:
Originally Posted by Forrest
So I'm curious why the results need to be in HTML/CSS rather than, for example, a PDF. The only difference I can think of is the ability to copy or edit the code.
Well, .webarchives only work with WebKit browsers. PDFs don't give you the possibility to interact with the page, e.g. follow links (that would be possible in principle, but I doubt it's implemented that way; I've never tried).

If an HTML/CSS version were possible for any page you viewed, you could open your archives in any browser whatsoever.

I guess that's what's behind that request.
 
 


