The Omni Group Forums (http://forums.omnigroup.com/index.php)
-   OmniWeb Feature Requests (http://forums.omnigroup.com/forumdisplay.php?f=28)
-   -   Saving complete web pages (with images) as files, not archive (http://forums.omnigroup.com/showthread.php?t=2278)

zottel 2006-11-30 05:14 PM

... which brings me to another question:

Does anyone know more about .webarchives and how that format is actually defined? If it's really a representation of the internal browser model—will it work with future versions, where this model might change?

Forrest 2006-11-30 10:38 PM

I gotcha. I did some searching for more info on webarchives, and I found one app that will extract files from a webarchive, though I'm not sure how well it works: [url]http://www.macupdate.com/info.php/id/20643[/url]
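
For anyone curious what such an extractor might do, here is a rough sketch in Python. It is not how the linked app works (I have no idea what it does internally). It assumes the layout commonly seen in Safari-produced .webarchive files: a property list with a "WebMainResource" dictionary, a "WebSubresources" list, and nested "WebSubframeArchives", where each resource carries "WebResourceURL", "WebResourceMIMEType" and "WebResourceData". Those key names are an assumption that nobody in this thread has confirmed, and the function names are made up for illustration.

```python
import plistlib
from pathlib import Path
from urllib.parse import urlparse


def iter_resources(archive):
    """Yield (url, mime_type, data) for the main resource, subresources and subframes.

    The key names below are assumed from commonly observed webarchive files.
    """
    main = archive.get("WebMainResource")
    resources = ([main] if main is not None else []) + list(archive.get("WebSubresources", []))
    for res in resources:
        yield (
            res.get("WebResourceURL", ""),
            res.get("WebResourceMIMEType", ""),
            res.get("WebResourceData", b""),
        )
    # Subframes are stored as nested archives with the same layout.
    for sub in archive.get("WebSubframeArchives", []):
        yield from iter_resources(sub)


def extract_webarchive(path, out_dir):
    """Write every resource found in the archive to out_dir and return the paths."""
    with open(path, "rb") as fh:
        archive = plistlib.load(fh)  # detects binary vs. XML plists automatically
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, (url, _mime, data) in enumerate(iter_resources(archive)):
        # Use the last path segment as the file name; fall back for URLs ending in "/".
        name = Path(urlparse(url).path).name or "index.html"
        target = out / f"{index:03d}_{name}"
        target.write_bytes(data)
        written.append(target)
    return written
```

Something like extract_webarchive("page.webarchive", "page_files") would then dump the HTML and images side by side. The numeric prefix only avoids name collisions; links inside the HTML are not rewritten to point at the extracted files.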

Len Case 2006-12-01 01:00 AM

A webarchive is a serialized form of the record of responses used to create a webpage.

Basically, as each resource is requested (via an image tag, a subframe, JavaScript, or even a Flash plugin request), the request-response pair is stored, once the response comes back from the server, in an object that can be serialized as a data file (the webarchive). When the webarchive is loaded and the page's resources reload, any request that matches a stored one is answered from the archive instead of going to the server.

It doesn't actually store the full state of JavaScript or plugins (hard to do in the first case, and not part of the API in the second).
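
To make the record-and-replay idea concrete, here is a toy model in Python. It is only a sketch of the concept described above, not WebKit's actual code: the class name ReplayArchive, its methods, and the fetch callback are all invented for illustration, and the serialized form here is a plain property list of URL-to-body pairs rather than the real webarchive layout.

```python
import plistlib
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ReplayArchive:
    """Toy archive: maps each requested URL to the response body that came back."""

    responses: Dict[str, bytes] = field(default_factory=dict)

    def record(self, url: str, fetch: Callable[[str], bytes]) -> bytes:
        """Fetch from the server and remember the request-response pair."""
        body = fetch(url)
        self.responses[url] = body
        return body

    def load(self, url: str, fetch: Callable[[str], bytes]) -> bytes:
        """Answer from the archive if the same request was recorded, else go to the server."""
        if url in self.responses:
            return self.responses[url]
        return fetch(url)

    def to_bytes(self) -> bytes:
        """Serialize the recorded pairs as a data file (here: a binary property list)."""
        return plistlib.dumps(self.responses, fmt=plistlib.FMT_BINARY)

    @classmethod
    def from_bytes(cls, data: bytes) -> "ReplayArchive":
        return cls(responses=dict(plistlib.loads(data)))
```

Note that only responses are kept. Nothing about script or plugin state is captured, which matches the limitation mentioned above.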

Len Case 2006-12-01 01:06 AM

[QUOTE=zottel]... which brings me to another question:

Does anyone know more about .webarchives and how that format is actually defined? If it's really a representation of the internal browser model—will it work with future versions, where this model might change?[/QUOTE]
Since all of WebKit is now open source, you can look at the code for yourself and see exactly how webarchives are defined and created. And since the full history is included in the public repository, you should always be able to read or write any version of the webarchive format, should it change in the future.

timb 2006-12-01 06:37 PM

Well, I'm back and...

...not only did I cough up the 9,95 for the November-sale OmniWeb, even though I don't even have a Mac to run the latest version (I have a soft spot for this browser, dunno why)...
[QUOTE=Forrest]I just tried it in Firefox, it doesn't work. If the page is very simple, I'm sure it probably works. I rarely state absolutes, but I would say it's impossible for such a feature to be reliable.[/QUOTE]
... I also dusted off my b/w G3 (running Jaguar, which is why I can't run OW 5.5) and saved more than half a dozen web pages (with images) in Firefox (0.9!), transferred the folders and files to a Windoze machine and looked at them (while offline) in IE (6) and others. All but one (my Gmail inbox, which I didn't seriously expect to save correctly) showed up with the content intact. This included the Omnigroup homepage and the OmniWeb features page. The page layout sometimes wasn't reproduced like the original, but I don't care about that. And I know that Opera would have saved it even better.

The main reason for the request was cross-platform and future accessibility. It's the same reason why I prefer to keep my notes in plain text. Web archives are a joke. PDFs are something completely different from the page itself. I'm used to digging into the source code of pages I've saved and adding remarks or making other adjustments (I sometimes even run search-and-replace operations to correct errors; this is not only about all-English pages, after all). I can't do that with PDFs. And if I try to save/print as an A4 PDF, more often than not the margins of the page will be cut off. I had actually [I]started[/I] by trying to archive everything as PDF, but soon abandoned that approach.
So I want to uphold my request: Please add a feature that saves web pages as individual files together with their adjacent images etc.

T.

DanielSmith 2007-03-10 10:21 PM

I know an easy way to do that, LOL:
just save the entire page as an image.
Using the system Print Screen key is not a good idea, because it can only record what is on the screen. I am using [URL="http://www.acasystems.com/en/screencapturepro/"]ACA Capture[/URL], which can also capture the part of the web page outside the screen.
But once the web page is saved as an image, it can't be split up anymore.

timb 2007-03-25 05:04 PM

DanielSmith, in what way would that be better than saving as PDF?

AFAICT, saving as PDF would do this just as well, but the major points in this thread were that:
[LIST]
[*]web archives are proprietary to WebKit browsers and not cross-platform (there aren't any non-Mac WebKit browsers)
[*]while PDFs are kind of cross-platform,
[LIST]
[*]they don't preserve links in the pages
[*]I like to edit the source code of saved pages (add comments or correct errors, even edit links to additional images etc.), which I can't do with PDFs
[*]some web pages don't print well at all, and some PDF "printouts" have text cut off at the margins
[/LIST]
[/LIST]

JKT 2007-03-26 10:48 AM

(timb, this won't help your needs, but I'm posting it as an FYI).

If you Save as PDF (hold Option down as you do a Save As...) rather than printing to PDF, the page should save with its formatting preserved. Note that this method generates a single-page PDF of the site, so content that would run past the boundary of a single sheet of paper in the print version doesn't get split or cut off. It is useful for sites that need a lot of vertical scrolling to view, if you don't want them to be split at inconvenient locations in the text. However, it isn't so useful if you actually want a hard-copy printout.

I'm hoping Apple will allow the links to remain live in their PDFs in the next version of OS X.

Chiller 2007-03-31 06:11 PM

An AppleScript perhaps?
 
An ambitious (and knowledgeable) person could write an AppleScript that does this, provided that OmniWeb could list all of the resources on a web page the way Firefox does. The script would then save all of those files into an archive (Apple zip file).
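
Not AppleScript, and not tied to OmniWeb at all, but the same idea can be sketched in Python: list the resources a page refers to, then put the page and those files into a zip archive. Everything here is hypothetical illustration. The names ResourceLister, list_resources and save_page_as_zip are made up, the set of tags checked is a deliberately small subset, and fetch stands for whatever function actually downloads a URL.

```python
import zipfile
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceLister(HTMLParser):
    """Collect absolute URLs of resources referenced by a page."""

    # tag -> attribute holding a resource URL (small assumed subset; <link href>
    # also matches non-stylesheet links, which a real tool would filter by rel)
    URL_ATTRS = {"img": "src", "script": "src", "link": "href", "iframe": "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                url = urljoin(self.base_url, value)  # resolve relative links
                if url not in self.resources:        # keep order, drop duplicates
                    self.resources.append(url)


def list_resources(html, base_url):
    parser = ResourceLister(base_url)
    parser.feed(html)
    parser.close()
    return parser.resources


def save_page_as_zip(html, base_url, zip_path, fetch):
    """Write the page plus every listed resource into one zip file.

    fetch(url) must return the resource body as bytes.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("index.html", html)
        for index, url in enumerate(list_resources(html, base_url)):
            name = url.rsplit("/", 1)[-1] or "resource"
            zf.writestr(f"resources/{index:03d}_{name}", fetch(url))
    return zip_path
```

This leaves the links in index.html untouched, so the saved page would still point at the original URLs. Rewriting them to the local copies is the part that would make it the feature timb is asking for.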


All times are GMT -8.