The Omni Group Forums

Migrating old MS Word Outlines to OO (i.e., from Outline Numbered style) (http://forums.omnigroup.com/showthread.php?t=20561)

TheCPT 2011-03-30 09:51 PM

Migrating old MS Word Outlines to OO (i.e., from Outline Numbered style)
 
Yes, it can be done, even if you have them in MS Word "Outline Numbered" format rather than the nicer "Outline View". It is cumbersome and imperfect, but it beats doing it by hand. I've tried both ways. :-(

[And of course, if you were smart enough to put your outlines in "Outline View" to begin with, then by all means use the ReadTree app that is available on the board here and seems to work for many combinations of Word/OSX.]

Anyhow, here are the instructions:
[LIST=1][*]Begin with a "Numbered Outline" in MS Word, normal or page view.
[*]Remove any extraneous lines that are not part of the outline (e.g., did you insert a few blank lines to smooth out pagination? Are there some at the beginning or end of the outline?). These will turn into level-one headings if you don't remove them. Do this carefully to reduce problems in the cleanup stage; seriously, this is the source of 90 percent of my cleanup.
[*]Select the entire outline and switch from "numbered" to "bullets" with a formatting button (it is easier to block delete bullets, even of multiple formats, than numbers)
[*]Copy the entire document
[*]Paste it into an empty rich-text email in Apple Mail
[*]Then click the "convert to plain text" button, or choose it from the "Format" pull-down menu. This both puts tabs at the start of each line (which OO handles well) and standardizes the bullets, replacing the 5 or so graphical formats that MS Word probably put in there
[*]Delete the bullets AND THE HARD-TO-SEE "TAB" THAT FOLLOWS THEM with the find-and-replace function. To do so, put your cursor to the left of the first heading (i.e., to the left of "Introduction" on most of mine). Hold down shift and tap the left arrow twice, selecting first the tab, then the bullet. Copy that selection, open a find/replace window (command-F), paste the bullet and tab into the "find" field, leave the "replace with" field empty, then click "replace all"
[INDENT]You now have your outline with a variable number of tabs in front of each level of outline entry[/INDENT][*]Then copy the body of your email.
[*]Then paste it into a new OmniOutliner document in which nothing is selected (if you are editing a row, the paste is inserted as part of that row). Use "Paste with current style" if your default document has a style sheet that you'd like to apply
[/LIST]
No, not easy, but easier than importing the raw text and manually recreating the hierarchy.

Hopefully that helps at least one person. It took me forever to figure out. :-)

[And presumably the instructions above are something someone can script, but I've gotten my last set of course notes out of Word now, and hope that OPML standards are here to stay...]
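
As a rough illustration of how the bullet-removal step might be scripted, here is a minimal Python sketch. It assumes the plain-text export from Mail consists of lines made of zero or more leading tabs, then one bullet character, then a tab, then the heading text. The set of bullet characters and the function name strip_bullets are assumptions made for this example, so check them against what your email actually contains.

```python
import re

# Assumed layout of each line: indent tabs, one bullet character, one tab, heading text.
# The bullet characters in the class below are a guess; adjust them to your export.
BULLET_RE = re.compile(r"^(\t*)[•*-]\t", re.MULTILINE)

def strip_bullets(text: str) -> str:
    """Drop the bullet and the tab after it on each line, keeping the indent tabs."""
    return BULLET_RE.sub(r"\1", text)

sample = "•\tIntroduction\n\t•\tBackground\n\t\t•\tDetails\n"
print(strip_bullets(sample))
```

The result keeps only the leading tabs, which is the form that pastes into OmniOutliner as nested rows.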

wtmonroe 2011-09-06 08:07 AM

Thanks, this saved me a lot of time on a couple of projects I'm working on!

n8henrie 2013-07-30 11:50 AM

I just went through this and took a different approach. Importing the tiered number list into Pages (similar to what's described above) didn't work, and I don't have Word. I did have several thousand lines of an outline that I wanted in OmniOutliner, which was not going to happen manually. So I copied and pasted the list into TextWrangler and used the regex find and replace function. It wasn't too tough.

The list I started with was something like:

1.0 stuff
1.1 stuff about stuff
1.1.1 stuff about stuff about stuff
1.2 different stuff about stuff
1.2.1 stuff about different stuff about stuff
2.0 different stuff
2.1 ...

The regex I started with was:

Find:
[CODE]^\d+\.[^0][/CODE]
Replace:
[CODE]\t&[/CODE]

For those less familiar with regex:
[LIST]
[*]"^" means "the beginning of the line"
[*]"\d" means any single digit
[*]"+" means "one or more of the thing before this" (in this case a digit, therefore catching double / triple digit numbers etc.)
[*]"\." means a literal period
[*]"[^0]" means "any single character except a 0"
[*]"\t" means a tab
[*]"&" in the Replace field means "everything matched by the Find pattern"
[/LIST]
Basically, this finds every line that starts with a number, then a period, then anything but a zero, and indents it with a tab. This leaves all the top-level (base) entries of the outline unindented. [B]In the example above, it would [I]not[/I] match 1.0 but it would hit on 1.1 , 1.1.1 , etc.[/B]
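
For anyone without TextWrangler, the same first pass can be approximated in Python. This is only a sketch: it assumes the outline is already in a string, and the variable names are made up for the example. Two differences from the editor are worth noting: Python's re module writes "the whole match" as \g<0> rather than &, and it needs re.MULTILINE so that ^ matches at the start of every line.

```python
import re

outline = """1.0 stuff
1.1 stuff about stuff
1.1.1 stuff about stuff about stuff
1.2 different stuff about stuff
2.0 different stuff
"""

# Same Find pattern as above; re.MULTILINE lets ^ match at the start of each line.
first_pass = re.compile(r"^\d+\.[^0]", re.MULTILINE)

# "\t\g<0>" plays the role of the "\t&" replacement: a tab, then the whole match.
indented = first_pass.sub(r"\t\g<0>", outline)
print(indented)
```

After this pass every line except 1.0 and 2.0 starts with exactly one tab.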

Then, I followed that with a second:

Find:
[CODE]^\t\d+\.\d+\.[/CODE]
Replace (same as above):
[CODE]\t&[/CODE]

Following the patterns from above, this will look for lines starting with a tab, followed by a number, followed by a period, followed by another number, followed by another period, and indent them by one more tab. [B]This would match [tab]1.1. but not 1.0 (base level) or [tab]1.1 (first indented level).[/B]

From there, I just added an extra "\t" to the front and "\d+\." to the end of the Find pattern...
Find:
[CODE]^\t\t\d+\.\d+\.\d+\.[/CODE]
Replace:
[CODE]\t&[/CODE]

and kept going until there were no more matches. Each additional "\t" on the front and "\d+\." on the end of the Find pattern matches one extra level of indentation, indenting the subpoints one level farther.
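
The repeated passes can also be run in a loop. Below is a minimal Python sketch of that idea, not the exact procedure I ran by hand: the function name indent_outline and the max_depth safety limit are assumptions added for the example. It applies the first pattern, then keeps adding one tab to the front and one number-period group to the end of the pattern until a pass matches nothing.

```python
import re

def indent_outline(text: str, max_depth: int = 20) -> str:
    """Repeat the find/replace passes until a pass finds nothing to indent."""
    # Pass 1: lines whose number does not have a 0 right after the first period.
    pattern = r"^\d+\.[^0]"
    for depth in range(1, max_depth + 1):
        # "\t\g<0>" is Python's spelling of the "\t&" replacement.
        text, count = re.subn(pattern, r"\t\g<0>", text, flags=re.MULTILINE)
        if count == 0:
            break
        # Next pass: `depth` leading tabs, then depth + 1 "number, period" groups.
        pattern = r"^" + r"\t" * depth + r"(?:\d+\.)" + "{%d}" % (depth + 1)
    return text

sample = (
    "1.0 stuff\n"
    "1.1 stuff about stuff\n"
    "1.1.1 stuff about stuff about stuff\n"
    "1.2 different stuff about stuff\n"
    "1.2.1 stuff about different stuff about stuff\n"
    "2.0 different stuff\n"
)
print(indent_outline(sample))
```

On the sample list this leaves 1.0 and 2.0 unindented, puts one tab in front of 1.1 and 1.2, and two tabs in front of 1.1.1 and 1.2.1.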

Once I was done, I just copied and pasted into OmniOutliner and the tabs produced an outline with foldable levels, just as described above. Just FYI, I grabbed the outline from a PDF with botched OCR, scrapping the bad OCR data and starting from scratch using the technique I outline [URL="http://n8henrie.com/2013/06/how-to-remove-corrupt-ocr-data-from-a-pdf/"]here[/URL]. I'm guessing that .pdf outlines may be a common reason that people would be trying to fix outline formatting for importing into OmniOutliner, since they tend to ruin formatting in my experience.

