Hard to Soft Formatting
Sometimes I have a document that uses manual line breaks, or hard formatting, to limit line lengths to x characters, and I want to convert it to a more word-processor-friendly format that uses paragraph breaks to delimit paragraphs, not lines.
This recent example comes from a very decent walkthrough for the PS2 game "Hulk". Its text looks like this, with no line longer than 80 characters.
*********************** Section I: INTRODUCTION *********************** The Hulk is one of my most favorite comic book characters ever. I remember watching the 80's series when I was younger and collecting a few of his comics. But enough about that, lets get to the game. The Hulk (GC, PS2, X- box) has got to be the best Hulk game ever made! Finally we have a worthy successor to The Incredible Hulk (Genesis, SNES). The Hulk is the first game that gives you the impression that you actually are the Hulk. From smashing down doorways to making your own door, The Hulk involves a path of endless destruction from the beginning to the end. I hope this FAQ will answer any questions that you might have about The Hulk. I have tried to make the FAQ as detailed as possible. I have stayed away from anything dealing with the storyline or cutscenes to avoid spoilers, but I cannot guarantee that this FAQ is totally spoiler-free. So be sure to use the FAQ as a last resort if you are playing through for the first time.
When I paste this into MS Word, every line ends with a paragraph break. I want to reformat the document so that paragraph breaks come only at the ends of logical paragraphs, not at the end of each line. That way I can shrink a 40-page document to 8 pages of monospaced text.
- Copy from the web page and paste into the Word document as unformatted text.
- Apply wicked formatting: 8 pt Courier New, 66% character spacing on an A4 page with two columns 0.5 cm apart, and top, bottom, left and right margins of 1 cm.
- Get rid of unnecessary spaces. Find ("  " - two spaces) and replace with (" " - one space).
- Get rid of more unnecessary spaces before paragraph marks, i.e. at the ends of lines. Find using wildcards (" [^13]" - a space then a paragraph mark) and replace with ("^p" - a paragraph mark).
- Get rid of unnecessary paragraph marks. Find ("^p^p" - two paragraph marks) and replace with ("^p" - one paragraph mark).
- Get rid of paragraph marks that are used only to delimit lines. The assumption is that wherever a line ends with a letter and the next line begins with a letter, the paragraph mark between them can safely be replaced with a space, giving "letter letter". Find using wildcards ("([A-Za-z])[^13]([A-Za-z])") and replace with ("\1 \2"). Since I can't really tell whether a period at the end of a line marks the end of a sentence or the end of a paragraph, I leave those alone.
The example text above now takes up substantially fewer column centimetres.
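The replacement steps above can also be previewed outside Word. This is a minimal Python sketch of the same sequence, assuming the pasted text is a plain string in which every hard line break (Word's `^p`, or `[^13]` with wildcards) is a `\n`. The function name `unwrap` is my own, and Python's `re` syntax stands in for Word's wildcard syntax, so treat it as an approximation rather than a reproduction of Word's behaviour.

```python
import re


def unwrap(text: str) -> str:
    """Approximate the Word Find and Replace steps on a plain string.

    Assumption: every hard line break (Word's ^p, or [^13] with wildcards)
    is represented here as a newline character. Each step makes a single
    left-to-right pass over the text.
    """
    # Two spaces -> one space.
    text = text.replace("  ", " ")
    # A space followed by a line break -> just the line break.
    text = text.replace(" \n", "\n")
    # Two line breaks -> one line break.
    text = text.replace("\n\n", "\n")
    # Letter, line break, letter -> letter, space, letter.
    # \1 and \2 are back references to the two captured letters, as in Word.
    text = re.sub(r"([A-Za-z])\n([A-Za-z])", r"\1 \2", text)
    return text


if __name__ == "__main__":
    hard = ("I hope this FAQ will answer\nany questions that you might have.\n\n"
            "So be sure to\nuse the FAQ as a last resort.")
    print(unwrap(hard))
```

Because each step makes one pass, a run of three spaces or three line breaks comes out as two, so the step may need repeating (much as you might press Replace All a second time). The letter-break-letter step also misses a line consisting of a single letter, since that letter is consumed by the previous match; replacing `(?<=[A-Za-z])\n(?=[A-Za-z])` with a space avoids that in Python.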
This demonstrates the following.
- Word's Find and Replace supports a regular-expression-like syntax when you tick "Use wildcards" in the Find and Replace dialog.
- To find a paragraph mark when you use wildcards, use the control code "[^13]" in the "Find" field. In the "Replace" field, you still use "^p".
- Character ranges can be defined between square brackets, such as "[A-Za-z]".
- Round brackets "()" can be used to mark wildcard sequences (groups) in the "Find" field, and back references "\x" (where x is the number of the group) can be used to address those sequences in the "Replace" field. For example, finding "(Robert) (Bram)" using wildcards and replacing with "\2 \1" would result in "Bram Robert".
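The same three ideas (character ranges, round-bracket groups and numbered back references) exist in Python's `re` module, so the Robert/Bram example carries over almost unchanged. This is an analogy rather than a statement about Word's engine: the two syntaxes differ elsewhere. For example, Word's `?` and `*` match any single character and any run of characters, whereas in Python they are quantifiers applied to the preceding item.

```python
import re

# Word wildcard Find "(Robert) (Bram)" / Replace "\2 \1", written as a Python regex.
# Round brackets capture groups; \1 and \2 refer back to them in the replacement.
swapped = re.sub(r"(Robert) (Bram)", r"\2 \1", "Robert Bram")
print(swapped)  # Bram Robert

# A character range in square brackets behaves the same way in both syntaxes:
# [A-Za-z] matches any single ASCII letter.
letters = re.findall(r"[A-Za-z]", "R2-D2 & C-3PO")
print(letters)  # ['R', 'D', 'C', 'P', 'O']
```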
Some pages I found useful when working this out.
- An article on gmayor.com about "Find & replace using wildcards".
- An article on gmayor.com about "MS Office Tip: Search and replace with Word formatting".