January 28th, 2004

Word’s HTML Stinks

Lately I’ve been dealing with something I haven’t had to work with before: converting Word documents to HTML, using Word’s built-in “Save as Web Page” feature. I am here to tell you that it really, really, really stinks.

You may know this already. You may not. But I’m warning you: it stinks. Here’s why.

The code it creates is straight out of 1998. We’re talking tables everywhere, font tags everywhere, odd CSS classes everywhere, and the liberal use of non-breaking spaces to align things. A document I worked with recently had thirty (!) of these in a row. That’s just silly. I also enjoy how things such as smart quotes come over as gobbledygook for anyone not running Windows. That’s a treat.
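For the curious, here is a rough sketch of what a first cleanup pass over that kind of markup might look like. It is only an illustration, not how any particular cleaner works: the function name `clean_word_html` and the sample markup are made up, and it assumes the file has already been decoded into a Unicode string.

```python
import re

# Map "smart" punctuation to numeric HTML entities so it renders
# the same regardless of the reader's platform or encoding.
SMART_PUNCTUATION = {
    "\u2018": "&#8216;",  # left single quote
    "\u2019": "&#8217;",  # right single quote / apostrophe
    "\u201c": "&#8220;",  # left double quote
    "\u201d": "&#8221;",  # right double quote
}


def clean_word_html(html: str) -> str:
    """Hypothetical helper: a rough first pass, not a full cleaner.

    Assumes `html` is already a decoded Unicode string.
    """
    # Drop <font ...> and </font> tags but keep the text inside them.
    html = re.sub(r"</?font\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Collapse a run of two or more &nbsp; entities into a single space.
    html = re.sub(r"(?:&nbsp;){2,}", " ", html)
    # Replace smart quotes with numeric entities.
    for char, entity in SMART_PUNCTUATION.items():
        html = html.replace(char, entity)
    return html


# Made-up sample in the style described above.
sample = '<p><font face="Arial">Name:</font>&nbsp;&nbsp;&nbsp;\u201cBob\u201d</p>'
print(clean_word_html(sample))
```

Regular expressions are a blunt tool for HTML, so treat this as a starting point. It leaves the tables and the odd CSS classes alone, and a single non-breaking space is kept on the assumption that it was intentional.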

Clearly, the Save as Web Page feature isn’t meant for honest-to-goodness web designers. It’s for people who will put a page in FrontPage and call it a day. I would say, “And that’s okay,” but the code is so unnecessarily bloated that it just isn’t.

Dean Allen of Textism has a Word HTML Cleaner available, and it’s free for documents up to 20K. Bookmark this. You’ll need it.

In the meantime, for you web gurus out there, do you work with this? Or do you recommend plain text?

Posted in Technology

FROM: Ryan [E-Mail]
DATE: Wednesday January 28, 2004 -- 10:41:05 am
When I'm forced to use Save-as-HTML from Word, there are two free plug-ins I use to help. One is a Word "clean up HTML" plug-in which is accessible by itself or through the File -> Export command. Then there's another similar plug-in that I run it through afterwards that does a "final cleaning" of the file (I forget the name -- it's at work, so I'll have to check it tomorrow).

These two plug-ins will take a 30k HTML file down to about 6k, which is pretty good. Not perfect, not compliant, but decent enough.

FROM: Chris [E-Mail]
DATE: Wednesday January 28, 2004 -- 10:43:57 am
Word XP has a "save as HTML lite" feature. I don't remember exactly what it is called. It skips all that crappy proprietary XML crap that Word dumps into the HTML document. It's not great code, but it is a lot better than the default code Word creates. The HTML lite code does validate as HTML 4.0. It's not XHTML, but it's better than nothing.

FROM: Zach [E-Mail]
DATE: Wednesday January 28, 2004 -- 1:15:51 pm
I tried using it once, only because a friend made me. He was trying to link documents and was having a problem. The strange thing was that I could get it to work, even though we were on different Macs with the same OS and the same version of Word. We went through it together and it worked for me, but not for him. Two months later he's still talking to Microsoft, trying to solve the problem. Unfortunately, his client is requiring his report to be in Word.

FROM: riazahmed
DATE: Tuesday December 14, 2004 -- 4:31:41 am

FROM: Ryan [E-Mail]
DATE: Tuesday December 14, 2004 -- 9:45:06 am
Normally I'd delete a comment like that, but this one made me snicker.

Seriously, read that sentence out loud without smiling. It's impossible.

