The Daily Ping


January 28th, 2004

Word’s HTML Stinks

Lately I’ve been dealing with something I haven’t had to work with before: converting Word documents to HTML using Word’s built-in “Save as Web Page” feature. I am here to tell you that it really, really, really stinks.

You may know this already. You may not. But I’m warning you: it stinks. Here’s why.

The code it creates is straight out of 1998. We’re talking tables everywhere, font tags everywhere, odd CSS classes everywhere, and the liberal use of non-breaking spaces to align things. A document I worked with recently had thirty (!) of these in a row. That’s just silly. I also enjoy how things such as smart quotes come over as gobbledygook for anyone not running Windows. That’s a treat.
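To show what cleaning up this output involves, here is a minimal Python sketch. It only targets the cruft named above: `<font>` tags, Word’s `Mso...` CSS classes, runs of non-breaking spaces, and curly quotes and dashes. The quotes and dashes are turned into numeric character references so they display the same no matter which charset a browser assumes. The function name `clean_word_html` and its patterns are hypothetical. They are not taken from Dean Allen’s cleaner or from any Word plug-in. Regex cleanup like this is fragile, and the sketch leaves layout tables alone.

```python
import re

# Two or more non-breaking spaces in a row (entity or raw U+00A0),
# the kind of run Word uses to fake alignment.
NBSP_RUN = re.compile(r"(?:&nbsp;|\u00a0)(?:\s*(?:&nbsp;|\u00a0))+")
# Opening and closing <font ...> tags; the text inside them is kept.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)
# class attributes naming Word's "Mso..." styles, quoted or unquoted.
MSO_CLASS = re.compile(r'\s+class=(?:"Mso[^"]*"|Mso\w+)', re.IGNORECASE)
# Curly quotes and dashes mapped to numeric character references.
SMART_CHARS = {
    "\u2018": "&#8216;", "\u2019": "&#8217;",
    "\u201c": "&#8220;", "\u201d": "&#8221;",
    "\u2013": "&#8211;", "\u2014": "&#8212;",
}


def clean_word_html(html: str) -> str:
    """Strip a few common kinds of Word HTML cruft (hypothetical helper)."""
    html = FONT_TAG.sub("", html)
    html = MSO_CLASS.sub("", html)
    # Collapse a run of non-breaking spaces to one ordinary space;
    # a single &nbsp; is left alone.
    html = NBSP_RUN.sub(" ", html)
    for char, entity in SMART_CHARS.items():
        html = html.replace(char, entity)
    return html


if __name__ == "__main__":
    sample = ('<p class=MsoNormal><font face="Arial">'
              '\u201cHi\u201d&nbsp;&nbsp;&nbsp;there</font></p>')
    print(clean_word_html(sample))  # prints <p>&#8220;Hi&#8221; there</p>
```

A real HTML parser would be sturdier than regular expressions, but the sketch shows why the cleaned file ends up so much smaller than what Word writes out.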

Clearly, the Save as Web Page feature isn’t meant for honest-to-goodness web designers. It’s for people who will put a page in FrontPage and call it a day. I would say, “And that’s okay,” but the code is so unnecessarily bloated that it just isn’t.

Dean Allen of Textism has a Word HTML Cleaner available, and it’s free for documents up to 20K. Bookmark this. You’ll need it.

In the meantime, for you web gurus out there, do you work with this? Or do you recommend plain text?

Posted in Technology

FROM: Ryan
DATE: Wednesday January 28, 2004 -- 10:41:05 am
When I'm forced to use Save-as-HTML from Word, there are two free plug-ins I use to help. One is a Word "clean up HTML" plug-in, which is accessible on its own or through the File -> Export command. The other is a similar plug-in that I run the file through afterwards for a "final cleaning" (I forget the name; it's at work, so I'll have to check tomorrow).

These two plug-ins will take a 30k HTML file down to about 6k, which is pretty good. Not perfect, not compliant, but decent enough.

FROM: Chris
DATE: Wednesday January 28, 2004 -- 10:43:57 am
Word XP has a "save as HTML lite" feature. I don't remember exactly what it is called. It skips all the proprietary XML crap that Word dumps into the HTML document. It's not great code, but it is a lot better than the default code Word creates. The HTML lite code does validate as HTML 4.0. It's not XHTML, but it's better than nothing.

FROM: Zach
DATE: Wednesday January 28, 2004 -- 1:15:51 pm
I tried using it once, only because a friend made me. He was trying to link documents and was having a problem. The strange thing was that we were using different Macs with the same OS and the same version of Word, yet I could get it to work. We went through it together, and it worked for me but not for him. Two months later, he's still talking to Microsoft trying to solve the problem. Unfortunately, his client requires the report to be in Word.

FROM: riazahmed
DATE: Tuesday December 14, 2004 -- 4:31:41 am
me is sexy big xxx photo

FROM: Ryan
DATE: Tuesday December 14, 2004 -- 9:45:06 am
Normally I'd delete a comment like that, but this one made me snicker.

Seriously, read that sentence out loud without smiling. It's impossible.

What is this then?

The Daily Ping is the web's finest compendium of toilet information and Oreo™® research. Too much? Okay, okay, it's a daily opinion column written by two friends. Did we mention we've been doing this for over ten years?


© 2000-2011 The Daily Ping, all rights reserved. Tilted sidebar note idea 'adapted' from Panic. Powered by the mighty WordPress.