<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Welcome to My World &#187; remove ms word styles from pasted content</title>
	<atom:link href="http://www.eanbowman.com/blog/tag/remove-ms-word-styles-from-pasted-content/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.eanbowman.com/blog</link>
	<description>The Weblog of Ean Bowman</description>
	<lastBuildDate>Wed, 07 Jul 2010 13:36:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Remove MS Word Styles from Pasted Content</title>
		<link>http://www.eanbowman.com/blog/2009/01/29/remove-ms-word-styles-from-pasted-content/</link>
		<comments>http://www.eanbowman.com/blog/2009/01/29/remove-ms-word-styles-from-pasted-content/#comments</comments>
		<pubDate>Thu, 29 Jan 2009 19:02:25 +0000</pubDate>
		<dc:creator>Ean</dc:creator>
				<category><![CDATA[Daily Musings]]></category>
		<category><![CDATA[remove ms word styles from pasted content]]></category>

		<guid isPermaLink="false">http://www.eanbowman.com/blog/2009/01/29/remove-ms-word-styles-from-pasted-content/</guid>
		<description><![CDATA[While creating WYSIWYG editor fields for CMS engines I&#8217;ve often run into clients pasting in text from Microsoft Word, which drags along all kinds of unwanted formatting that either carries over the ugliness of their original document or completely screws up the web layout and semantic correctness. I&#8217;ve come up with this [...]]]></description>
			<content:encoded><![CDATA[<p>While creating WYSIWYG editor fields for CMS engines I&#8217;ve often run into clients pasting in text from Microsoft Word, which drags along all kinds of unwanted formatting that either carries over the ugliness of their original document or completely screws up the web layout and semantic correctness.</p>
<p>I&#8217;ve come up with this PHP function to strip the extra formatting out of the HTML coming from a WYSIWYG editor such as TinyMCE.</p>
<pre class="code" style="display: block; width: 100%; height: 250px; overflow: scroll;">
	/**
	* Clean up HTML pasted from Microsoft Word into a WYSIWYG editor.
	* Removes invisible content (head, style and script blocks, embedded
	* objects), replacing each with a space so adjacent words don't run
	* together; removes Word's Mso classes, inline style attributes and
	* HTML comments; then strips every tag not in the whitelist.
	*/
	function strip_html_tags( $text )
	{
		// Editors sometimes hand over Word's comments HTML-escaped; turn
		// them back into real comments so the comment pattern removes them.
		$text = str_replace(
			array( &quot;&amp;lt;!--&quot;, &quot;--&amp;gt;&quot; ),
			array( &quot;&lt;!--&quot;, &quot;--&gt;&quot; ),
			$text );

		$text = preg_replace(
			array(
				// Remove invisible content
				'@&lt;head[^&gt;]*?&gt;.*?&lt;/head&gt;@siu',
				'@&lt;style[^&gt;]*?&gt;.*?&lt;/style&gt;@siu',
				'@&lt;script[^&gt;]*?&gt;.*?&lt;/script&gt;@siu',
				'@&lt;object[^&gt;]*?&gt;.*?&lt;/object&gt;@siu',
				'@&lt;embed[^&gt;]*?&gt;.*?&lt;/embed&gt;@siu',
				'@&lt;applet[^&gt;]*?&gt;.*?&lt;/applet&gt;@siu',
				'@&lt;noframes[^&gt;]*?&gt;.*?&lt;/noframes&gt;@siu',
				'@&lt;noscript[^&gt;]*?&gt;.*?&lt;/noscript&gt;@siu',
				'@&lt;noembed[^&gt;]*?&gt;.*?&lt;/noembed&gt;@siu',
				// Remove Word's class=&quot;Mso...&quot; attributes (any case) and
				// all inline styles. [^&quot;]* stops at the closing quote, so
				// the text after the tag is left alone.
				'/\s+class=&quot;[^&quot;]*\bmso[^&quot;]*&quot;/i',
				'/\s+style=&quot;[^&quot;]*&quot;/i',
				// Remove comments, including Word's multi-line conditional ones
				'/&lt;!--.*?--&gt;/s',
			),
			array(
				' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '', '', ''
			),
			$text );

		return strip_tags( $text, '&lt;address&gt;&lt;blockquote&gt;&lt;del&gt;&lt;div&gt;&lt;h1&gt;&lt;h2&gt;&lt;h3&gt;&lt;h4&gt;&lt;h5&gt;&lt;h6&gt;&lt;ins&gt;&lt;p&gt;&lt;a&gt;&lt;b&gt;&lt;i&gt;&lt;u&gt;&lt;img&gt;&lt;pre&gt;&lt;dl&gt;&lt;dt&gt;&lt;dd&gt;&lt;li&gt;&lt;ol&gt;&lt;ul&gt;&lt;table&gt;&lt;tr&gt;&lt;th&gt;&lt;td&gt;&lt;caption&gt;&lt;abbr&gt;&lt;acronym&gt;&lt;span&gt;&lt;strong&gt;&lt;em&gt;' );
	} // end strip_html_tags
</pre>
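<p>If you want to experiment with just the Word clean-up step, here&#8217;s a minimal standalone sketch of it. Treat it as an illustration under a couple of assumptions rather than a drop-in replacement: the name strip_word_attributes is made up for this example, and it assumes attribute values are wrapped in double quotes. Each pattern stops at the attribute&#8217;s closing quote or at the first comment terminator, so only the attribute or comment itself is removed and the text around it survives.</p>
<pre class="code" style="display: block; width: 100%; height: 250px; overflow: scroll;">
&lt;?php
// Hypothetical helper (not part of any library): removes Word's Mso*
// class attributes, all inline style attributes and HTML comments.
// Assumes attribute values are wrapped in double quotes.
function strip_word_attributes( $html )
{
	return preg_replace(
		array(
			// class=&quot;MsoNormal&quot;, class=&quot;MsoListParagraph&quot;, ... (any case)
			'/\s+class=&quot;[^&quot;]*\bmso[^&quot;]*&quot;/i',
			// every inline style attribute
			'/\s+style=&quot;[^&quot;]*&quot;/i',
			// comments, including Word's multi-line conditional comments
			'/&lt;!--.*?--&gt;/s',
		),
		'',
		$html
	);
}

$pasted = &quot;&lt;!--[if gte mso 9]&gt;\n&lt;xml&gt;&lt;/xml&gt;\n&lt;![endif]--&gt;&quot;
	. '&lt;p class=&quot;MsoNormal&quot; style=&quot;margin:0cm&quot;&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;';

echo strip_word_attributes( $pasted ), &quot;\n&quot;;
// prints: &lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;
</pre>
<p>Word&#8217;s conditional comments can span several lines, which is why the comment pattern uses the s modifier: without it, . won&#8217;t match a newline and the comment stays put.</p>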
<p>Do you guys have any ideas for improving it?</p>
<p>P.S. I had a hell of a time even pasting this into WordPress, so something may need to be done there as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.eanbowman.com/blog/2009/01/29/remove-ms-word-styles-from-pasted-content/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
