Welcome to My World

January 29th, 2009

Remove MS Word Styles from Pasted Content

Filed under: Daily Musings | Ean @ 3:02 pm

While building WYSIWYG editor fields for CMS engines, I’ve often had clients paste in content from Microsoft Word. The paste drags along all kinds of unwanted formatting (Mso classes, inline styles, comments) that either carries over the ugliness of the original document or completely breaks the web layout and semantic correctness.

I’ve come up with this PHP function to strip that extra formatting from the HTML submitted through WYSIWYG editors such as TinyMCE.

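	/**
	* Clean HTML pasted into a WYSIWYG editor (e.g. from MS Word):
	* remove invisible content such as head, style and script blocks
	* and embedded objects (each replaced by a space so neighboring
	* words don't join), strip Word's Mso classes, inline style
	* attributes and comments, then keep only whitelisted tags.
	*/
	function strip_html_tags( $text )
	{
		// TinyMCE may hand us HTML-escaped comments; decode them first
		// so the comment pattern below can remove them.
		$text = str_replace( array( '&lt;!--', '--&gt;' ), array( '<!--', '-->' ), $text );

		$text = preg_replace(
			array(
				// Remove invisible content
				'@<head[^>]*?>.*?</head>@siu',
				'@<style[^>]*?>.*?</style>@siu',
				'@<script[^>]*?.*?</script>@siu',
				'@<object[^>]*?.*?</object>@siu',
				'@<embed[^>]*?.*?</embed>@siu',
				'@<applet[^>]*?.*?</applet>@siu',
				'@<noframes[^>]*?.*?</noframes>@siu',
				'@<noscript[^>]*?.*?</noscript>@siu',
				'@<noembed[^>]*?.*?</noembed>@siu',
				// Remove Word markup. Greedy (.*) patterns here would swallow
				// everything up to the last quote on the line, so match only
				// the attribute value itself. The i flag covers Mso and mso.
				'@\s+class=(?:"[^"]*mso[^"]*"|\'[^\']*mso[^\']*\'|[^\s>]*mso[^\s>]*)@iu',
				'@\s+style=(?:"[^"]*"|\'[^\']*\')@iu',
				// Non-greedy and multi-line, for Word's <!--[if ...]> blocks
				'@<!--.*?-->@su',
			),
			array(
				// One replacement per pattern: a space for the invisible
				// blocks, nothing for the Word markup
				' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '', '', ''
			),
			$text );

		// Keep only the whitelisted tags. Anything else (stray <style>
		// tags, Word's <o:p>, etc.) is stripped but its text is kept.
		return strip_tags( $text, '<address><blockquote><del><div><h1><h2><h3><h4><h5><h6><ins><p><a><b><i><u><img><pre><dl><dt><dd><li><ol><ul><table><tr><th><td><caption><abbr><acronym><span><strong><em>' );
	} // end strip_html_tags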
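One alternative to regular expressions is to let PHP's DOM extension parse the pasted HTML, so attributes are removed from real elements instead of being matched as text. Below is a minimal sketch, assuming the DOM extension (DOMDocument, DOMXPath) is enabled. clean_word_html() is a hypothetical name, and the sketch only drops comments, style/script/xml blocks, inline styles and Mso classes; the strip_tags() whitelist from the function above would still be applied afterwards.

```php
<?php
// Hypothetical helper: clean_word_html() is an illustrative name, not a library API.
// Assumes PHP's DOM extension (DOMDocument, DOMXPath) is available.
function clean_word_html(string $html): string
{
    // Word markup (e.g. <o:p>) triggers libxml warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment in a full document; the meta tag makes libxml read it as UTF-8.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Each node list is copied into an array before removing nodes,
    // so the removals do not disturb the iteration.
    // 1. Comments, including Word's <!--[if gte mso 9]> ... <![endif]--> blocks.
    foreach (iterator_to_array($xpath->query('//comment()')) as $node) {
        $node->parentNode->removeChild($node);
    }
    // 2. Elements whose content should never survive.
    foreach (iterator_to_array($xpath->query('//style | //script | //xml')) as $node) {
        $node->parentNode->removeChild($node);
    }
    // 3. Every inline style attribute.
    foreach (iterator_to_array($xpath->query('//@style')) as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }
    // 4. Class attributes mentioning Mso in any letter case.
    foreach (iterator_to_array($xpath->query('//@class')) as $attr) {
        if (stripos($attr->value, 'mso') !== false) {
            $attr->ownerElement->removeAttributeNode($attr);
        }
    }

    // Serialize only what sits inside <body>, i.e. the original fragment.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

// Usage: a typical fragment pasted from Word.
$pasted = '<p class="MsoNormal" style="margin:0cm">Hello <b>world</b></p>'
    . '<!--[if gte mso 9]><xml>junk</xml><![endif]-->';
echo clean_word_html($pasted), PHP_EOL;
```

The trade-off is that libxml repairs malformed markup while parsing, so the output may differ slightly from the input even where nothing was removed.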

Do you guys have any ideas?

P.S. I had a hell of a time even pasting this into WordPress, so I guess something needs to be done there too.

January 6th, 2009

Resolutions

Filed under: Daily Musings | Ean @ 10:17 am
  1. Draw something every day;
  2. Play my guitar when I am home each day;
  3. Post a video of myself playing guitar for all to see, and ridicule, by the end of the year.
