Replacing quotes with regex - php

I have a set of page source code that has elements such as
<p class='style1'>, <p class=3DMsoNormal>, <span style=3D'font-size:12.0pt'>, <p class=3DMsoNormal>
I am trying to replace all the single quotes with double quotes throughout the source code, add double quotes to attribute values that have no quotes at all, such as
<p class=3DMsoNormal, and remove the text '3D' wherever it appears.
Below is what I tried, which didn't work. Can someone help me find a solution to this? Thanks
<?php
// test files holds the source code
$html_part = file_get_contents('testRegex.html');
$cSeq = "/(.*)='(.*)'/"; //code sequence
$nSeq = "/(.*)="."(.*)"."/"; //new sequence
preg_match_all($cSeq, $html_part, $matches);
preg_replace($cSeq, $nSeq, $html_part);
echo $html_part;
?>

I'm not sure regexes are the way to go.
Maybe consider using a parser to read in the file, then write it back out (prettified).
I've used Beautiful Soup in the past.

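$str = preg_replace("/='([^']*)'/", "=\"\\1\"", $str);
You need to use a backreference (http://www.regular-expressions.info/brackets.html) and assign the result, because preg_replace() returns the new string rather than modifying $str. Matching the value with [^']* instead of a greedy (.*) lets every quoted attribute on a line be replaced, not just the last one.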

You might want to take a look at quoted_printable_decode() instead of removing the '3D' manually.
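As a rough sketch of how the decoding and the quote clean-up could be combined: the helper name normalizeAttributeQuotes() is made up for illustration, and the code assumes the whole input really is quoted-printable encoded (on text that is already decoded, an "=" followed by two hex digits would be decoded by mistake). It also assumes no attribute value contains a ">".

```php
<?php
// Hypothetical helper, not part of any library.
function normalizeAttributeQuotes(string $html): string
{
    // "=3D" is the quoted-printable encoding of "=", so decode rather than strip "3D".
    $html = quoted_printable_decode($html);

    // Only touch text inside tags (assumes no ">" inside attribute values).
    return preg_replace_callback('/<[a-zA-Z][^>]*>/', function (array $tag): string {
        // One attribute: name="value", name='value', or name=value.
        $attr = '/([\w:-]+)=(?:"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))/';
        return preg_replace_callback($attr, function (array $m): string {
            // Trailing unmatched groups are dropped, so take the last one present.
            $value = $m[4] ?? $m[3] ?? $m[2];
            // A double quote inside a formerly single-quoted value must be encoded.
            return $m[1] . '="' . str_replace('"', '&quot;', $value) . '"';
        }, $tag[0]);
    }, $html);
}

echo normalizeAttributeQuotes("<p class=3DMsoNormal><span style=3D'font-size:12.0pt'>");
```

Attributes that are already double-quoted are matched as a whole and written back unchanged, so a URL such as href="page?a=b" is not mangled. For anything beyond simple markup, a real HTML parser is still the safer choice.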

Related

PHP regex to conditionally replace first occurrence of string

I need to do some cleanup on strings that look like this:
$author_name = '<a href="http://en.wikipedia.org/wiki/Robert_Jones_Burdette>Robert Jones Burdette </a>';
Notice the href attribute doesn't have a closing quote - I'm using the DOMParser on a large table of these to extract the text, and it borks on this.
I would like to look at the string in $author_name;
IF the first > does NOT have a " before it, replace it with "> to close the tag correctly. If it is okay, just skip and do the next step. Be sure not to replace the second > at all.
Using php regex, I haven't been able to find a working solution - I could chop up the whole thing and check its parts, but that would be slow and I think there must be a regex that can do what I want.
TIA
What you can do is find the first closing angle bracket (>), with or without a double quote (") before it, and replace it with (">):
$author_name = preg_replace('/(.+?)"?>(.+?)/', '$1">$2', $author_name);
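A slightly stricter variant, as a sketch only: it anchors on the opening <a href=" and passes a limit of 1, so it can only ever touch the first >, and it leaves a correctly quoted tag alone. The function name closeHrefQuote() is invented here, and it assumes every string starts with <a href=" as in the question.

```php
<?php
// Hypothetical helper: close an unterminated href="..." in the opening <a> tag.
function closeHrefQuote(string $html): string
{
    // [^">]* stops at the first quote or ">", so the pattern only matches
    // when the ">" arrives before the closing quote does.
    return preg_replace('/^(<a\s+href="[^">]*)>/', '$1">', $html, 1);
}

$author_name = '<a href="http://en.wikipedia.org/wiki/Robert_Jones_Burdette>Robert Jones Burdette </a>';
echo closeHrefQuote($author_name);
```

Running it a second time on the repaired string changes nothing, because the pattern no longer matches once the quote is present.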
http://www.barattalo.it/html-fixer/
Download that, then include it in your php.
The rest is quite easy:
$dirty_html = ".....bad html here......";
$a = new HtmlFixer();
$clean_html = $a->getFixedHtml($dirty_html);
It's common for people to want to use regular expressions, but you must remember that HTML is not regular.

Escape quote or special characters in array value

In my PHP code, I'm setting up an area for people to enter their own info to be displayed. The info is stored in an array and I want to make it as flexible as possible.
If I have something like...
$myArray[]['Text'] = 'Don't want this to fail';
or
$myArray[]['Text'] = "This has to be "easy" to do";
How would I go about escaping the apostrophe or quote within the array value?
Thanks
Edit: Since there is only a one to one relationship, I changed my array to this structure...
$linksArray['Link Name'] ='/path/to/link';
$linksArray['Link Name2'] ='/path/to/link2';
$linksArray['Link Name3'] ='/path/to/link3';
The plan is I set up a template with an include file that has these links in a format someone else (a less technical person) can maintain. They will have direct access to the PHP and I'm afraid they may put a single or double quote in the "link name" area and break the system.
Thanks again.
POSSIBLE SOLUTION:
Thanks @Tim Cooper.
Here's a sample that worked for me...
$link = "http://www.google.com";
$text = <<<TEXT
Don't you loving "googling" things
TEXT;
$linksArray[$text] = $link;
Using a heredoc might be a good solution:
$myArray[]['Text'] = <<<TEXT
Place text here without escaping " or '
TEXT;
PHP will process these strings properly upon input.
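One thing to watch for: a heredoc still interpolates variables, like a double-quoted string. If the text may contain a $ sign, a nowdoc (the same syntax with the identifier in single quotes) leaves everything literal. A small comparison, where $price is just an illustrative variable:

```php
<?php
$price = 5; // illustrative variable, not from the question

// Heredoc: quotes need no escaping, but $price is interpolated.
$heredoc = <<<TEXT
Don't pay "$price"
TEXT;

// Nowdoc: quotes need no escaping and nothing is interpolated.
$nowdoc = <<<'TEXT'
Don't pay "$price"
TEXT;

echo $heredoc, PHP_EOL; // Don't pay "5"
echo $nowdoc, PHP_EOL;  // Don't pay "$price"
```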
If you are constructing the strings yourself as you have shown, you can alternate between quotation styles (single and double)...as in:
$myArray[]['Text'] = "Don't want this to fail";
$myArray[]['Text'] = 'This has to be "easy" to do';
Or, if you must escape the characters, put the \ character before the quotation mark:
$myArray[]['Text'] = 'Don\'t want this to fail';
$myArray[]['Text'] = "This has to be \"easy\" to do";
If you really want to make it easy, use a separate configuration file in either INI or XML style. INI is usually the easiest for people to edit manually. XML is good if you have a really nested structure.
Unless you are letting users enter direct PHP code (you probably aren't), you don't have to worry about what they enter until you go to display it. When you actually display the info they enter, you will want to sanitize it using something like htmlentities().
Edit: I realize I may be misunderstanding your question. If so, ignore this! :)
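To illustrate the INI idea, here is a minimal sketch. The section and key names (home, label, url) are invented for the example, and the INI text is inlined so the snippet runs on its own; in practice it would live in a separate file read with parse_ini_file().

```php
<?php
// Inline stand-in for a hand-edited links.ini file (layout is an assumption).
$ini = <<<'INI'
[home]
label = "Don't want this to fail"
url = "/path/to/link"
INI;

// The second argument true groups the keys by [section].
$config = parse_ini_string($ini, true);

echo $config['home']['label'], ' -> ', $config['home']['url'], PHP_EOL;
```

An apostrophe inside a double-quoted INI value needs no escaping, so whoever maintains the file never has to think about PHP quoting. A literal double quote inside a value would still have to be written as \".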
You can use the addslashes($str) function to automatically escape quotes.
You can also try htmlentities, which will encode quotes and other special values into HTML entities: http://php.net/manual/en/function.htmlentities.php
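A quick sketch of what the two functions do to the strings from the question. Note that htmlentities() only converts single quotes when the ENT_QUOTES flag is passed; the variable names are illustrative.

```php
<?php
$text = 'Don\'t say "hi"';

// Encode both quote styles for safe output inside HTML.
$forHtml = htmlentities($text, ENT_QUOTES, 'UTF-8');
echo $forHtml, PHP_EOL; // Don&#039;t say &quot;hi&quot;

// addslashes() puts a backslash before ', " and \ (and NUL bytes).
$slashed = addslashes($text);
echo $slashed, PHP_EOL; // Don\'t say \"hi\"
```

The two solve different problems: htmlentities() is for displaying text in a page, while addslashes() only helps when the result is pasted into a quoted string context.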

How to add an HTML class to a PHP script

First off, I know next to nothing about PHP. I'm more familiar with CSS.
I'm using Ben Ward's script Tumblr2Wordpress (here's the script on GitHub) to export my Tumblr blog as XML (so I can import it into my Wordpress blog). This script reads Tumblr's API, queries elements, does a bit of formatting and exports the whole thing as HTML.
I need to customize it just a bit to fit my needs. For example in the following function I need the blockquote to become a specific class of blockquote:
function _doBlockQuotes_callback($matches) {
    $bq = $matches[1];
    # trim one level of quoting - trim whitespace-only lines
    $bq = preg_replace('/^[ ]*>[ ]?|^[ ]+$/m', '', $bq);
    $bq = $this->runBlockGamut($bq); # recurse
    $bq = preg_replace('/^/m', " ", $bq);
    # These leading spaces cause problem with <pre> content,
    # so we need to fix that:
    $bq = preg_replace_callback('{(\s*<pre>.+?</pre>)}sx', array(&$this, '_doBlockQuotes_callback2'), $bq);
    return "\n". $this->hashBlock("<blockquote>\n$bq\n</blockquote>")."\n\n";
}
At first, I thought it would be as simple as adding the class I need inside the blockquote HTML tag, like so: <blockquote class="big">. But it breaks the code.
Is there a way I could add this HTML attribute as is in the PHP script? Or do I need to define the output of this <blockquote> somewhere else?
Thanks in advance for any tips!
P.
Your guess was correct, but you need to escape the quotes with backslashes:
return "\n". $this->hashBlock("<blockquote class=\"big\">\n$bq\n</blockquote>")."\n\n";
Otherwise, PHP assumes that your string ends at the class=" quote.
You can escape the double quotes with a backslash:
"<blockquote class=\"big\">"
However, if you delimit the string with single quotes, escaping is unnecessary:
'<blockquote class="big">'
You need to escape the quote marks:
<blockquote class=\"big\">

What is the correct regex (for PHP preg_replace) to remove empty paragraph ( <p> ) tags?

I'm working in Wordpress and need to be able to remove images and empty paragraphs. So far, I've found out how to remove images without a problem. But, I then need to remove empty paragraph tags. I'm using PHP preg_replace to handle the regex functions.
So, as an example, I have the string:
<p style="text-align:center;"><img src="http://www.blah.com/image.jpg" alt="Blah Image" /></p><p>Some text</p>
I run this regex on it:
/<img.*?(>)/
And I end up with this string:
<p style="text-align:center;"></p><p>Some text</p>
I then need to be able to remove the empty paragraph. I tried this, but it removes all paragraphs and the contents of the paragraphs:
/<p[^>]*><\/p[^>]*>/
Any help or suggestions are greatly appreciated!
The correct regex is no regex. Use an HTML/DOM Parser instead. They're simple to use. Regex is for regular languages (which HTML is not).
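For what it's worth, a parser-based version could look roughly like the sketch below, using PHP's bundled DOM extension. The function name and the wrapper id "fragment" are invented for the example, and "empty" is taken to mean a paragraph with no child elements and only whitespace text.

```php
<?php
// Hypothetical helper; requires the DOM extension (ext-dom).
function stripImagesAndEmptyParagraphs(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    // Wrap the fragment in a known container; the XML prolog hints UTF-8 to libxml.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="fragment">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Remove the images first so paragraphs that only held an image become empty.
    foreach (iterator_to_array($xpath->query('//img')) as $img) {
        $img->parentNode->removeChild($img);
    }
    // Then remove paragraphs with no child elements and no visible text.
    foreach (iterator_to_array($xpath->query('//p[not(*) and normalize-space(.) = ""]')) as $p) {
        $p->parentNode->removeChild($p);
    }

    // Serialize only what is inside the wrapper.
    $out = '';
    foreach ($xpath->query('//div[@id="fragment"]')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo stripImagesAndEmptyParagraphs(
    '<p style="text-align:center;"><img src="http://www.blah.com/image.jpg" alt="Blah Image" /></p><p>Some text</p>'
);
```

Unlike the regex, this keeps working when the attributes change order, the tag is written in upper case, or the paragraph contains stray whitespace.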
/<p[^>]*><\/p[^>]*>/ (the regex you gave) should work fine. If it's giving you trouble you could try double-escaping the / like this: /<p[^>]*><\\/p[^>]*>/
PHP is funny about quoting and escape characters. For example "\n" is not equal to '\n'. The first is a line break, the second is a literal backslash followed by an 'n'. The PHP manual entry on string literals is probably worth a quick look.
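A small check of both points, as a sketch: the quoting difference, and the regex from the question applied to the example string (variable names are illustrative).

```php
<?php
$double = "\n"; // escape sequence interpreted: a single line-feed character
$single = '\n'; // taken literally: a backslash followed by the letter n

var_dump(strlen($double)); // int(1)
var_dump(strlen($single)); // int(2)

// In a single-quoted string the backslash in \/ reaches the regex engine untouched.
$pattern = '/<p[^>]*><\/p[^>]*>/';
$html = '<p style="text-align:center;"></p><p>Some text</p>';
$cleaned = preg_replace($pattern, '', $html);
echo $cleaned, PHP_EOL; // <p>Some text</p>
```

So on the sample string the pattern removes only the empty paragraph; if everything disappears in practice, the input or the surrounding code is likely the culprit rather than the pattern itself.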

What is the right way to create tabs and line breaks in PHP when using single quotes?

Seems like a simple question, but I haven't been able to find a solid answer anywhere. I'm outputting a ton of HTML and find escaping "s to be error prone and hard to read, but I also want to have my HTML formatted nicely.
I want something like this (though I know this won't work):
echo '<div id="test">\n';
echo '\t<div id="test-sub">\n';
echo '\t</div>\n';
echo '</div>\n';
What is one to do?
Thanks.
Have you looked at heredoc syntax? From the manual:
Heredoc text behaves just like a double-quoted string, without the double quotes. This means that quotes in a heredoc do not need to be escaped.
There is an example of the advantage here: http://www.shat.net/php/notes/heredoc.php
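Applied to the markup in the question, a heredoc could look like the sketch below. The identifier HTML is arbitrary; the \t escapes are interpreted just as in a double-quoted string, and the double quotes need no escaping.

```php
<?php
// The blank line before the closing identifier keeps a trailing newline in the string.
$html = <<<HTML
<div id="test">
\t<div id="test-sub">
\t</div>
</div>

HTML;

echo $html;
```

Literal tab characters typed into the heredoc body work as well; the \t form just makes them visible in the source.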
There are many ways to do this. For example, this works just fine (PHP_EOL is a cross-platform constant for the end-of-line character):
echo "<div id=\"test\">".PHP_EOL;
echo "\t<div id=\"test-sub\">".PHP_EOL;
echo "\t</div>".PHP_EOL;
echo "</div>".PHP_EOL;
I make use of a small set of classes I wrote in order to output nicely formatted HTML. If you are interested you can find it here.
To get what you want, I would end up writing something like
$mypage = page::blank();
$mypage->opennode('div', 'id="test"');
$mypage->opennode('div', 'id="test-sub"');
$mypage->closenode(2); // div, div
echo $mypage->build_output_strict();
Another alternative would be to use a full-fledged template engine, of which there are quite a few.
Use double quotes,
or a multi-line echo string:
echo '<div id="test">
<div id="test-sub">
</div>
</div>';
Or use templates.
