Preg_replace or preg_replace_callback? - php

I have links on some pages that use an old system such as:
<a href='/app/?query=stuff_is_here'>This is a link</a>
They need to be converted to the new system which is like:
<a href='/newapp/?q=stuff+is+here'>This is a link</a>
I can use preg_replace to change some of what I need, but I also need to replace the underscores in the query with +'s. My current code is:
//$content is the page html
$content = preg_replace('#(href)="http://www.site.com/app/?query=([^:"]*)(?:")#','$1="http://www.site.com/newapp/?q=$2"',$content);
What I want to do is run str_replace on the $2 variable, so I tried using preg_replace_callback, and could never get it to work. What should I do?

You have to pass a valid callback as the second parameter: a function name, an anonymous function, etc.
Here is an example:
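function my_replace_callback($match) {
    // $match[1] is "href", $match[2] is the old query string
    $q = str_replace('_', '+', $match[2]);
    // the pattern consumes the closing quote, so put it back
    return $match[1] . '="http://www.site.com/newapp/?q=' . $q . '"';
}
// \? and \. match a literal "?" and "."; unescaped, "/?" would only make the slash optional
$content = preg_replace_callback('#(href)="http://www\.site\.com/app/\?query=([^:"]*)(?:")#', 'my_replace_callback', $content);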
Or, with PHP 5.3 or later, using an anonymous function:
$content = preg_replace_callback('#(href)="http://www\.site\.com/app/\?query=([^:"]*)(?:")#', function($match) {
    $q = str_replace('_', '+', $match[2]);
    // re-append the closing quote that the pattern consumed
    return $match[1] . '="http://www.site.com/newapp/?q=' . $q . '"';
}, $content);
You may also want to try an HTML parser instead of a regex: see "How do you parse and process HTML/XML in PHP?"
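For what it's worth, here is a self-contained sketch of the callback approach above, adapted to the relative, single-quoted links shown in the question. The pattern (which accepts either quote style through a backreference) is my own assumption, not something from the original code:

```php
<?php
// Sample input taken from the question
$content = "<a href='/app/?query=stuff_is_here'>This is a link</a>";

// Group 1: "href=", group 2: the opening quote, group 3: the old query value.
// \2 requires the same quote character to close the attribute.
$pattern = '#(href=)([\'"])/app/\?query=([^\'"]*)\2#';

$content = preg_replace_callback($pattern, function ($m) {
    $q = str_replace('_', '+', $m[3]);               // underscores become plus signs
    return $m[1] . $m[2] . '/newapp/?q=' . $q . $m[2]; // rebuild with the same quotes
}, $content);

echo $content, "\n";
```

With the sample input this should print the link in the new format, <a href='/newapp/?q=stuff+is+here'>This is a link</a>.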

Parsing your document with DOM, finding all "a" tags, and then replacing their href values could be a good way to do this. Someone already posted a link in the comments showing that regex isn't always the best way to work with HTML.
Anyway, this code should work:
<?php
$dom = new DOMDocument;
// $html contains your HTML string
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $node) {
    // look for the href attribute
    if ($node->hasAttribute('href')) {
        $href = $node->getAttribute('href');
        // change the href's value; '$1' is single-quoted because "\1" in a
        // double-quoted PHP string is an octal escape, not a backreference
        $node->setAttribute('href', preg_replace('/\/app\/\?query=(.*)/', '/newapp/?q=$1', $href));
    }
}
// save the new HTML
$newHTML = $dom->saveHTML();
?>
Note that I did this with preg_replace, but it can also be done with str_ireplace or str_replace:
$newHref = str_ireplace("/app/?query=", "/newapp/?q=", $href);
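The DOM approach can also take care of the underscores, which the question asks for. This is only a sketch, assuming the DOM extension is available and that every old link starts with exactly /app/?query= (the sample markup and the $prefix variable are mine):

```php
<?php
// Hypothetical sample markup: one old-style link and one unrelated link
$html = '<p><a href="/app/?query=stuff_is_here">This is a link</a> '
      . '<a href="/other_page">Other</a></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$prefix = '/app/?query=';
foreach ($dom->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    // Only touch links that start with the old prefix
    if (strpos($href, $prefix) === 0) {
        $query = substr($href, strlen($prefix));
        $a->setAttribute('href', '/newapp/?q=' . str_replace('_', '+', $query));
    }
}

$newHtml = $dom->saveHTML();
echo $newHtml;
```

Because only the query part goes through str_replace, the underscore in /other_page is left alone.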

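Or you can simply use preg_match() and collect the matched parts. Then change the parts you need, apply str_replace() to the query part to replace "_" with "+", and put the pieces back together. Note that preg_match() only handles the first matching link:
if (preg_match('#(href=[\'"]/)([^/]+)(/\?)(query=)([^:\'"]+)#', $content, $matches)) {
    $matches[2] = 'newapp';                            // application name
    $matches[4] = 'q=';                                // parameter name
    $matches[5] = str_replace('_', '+', $matches[5]);  // query value
    // $matches[0] is the whole matched text; the remaining entries are the pieces
    $result = str_replace($matches[0], implode('', array_slice($matches, 1)), $content);
}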

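Pass arrays to preg_replace as the pattern and the replacement (note that the second pattern replaces every underscore in $content, not only those inside the links):
$content = preg_replace(array('|/app/\?query=|', '|_|'), array('/newapp/?q=', '+'), $content);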

Related

PHP preg_replace all text changing

I want to make some changes to the HTML, but I have to follow certain rules.
I have source text like this:
A beautiful sentence http://www.google.com/test, You can reach here http://www.google.com/test-mi or http://www.google.com/test/aliveli
I need to convert this into the following:
A beautiful sentence http://test.google.com/, You can reach here http://www.google.com/test-mi or http://test.google.com/aliveli
I tried using str_replace:
$html = str_replace('://www.google.com/test', '://test.google.com', $html);
When I use it like this, I get an incorrect result like:
A beautiful sentence http://test.google.com/, You can reach here http://test.google.com/-mi or http://test.google.com/aliveli
Wrong replace: http://test.google.com/-mi
How can I do this with preg_replace?
With regex you can use a word boundary and a negative lookahead to prevent replacing before a hyphen:
$pattern = '~://www\.google\.com/test\b(?!-)~';
$html = preg_replace($pattern, "://test.google.com", $html);
Here is a regex demo at regex101 and a php demo at eval.in
Be aware that when using regex you need to escape certain characters with a backslash to strip their special meaning and match them literally.
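To see what the word boundary and the lookahead do, here is the same pattern run against the sentence from the question (a quick sketch, nothing beyond the code above except the sample string and the $result variable):

```php
<?php
$html = 'A beautiful sentence http://www.google.com/test, You can reach here '
      . 'http://www.google.com/test-mi or http://www.google.com/test/aliveli';

// \b requires "test" to end at a word boundary; (?!-) rejects "test-..."
$pattern = '~://www\.google\.com/test\b(?!-)~';
$result = preg_replace($pattern, '://test.google.com', $html);

echo $result, "\n";
```

The test-mi URL is left untouched, and /test/aliveli becomes http://test.google.com/aliveli. Note that the first URL comes out as http://test.google.com, without the trailing slash shown in the desired output, so you may need to append one yourself.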
It seems you're converting the subdirectory test into a subdomain. The case is fairly complicated, and the logic below is only reliable as long as your string keeps the same structure, but you can give this code a try:
$html = "A beautiful sentence http://www.google.com/test, You can reach here http://www.google.com/test-mi or http://www.google.com/test/aliveli";
function set_subdomain_string($html, $subdomain_word) {
$html = explode(' ', $html);
foreach($html as &$value) {
$parse_html = parse_url($value);
if(count($parse_html) > 1) {
$path = preg_replace('/[^0-9a-zA-Z\/-_]/', '', $parse_html['path']);
preg_match('/[^0-9a-zA-Z\/-_]/', $parse_html['path'], $match);
if(preg_match_all('/(test$|test\/)/', $path)) {
$path = preg_replace('/(test$|test\/)/', '', $path);
$host = preg_replace('/www/', 'test', $parse_html['host']);
$parse_html['host'] = $host;
if(!empty($match)) {
$parse_html['path'] = $path . $match[0];
} else {
$parse_html['path'] = $path;
}
unset($parse_html['scheme']);
$url_string = "http://" . implode('', $parse_html);
$value = $url_string;
}
}
unset($value);
}
$html = implode(' ', $html);
return $html;
}
echo "<p>{$html}</p>";
$modified_html = set_subdomain_string($html, 'test');
echo "<p>{$modified_html}</p>";
Hope it helps.
If that sentence is the only case you need to handle, you don't need to struggle with preg_replace.
Just change your str_replace() function call to the following (with the ',' at the end of the search string):
$html = str_replace('://www.google.com/test,', '://test.google.com/,', $html);
This matches the first occurrence of the search string. For the last one in your target sentence, add this (note the '/' at the end):
$html = str_replace('://www.google.com/test/', '://test.google.com/', $html);
Update:
Use these two:
$targetStr = preg_replace("/:\/\/www\.google\.com\/test[\s\/]/", "://test.google.com/", $targetStr);
This matches all occurrences except those followed by a comma. For those, you should use the following:
$targetStr = preg_replace("/:\/\/www\.google\.com\/test,/", "://test.google.com/,", $targetStr);

PHP: preg_replace() to get "parent" component of NameSpace

How can I use preg_replace() to return only the parent "component" of a PHP namespace?
Basically:
Input: \Base\Ent\User; Desired Output: Ent
I've been doing this using substr() but I want to convert it to regex.
Note: Can this be done without preg_match_all()?
Right now, I also have code to get all parent components:
$s = '\\Base\\Ent\\User';
print preg_replace('~\\\\[^\\\\]*$~', '', $s);
//=> \Base\Ent
But I only want to return Ent.
Thank you!
As Rocket Hazmat says, explode is almost certainly going to be better here than a regex. I would be surprised if it's actually slower than a regex.
But, since you asked, here's a regex solution:
$path = '\Base\Ent\User';
$search = preg_match('~([^\\\\]+)\\\\[^\\\\]+$~', $path, $matches);
if($search) {
$parent = $matches[1];
}
else {
$parent = ''; // handles the case where the path is just, e.g., "User"
}
echo $parent; // prints Ent
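For comparison, the explode approach mentioned above could look something like this. The function name parent_component is made up for the example, and the type hints assume PHP 7 or later:

```php
<?php
// Return the second-to-last component of a namespaced name,
// or an empty string if there is no parent component.
function parent_component(string $name): string
{
    // Drop leading/trailing backslashes, then split on the separator
    $parts = explode('\\', trim($name, '\\'));
    $count = count($parts);
    return $count >= 2 ? $parts[$count - 2] : '';
}

echo parent_component('\Base\Ent\User'), "\n"; // Ent
```

A bare name such as "User" has no parent, so the function returns an empty string in that case, matching the fallback in the regex version.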
I think maybe preg_match might be a better choice for this.
$s = '\\Base\\Ent\\User';
$m = [];
preg_match('/([^\\\\]*)\\\\[^\\\\]*$/', $s, $m); // returns 1 on a match; the captures go into $m
print $m[1]; // Ent
If you read the regular expression backwards, from the $, it says to match many things that aren't backslashes, then a backslash, then many things that aren't backslashes, and save that match for later (in $m).
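How about
$path = '\Base\Ent\User';
// cut off the last component, then take what follows the last remaining backslash
$section = substr(strrchr(substr($path, 0, strrpos($path, "\\")), "\\"), 1);
Or, for a path with exactly three components:
$path = '\Base\Ent\User';
// skip past the second backslash, then take everything before the next one
$section = strstr(substr($path, strpos($path, "\\", 1) + 1), "\\", true);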

How to remove HTML from a value when sending $_POST / $result with jQuery PHP?

I have a jQuery script which, when a button is clicked, requests values from a PHP file and inserts them into a form.
I can successfully get the values into the correct fields, but the values contain HTML code and I need only the text.
I need the value for the field "Ethnicity", and the value should be "Caucasian".
I have tried different ways to see the value of $ethnicity, and the weird thing is:
echo $ethnicity; // result: Caucasian
var_dump($ethnicity); // result: string(146) " Caucasian "
$result['ethnicity'] = $ethnicity; // result: {"ethnicity":" Caucasian<\/td>\r"}
As you can see the $result value is: "Caucasian<\/td>\r"
I tried also:
$result['ethnicity'] = ($ethnicity->textContent); // result: It says "null"
My question is: How can I remove the HTML code from this value?
There is strip_tags for that
http://php.net/manual/en/function.strip-tags.php
$result['ethnicity'] = strip_tags($ethnicity);
By using the filter_var() and trim() functions (note that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1):
$str = filter_var(trim($ethnicity, " \t\n\r\0\x0B"), FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_LOW);
You can make use of strip_tags:
$ethnicity = strip_tags($ethnicity);
(or)
Make use of this function (from the PHP manual):
<?php
function html2txt($document){
$search = array('#<script[^>]*?>.*?</script>#si', // Strip out javascript
'#<[\/\!]*?[^<>]*?>#si', // Strip out HTML tags
'#<style[^>]*?>.*?</style>#siU', // Strip style tags properly
'#<![\s\S]*?--[ \t\n\r]*>#' // Strip multi-line comments including CDATA
);
$text = preg_replace($search, '', $document);
return $text;
}
$ethnicity = html2txt($ethnicity);
?>
EDIT:
Now using strip_tags it removed "</td>" but "\r" is still there. Now
it looks like: " Caucasian\r". Can I remove the whitespace/space
before word Caucasian also?
$ethnicity = trim(strip_tags($ethnicity)); // trim() removes the leading space and the trailing \r
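Putting the two steps together, a minimal sketch would be the following. The input string is reconstructed from the JSON output shown in the question, so treat it as an assumption about what $ethnicity really contains:

```php
<?php
// Assumed raw value: leading space, leftover closing tag, trailing carriage return
$ethnicity = " Caucasian</td>\r";

// strip_tags() removes the markup; trim() removes the surrounding whitespace
$clean = trim(strip_tags($ethnicity));

$result = array('ethnicity' => $clean);
echo json_encode($result), "\n"; // {"ethnicity":"Caucasian"}
```

The order matters slightly: trimming after strip_tags() also catches whitespace that was sitting between the text and the removed tag.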
$val = trim($val);
$val = strip_tags($val);
$val = htmlentities($val, ENT_QUOTES, 'UTF-8'); // convert funky chars to html entities
$pat = array("\r\n", "\n\r", "\n", "\r"); // remove returns
$val = str_replace($pat, '', $val);
$pat = array('/^\s+/', '/\s{2,}/', '/\s+$/'); // remove leading, repeated, and trailing whitespace
$rep = array('', ' ', '');
$val = preg_replace($pat, $rep, $val);
$val = trim($val);
$val = mysql_real_escape_string($val); // final step for MySQL entry; note the mysql_* functions were removed in PHP 7.0
There is no single function that does all of this.

preg_replace apply string function (like urlencode) in replacement

I want to rewrite all links in an HTML document string in PHP in this way: replace href='LINK' with href='MY_DOMAIN?URL=LINK'. Because LINK becomes a URL parameter, it must be URL-encoded. I'm trying to do it like this:
preg_replace('/href="(.+)"/', 'href="http://'.$host.'/?url='.urlencode('${1}').'"', $html);
but '${1}' is just a string literal at the time urlencode() runs, not the URL found by the pattern. What do I need to do to make this code work?
Well, to answer your question, you have two choices with Regex.
You can use the e modifier on the regex, which tells preg_replace that the replacement is PHP code and should be executed. This is typically seen as not great, since it's really no better than eval, and the modifier was deprecated in PHP 5.5 and removed in PHP 7.0.
preg_replace($regex, "'href=\"http://{$host}?url='.urlencode('\\1').'\"'", $html);
The other option (which is better IMHO) is to use preg_replace_callback:
$callback = function ($match) use ($host) {
return 'href="http://'.$host.'?url='.urlencode($match[1]).'"';
};
preg_replace_callback($regex, $callback, $html);
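Here is a runnable version of the callback option, as a sketch only. The $host value, the sample link, and the concrete pattern are placeholders I picked for the example (the answer above leaves $regex undefined):

```php
<?php
$host = 'example.com';                              // hypothetical host
$html = '<a href="http://foo.bar/page?a=1">x</a>';  // hypothetical input

// Capture everything between the quotes of the href attribute
$regex = '/href="([^"]+)"/';

$callback = function ($match) use ($host) {
    // urlencode() runs here on the real captured URL, once per match
    return 'href="http://' . $host . '/?url=' . urlencode($match[1]) . '"';
};

$out = preg_replace_callback($regex, $callback, $html);
echo $out, "\n";
```

This should print <a href="http://example.com/?url=http%3A%2F%2Ffoo.bar%2Fpage%3Fa%3D1">x</a>. The difference from the original attempt is that urlencode() is called inside the callback, after the match is known, rather than on the literal string '${1}'.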
But also never forget, don't parse HTML with regex...
So in practice, the better way of doing it (The more robust way), would be:
$dom = new DomDocument();
$dom->loadHtml($html);
$aTags = $dom->getElementsByTagName('a');
foreach ($aTags as $aElement) {
$href = $aElement->getAttribute('href');
$href = 'http://'.$host.'?url='.urlencode($href);
$aElement->setAttribute('href', $href);
}
$html = $dom->saveHtml();
Use the 'e' modifier (available only before PHP 7.0, where it was removed):
preg_replace('/href="([^"]+)"/e',"'href=\"http://'.$host.'?url='.urlencode('\\1').'\"'",$html);
http://uk.php.net/preg-replace - example #4

Multiple regular expression interfere

I use regex to create HTML tags in plain text, like this:
loop
$SearchArray[] = "/\b(".preg_quote($user['name'], "/").")\b/i";
$ReplaceArray[] = '$1';
-
$str = preg_replace($SearchArray, $ReplaceArray, $str);
I'm looking for a way to not match $user['name'] in a tag.
You could use preg_replace_callback()
for 5.3+:
$callback = function($match) use ($user) {
    return ''.$match[1].'';
};
$regex = "/\b(".preg_quote($user['name'], "/").")\b/i";
$str = preg_replace_callback($regex, $callback, $str);
for 5.2+ (create_function was deprecated in PHP 7.2 and removed in PHP 8.0):
$method = 'return \'\'.$match[1].\'\';';
$callback = create_function('$match', $method);
$regex = "/\b(".preg_quote($user['name'], "/").")\b/i";
$str = preg_replace_callback($regex, $callback, $str);
So the problem is that you're making several passes over the document, replacing a different user name in each pass, and you're afraid you'll unintentionally replace a name inside a tag that was created in a previous pass, right?
I would try to do all of the replacements in one pass, using preg_replace_callback as @ircmaxwell suggested, and one regex that can match any legal user name. In the callback function, you look up the matched string to see if it's a real user's name. If it is, return the generated link; if not, return the matched string for reinsertion.
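A rough sketch of that single-pass idea follows. The $users table, the assumption that a legal user name is one run of word characters, and the link markup are all made up for the example, since the question doesn't show them:

```php
<?php
// Hypothetical lookup table: lowercase user name => profile URL
$users = array('alice' => '/u/1', 'bob' => '/u/2');
$str = 'alice met Bob and carol.';

// One pattern that matches any candidate name (assumed: a run of word characters)
$out = preg_replace_callback('/\b\w+\b/', function ($m) use ($users) {
    $key = strtolower($m[0]);
    if (!isset($users[$key])) {
        return $m[0]; // not a user: put the text back unchanged
    }
    // a real user: return the generated link, keeping the original spelling
    return '<a href="' . htmlspecialchars($users[$key]) . '">' . $m[0] . '</a>';
}, $str);

echo $out, "\n";
```

Because the text is scanned only once, a name can never be matched inside a tag generated for another user, which is the interference the question is worried about.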
It looks like you're trying to add a bunch of anchors to a document. Have you thought of using SimpleXML? This assumes that the anchor tags are part of a larger XHTML document.
//$xhtml_doc is the path to some XHTML document
$doc = simplexml_load_file($xhtml_doc);
//NOTE: find the parent element for all these anchors (maybe with xpath)
//example: $parent = $doc->xpath('//div[@id="parent"]');
foreach($user as $k => $v){
$anchor = $doc->addChild('a', $v['name']);
$anchor->addAttribute('href', $v['url']);
}
return $doc->asXML();
simpleXML helps me a lot in these situations. It'll be a lot faster than regex, even if this isn't exactly what you want to do.
