preg_replace need help with expression - php

This is my code:
$string = '« PreviousNext »';
$string = htmlspecialchars($string, ENT_COMPAT, 'UTF-8');
$string = preg_replace('#(<a).*?(nextlink)#s', '', $string);
echo $string;
I am trying to remove the last link:
Next »';
My current output:
">Next »</a>
It removes everything from the start.
I want it to remove only the one with strpos, is this possible with preg_replace and how?
Thanks.

quite a tricky question to solve
first off,
the .*? will not match like you are expecting it to.
its starts from the left finds the first match for <a, then searches until it finds nextlink, which is essentially picking up the entire string.
for that regex to work as you wanted, it would need to match from the righthand side first and work backwards through the string, finding the smallest (non-greedy) match
i couldn't see any modifiers that would do this
so i opted for a callback on each link, that will check and remove any link with nextlink in it
<?php
$string = '« PreviousNext »';
echo "RAW: $string\r\n\r\n";
$string = htmlspecialchars($string, ENT_COMPAT, 'UTF-8');
echo "SRC: $string\r\n\r\n";
$string = preg_replace_callback(
'#&lt\;a.+?</a>#',
'remove_nextlink',
$string
);
function remove_nextlink($matches) {
// if you want to see each line as it works, uncomment this
// echo "L: $matches[0]\r\n\r\n";
if (strpos($matches[0], 'nextlink') === FALSE) {
return $matches[0]; // doesn't contain nextlink, put original string back
} else {
return ''; // contains nextlink, replace with blank
}
}
echo "PROCESSED: $string\r\n\r\n";

Note: This is not a direct answer, but a suggestion to another approach.
I was told once; if you can do it in any other way, stay away from regex. I don't though, it's my white whale. Have you heard of phpQuery? It's jQuery implemented in PHP and very powerful. It would be able to do what you want in a very easy way. I know it's not regex, but perhaps it's of use to you.
If you really want to go ahead, I can recommend http://gskinner.com/RegExr/ . I think it's a great tool.

Related

Create a function to find a specific word in the title

I have the following title formation on my website:
It's no use going back to yesterday, because at that time I was... Lewis Carroll
Always is: The phrase… (author).
I want to delete everything after the ellipsis (…), leaving only the sentence as the title. I thought of creating a function in php that would take the parts of the titles, throw them in an array and then I would work each part, identifying the only pattern I have in the title, which is the ellipsis… and then delete everything. But when I do that, in the X space of my array, it returns the following:
was...
In position 8 of the array comes the word and the ellipsis and I don't know how to find a pattern to delete the author of the title, my pattern was the ellipsis. Any idea?
<?php
$a = get_the_title(155571);
$search = '... ';
if(preg_match("/{$search}/i", $a)) {
echo 'true';
}
?>
I tried with the code above and found the ellipsis, but I needed to bring it into an array to delete the part I need. I tried something like this:
<?php
define('WP_USE_THEMES', false);
require('./wp-blog-header.php');
global $wpdb;
$title_array = explode(' ', get_the_title(155571));
$search = '... ';
if (array_key_exists("/{$search}/i",$title_array)) {
echo "true";
}
?>
I started doing it this way, but it doesn't work, any ideas?
Thanks,
If you use regex you need to escape the string as preg_quote() would do, because a dot belongs to the pattern.
But in your simple case, I would not use a regex and just search for the three dots from the end of the string.
Note: When the elipsis come from the browser, there's no way to detect in PHP.
$title = 'The phrase... (author).';
echo getPlainTitle($title);
function getPlainTitle(string $title) {
$rpos = strrpos($title, '...');
return ($rpos === false) ? $title : substr($title, 0, $rpos);
}
will output
The phrase
First of all, since you're working with regular expressions, you need to remember that . has a special meaning there: it means "any character". So /... / just means "any three characters followed by a space", which isn't what you want. To match a literal . you need to escape it as \.
Secondly, rather than searching or splitting, you could achieve what you want by replacing part of the string. For instance, you could find everything after the ellipsis, and replace it with an empty string. To do that you want a pattern of "dot dot dot followed by anything", where "anything" is spelled .*, so \.\.\..*
$title = preg_replace('/\.\.\..*/', '', $title);

preg_replace, str_replace and substr_replace not working in special condition

I have the following code:
this code finds all html tags in a string and replaces them with [[0]], [[1]] ,[[2]] and so on.(at least that is intented but not workinng);
$str = "some text <a href='/review/'>review</a> here <a class='abc' href='/about/'>link2</a> hahaha";
preg_match_all("|<[^>]+>(.*)</[^>]+>|U",$str, $out, PREG_OFFSET_CAPTURE);
$count = 0;
foreach($out[0] as $result) {
$temp=preg_quote($result[0],'/');
$temp ="/".$temp."/";
preg_replace($temp, "[[".$count."]]", $str,1);
$count++;
}
var_dump($str);
This code finds all the tags in a string and replaces them with [[0]], [[1]] and [[2]] and so on. I have used preg_match_all with PREG_OFFSET_CAPTURE.
The output of preg_match_all is as expected. However, preg_replace, substr_replace, and str_replace do not work when substituting the tags with [[$count]].
I have tried all three string replacement methods and none of them work. Please point me in the right direction.
Can something in php.ini cause this?
Thanks in advance.
preg_replace does not substitute $str. Assign it to the string again:
$str = preg_replace($temp, "[[".$count."]]", $str);
Also, I'm not sure what you want exactly, but this I changed some things in the code, which seems to be what you were tying to do. I changed the regex a bit, especially the (.*?) part to ([^<>]+).
the problem may be in this line
foreach($out[0] as $result) {
change it to this
foreach($out as $result) {
because i think you are accessing an index that doesn't exists

PHP:preg_replace function

$text = "
<tag>
<html>
HTML
</html>
</tag>
";
I want to replace all the text present inside the tags with htmlspecialchars(). I tried this:
$regex = '/<tag>(.*?)<\/tag>/s';
$code = preg_replace($regex,htmlspecialchars($regex),$text);
But it doesn't work.
I am getting the output as htmlspecialchars of the regex pattern. I want to replace it with htmlspecialchars of the data matching with the regex pattern.
what should i do?
You're replacing the match with the pattern itself, you're not using the back-references and the e-flag, but in this case, preg_replace_callback would be the way to go:
$code = preg_replace_callback($regex,'htmlspecialchars',$text);
This will pass the mathces groups to htmlspecialchars, and use its return value as replacement. The groups might be an array, in which case, you can try either:
function replaceCallback($matches)
{
if (is_array($matches))
{
$matches = implode ('', array_slice($matches, 1));//first element is full string
}
return htmlspecialchars($matches);
}
Or, if your PHP version permits it:
preg_replace_callback($expr, function($matches)
{
$return = '';
for ($i=1, $j = count($matches); $i<$j;$i++)
{//loop like this, skips first index, and allows for any number of groups
$return .= htmlspecialchars($matches[$i]);
}
return $return;
}, $text);
Try any of the above, until you find simething that works... incidentally, if all you want to remove is <tag> and </tag>, why not go for the much faster:
echo htmlspecialchars(str_replace(array('<tag>','</tag>'), '', $text));
That's just keeping it simple, and it'll almost certainly be faster, too.
See the quickest, easiest way in action here
If you want to isolate the actual contents as defined by your pattern, you could use preg_match($regex,$text,$hits);. This will give you an array of hits those bits that were between the paratheses in the pattern, starting at $hits[1], $hits[0] contains the whole matched string). You can then start manipulating these found matches, possibly using htmlspecialchars ... and combine them again into $code.

regex for breadcrumb in php

I am currently building breadcrumb. It works for example for
http://localhost/researchportal/proposal/
<?php
$url_comp = explode('/',substr($url,1,-1));
$end = count($url_comp);
print_r($url_comp);
foreach($url_comp as $breadcrumb) {
$landing="http://localhost/";
$surl .= $breadcrumb.'/';
if(--$end)
echo '
<a href='.$landing.''.$surl.'>'.$breadcrumb.'</a>»';
else
echo '
'.$breadcrumb.'';
};?>
But when I typed in http://localhost////researchportal////proposal//////////
All the formatting was gone as it confuses my code.
I need to have the site path in an array like ([1]->researchportal, [2]->proposal)
regardless of how many slashes I put.
So can $url_comp = explode('/',substr($url,1,-1)); be turned into a regular expression to get my desired output?
You don't need regex. Look at htmlentities() and stripslashes() in the PHP manual. A regex will return a boolean value of whatever it says, and won't really help you achieve what you are trying to do. All the regex can let you do is say if the string matches the regex do something. If you put in a regex requiring at least 2 characters between each slash, then any time anyone puts more than one consecutive slash in there, the if statement will stop.
http://ca3.php.net/manual/en/function.stripslashes.php
http://ca3.php.net/manual/en/function.htmlentities.php
Found this on the php manual.
It uses simple str_replace statements, modifying this should achieve exactly what your post was asking.
<?
function stripslashes2($string) {
$string = str_replace("\\\"", "\"", $string);
$string = str_replace("\\'", "'", $string);
$string = str_replace("\\\\", "\\", $string);
return $string;
}
?>

php/regex: "linkify" blog titles

I'm trying to write a simple PHP function that can take a string like
Topic: Some stuff, Maybe some more, it's my stuff?
and return
topic-some-stuff-maybe-some-more-its-my-stuff
As such:
lowercase
remove all non-alphanumeric non-space characters
replace all spaces (or groups of spaces) with hyphens
Can I do this with a single regex?
function Slug($string)
{
return strtolower(trim(preg_replace('~[^0-9a-z]+~i', '-', html_entity_decode(preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', htmlentities($string, ENT_QUOTES, 'UTF-8')), ENT_QUOTES, 'UTF-8')), '-'));
}
$topic = 'Iñtërnâtiônàlizætiøn';
echo Slug($topic); // internationalizaetion
$topic = 'Topic: Some stuff, Maybe some more, it\'s my stuff?';
echo Slug($topic); // topic-some-stuff-maybe-some-more-it-s-my-stuff
$topic = 'here عربي‎ Arabi';
echo Slug($topic); // here-arabi
$topic = 'here 日本語 Japanese';
echo Slug($topic); // here-japanese
Many frameworks provide functions for this
CodeIgniter:
http://bitbucket.org/ellislab/codeigniter/src/c39315f13a76/system/helpers/url_helper.php#cl-472
wordpress (has many more in the code):
http://core.trac.wordpress.org/browser/trunk/wp-includes/formatting.php#L814
You can do it with one preg_replace:
preg_replace(array("/[A-Z]/e", "/\\p{P}/", "/\\s+/"),
array('strtolower("$0")', '', '-'), $str);
Technically, you could do it with one regex, but this is simpler.
Preemptive response: yes, it unnecessarily uses regular expressions (though very simple ones), an unecessarily big number of calls to strtolower, and it doesn't consider non-english characters (he doesn't even give an encoding); I'm just satisfying the OP's requirements.
Why are regular expressions considered the universal panacea to all life's problems (just because a lowly backtrace in a preg_match has discovered the cure for cancer). here's a solution without recourse to regexp:
$str = "Topic: Some stuff, Maybe some more, it's my stuff?";
$str = implode('-',str_word_count(strtolower($str),2));
echo $str;
Without going the whole UTF-8 route:
$str = "Topic: Some stuff, Maybe some more, it's my Iñtërnâtiônàlizætiøn stuff?";
$str = implode('-',str_word_count(strtolower(str_replace("'","",$str)),2,'Þßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ'));
echo $str;
gives
topic-some-stuff-maybe-some-more-its-my-iñtërnâtiônàlizætiøn-stuff

Categories