Understanding strpos, preg_replace, and preg_match functions - php

I'm working on a WordPress site and want to add rel="noreferrer noopener" to all external links. I found a function that does exactly that, but I would like to extend it: if a link already has rel="nofollow", I want to replace it with rel="nofollow noopener noreferrer".
Here is the function I'm working on. I have difficulty understanding several of the functions it uses, and I would like to understand them so I can resolve my issue.
add_filter('the_content', 'rel_function');
function rel_function($content) {
return preg_replace_callback('/<a[^>]+/', 'rel_all_external_links', $content);
}
function rel_all_external_links($Matches) {
$externalLink = $Matches[0];
$SiteLink = get_bloginfo('url');
if (strpos($link, 'rel') === false) {
$externalLink = preg_replace("%(href=\S(?!$SiteLink))%i", 'rel="noopener noreferrer" target="_blank" $1', $externalLink);
} elseif (preg_match("%href=\S(?!$SiteLink)%i", $externalLink)) {
$externalLink = preg_replace('/rel=\S(?!nofollow)\S*/i', 'rel="noopener noreferrer" target="_blank"', $externalLink);
}
return $externalLink;
}
The first thing I don't understand is if (strpos($link, 'rel') === false). How come $link is undefined? I don't understand how strpos is getting a value for this variable. And does this return true if the link doesn't have 'rel'?
The second is $externalLink = preg_replace("%(href=\S(?!$SiteLink))%i", 'rel="noopener noreferrer" target="_blank" $1', $externalLink);
If I understand correctly, preg_replace checks that $externalLink doesn't contain the base URL and replaces the whole string except href="..."? Is that correct?
And the last one:
elseif (preg_match("%href=\S(?!$SiteLink)%i", $externalLink)) {
$externalLink = preg_replace('/rel=\S(?!nofollow)\S*/i', 'rel="noopener noreferrer" target="_blank"', $externalLink);
}
It checks if $externalLink doesn't have a base URL, right? I'm not sure what preg_replace is trying to do here, but I think it is the key to my problem.
I would appreciate any help.

strpos is a function that tells you whether a specific word or sequence of characters is present in the string you provide. It works like a yes/no check, and it can additionally give you the start position of the match. It is important to note that it returns false when nothing is found, or a number (including 0) when a match is found.
So this function is very good for simple scenarios such as: does the word apple contain the letters app? The answer is yes, and the start position is 0.
In the example you provided, if (strpos($link, 'rel') === false) checks whether the word rel is present, and it doesn't matter at what position. Note that $link is never defined in that function, so strpos has nothing to search and the check is always true; the variable was most likely meant to be $externalLink.
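The return values described above can be checked with a short sketch. The two sample tags are made up for illustration and are not taken from the original function.

```php
<?php
// Hypothetical sample tags (assumptions, not from the question's site).
$withRel    = '<a rel="nofollow" href="https://example.org/"';
$withoutRel = '<a href="https://example.org/"';

// strpos() returns the zero-based offset of the first occurrence, or false.
var_dump(strpos('apple', 'app'));     // int(0)
var_dump(strpos($withRel, 'rel'));    // int(3)
var_dump(strpos($withoutRel, 'rel')); // bool(false)

// Compare with === / !== false; a loose check would treat offset 0 as "not found".
$hasRel = strpos($withRel, 'rel') !== false;
```

Keep in mind that a plain substring test also fires on a URL that happens to contain the letters rel, which is one reason a regex on the attribute itself is usually safer.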
preg_match is used for much more complex searches using regex, and is capable of handling multiple conditions, groups and more. To give an example of what it can check: does the word apprehensive start with app and end with sive? The answer is yes. If strpos were used, it would say yes to every word containing app, because the ending is never checked; strpos is incapable of that. preg_match is also commonly used to get what's inside an attribute, so it can grab rel="[what's here]".
preg_replace uses regex for the complex search first, then replaces each match.
I would always advise reading the documentation PHP provides for these functions and their accepted arguments.
Information about strpos
Information about preg_replace
Information about preg_match
Information about regex (takes a good while to learn!)
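As a rough sketch of how the three functions could be combined for the goal in the question (keeping nofollow and adding noopener noreferrer), the helper below works on one opening tag at a time. The function name is hypothetical, $siteUrl stands in for get_bloginfo('url'), and the prefix comparison against the site URL is a simplifying assumption.

```php
<?php
// Sketch only: add_rel_to_external_link() is a made-up helper, not a WordPress API.
function add_rel_to_external_link(string $tag, string $siteUrl): string
{
    // Pull out the href value; leave the tag alone if there is none.
    if (!preg_match('/(?<![\w-])href=(["\'])(.*?)\1/i', $tag, $href)) {
        return $tag;
    }
    $url = $href[2];
    // Treat only absolute http(s) URLs outside the site as external.
    $isAbsolute = preg_match('#^https?://#i', $url) === 1;
    if (!$isAbsolute || stripos($url, $siteUrl) === 0) {
        return $tag;
    }
    $wanted = ['noopener', 'noreferrer'];
    // If a rel attribute exists, keep its tokens (e.g. nofollow) and add the missing ones.
    $result = preg_replace_callback(
        '/(?<![\w-])rel=(["\'])(.*?)\1/i',
        function (array $m) use ($wanted): string {
            $tokens = preg_split('/\s+/', trim($m[2]), -1, PREG_SPLIT_NO_EMPTY);
            return 'rel="' . implode(' ', array_unique(array_merge($tokens, $wanted))) . '"';
        },
        $tag,
        1,
        $count
    );
    // No rel attribute at all: append one.
    return $count > 0 ? $result : $tag . ' rel="noopener noreferrer"';
}

echo add_rel_to_external_link('<a rel="nofollow" href="https://example.org/"', 'https://mysite.test');
// <a rel="nofollow noopener noreferrer" href="https://example.org/"
```

Because the pattern '/<a[^>]+/' in the question matches the tag without its closing >, appending the attribute at the end of the matched text is enough; the function could be called from the existing preg_replace_callback callback.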

Related

php move variable substring to end of string

say I have a string:
something.something/search?&abc=xyz:??????q=hello+there
or
something.something/search?abc=xyz:??????q=hello+there
I need to move the variable substring [&]abc=xyz:?????? to the end of the string.
So I wind up with:
something.something/search?q=hello+there&abc=xyz:??????
The length of the substring is not known in advance.
We know the substring starts right after the first ? and that it
starts with either &abc=xyz:????? or abc=xyz:?????.
The ????? may or may not be present, and is of indeterminate length
and content.
We know the substring ends at the q=.......
So whatever is between the first ? and the first q= gets removed and appended to the string.
However, the appended part must begin with an &, and only one &.
All this only applies for strings containing something.something/search?.
The substring is quite a variable.
I am also wondering if I should test for something before I try the change.
Thanks
It says to edit question so here goes. I think you are getting close.
A typical $url would be:
https://www._oogle.com/search?&tbs=qdr:q=hello+there //hello+there only example.
https://www._oogle.com/search?tbs=qdr:q=hello+there //...qdr:null|h|d|w|m|y which is for past page searches. (how old) null(all),hour,day,week,month,year.
If the tbs=qdr: part comes first, other things break, so I have to move it after
the q=... part.
And of course there are the http:// variants to be considered.
I was thinking of using a contains function I made to see whether this $url needs this treatment. It needs to catch the "_oogle.com/search?&tbs=qdr:" possibility as well, however (the one that starts with an ampersand): [null|&]tbs=qdr:[null|h|d|w|m|y]. I guess there could be other parameters before the "q=" part, but let's worry about that later.
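function contains($haystack, $needle) {
    // stristr() is case-insensitive and returns false when $needle is absent
    if (stristr($haystack, $needle) === FALSE) { return false; }
    else { return $needle; }
}
if (contains($url, "_oogle.com/search?tbs=qdr:")) {
    // ... move the tbs=qdr: part after the q= part here
}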
I don't know if this is exactly what you are looking for, but you can try to use a regex on your params. If you have more params before and after the q=, you will have to adjust the regex a little and loop over an array to determine the correct position of the param you are looking for. Here is an example of what you could do.
$baseUrl = 'http://www.google.com';
$parseUrl = parse_url($baseUrl.'/search?&abc=xyz:??????q=hello+there');
preg_match_all('/&?(.*)q=(.*)/',$parseUrl['query'],$matches, PREG_PATTERN_ORDER);
echo $baseUrl.'/search?q='.$matches[2][0].'&'.$matches[1][0];
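The same idea can be wrapped in a function that works on the whole URL and leaves it untouched when there is nothing to move. This is only a sketch: the function name is hypothetical, www.example.com is a placeholder host, and it assumes that everything between /search? and the first q= should be moved.

```php
<?php
// Hypothetical helper; assumes the part to move sits between "/search?" and the first "q=".
function move_param_after_q(string $url): string
{
    // 1: everything up to "/search?", 2: the leading parameter(s), 3: "q=..." to the end.
    // (?!q=) skips URLs where q= already comes first.
    if (!preg_match('~^(.*?/search\?)&?(?!q=)(.+?)(q=.*)$~', $url, $m)) {
        return $url; // nothing to move
    }
    return $m[1] . $m[3] . '&' . $m[2];
}

echo move_param_after_q('https://www.example.com/search?&tbs=qdr:hq=hello+there');
// https://www.example.com/search?q=hello+there&tbs=qdr:h
```

Because the match itself is the test, a separate contains() check before calling it is optional.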

Check if text contains url, email and phone number with php and regex

I have a text, for example: $descrizione = "Tel.+39.1234.567899 asd.test@testwebsite.com
www.testwebsite.com" and I would like to obtain three different variables containing:
"+39.1234.567899", "asd.test@testwebsite.com" and
"www.testwebsite.com".
To check whether the text contains an email I use a regex, and I wrote this code:
$regex = '/[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})/';
if (preg_match($regex, $descrizione, $email_is)) {
for($e = 0; $e < count($email_is) ; $e++){
if(strpos($email_is[$e], "@") !== false){
$linkEmail = $email_is[$e];
}
}
}
Now I would like to find the website URL, so I tried:
$regex = '/[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi';
if( preg_match($regex, $descrizione, $matches)){
$linkWebsite = $matches[0];
}
but preg_match returns false. I checked the regex on http://regexr.com/ and it's correct there, so I don't understand why it always returns false. Where is the problem? I also tried "/[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/" but I have the same problem, and I tried to catch errors with try/catch but none are reported.
Finally, I would like to find the phone number, but I don't know how to write the regex.
Can someone help me, please?
Your regex fails because it's faulty. You've escaped the slashes (/) with slashes. You should use backslashes:
[-a-zA-Z0-9@:%_\+.~#?&\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&\/=]*)?
Here at regex101.
Since regexr uses JS regex it doesn't complain, but if you try it at regex101 selecting php you'll easily detect such errors.
About regex for phone numbers - search! E.g https://stackoverflow.com/search?q=%5Bregex%5D+phone+number
I have found the solution; I hope that this can help someone.
preg_match returns only the first result, not all the results that it finds.
So if I check the regex using a website like regex101, it returns the correct result with all matches, but if I use the same regex in PHP, it returns only one.
The regex option "g" (global = don't return after the first match) corresponds to the function preg_match_all.
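The difference can be seen side by side in a small sketch. The sample text and the simplified address pattern are assumptions made for illustration; the pattern is not a full email validator.

```php
<?php
// Made-up sample text with two addresses.
$text = 'Write to first@example.com or second@example.com';
// Simplified, illustrative pattern (an assumption, not the pattern from the question).
$pattern = '/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/';

preg_match($pattern, $text, $first);   // stops after the first hit
preg_match_all($pattern, $text, $all); // collects every hit, like the "g" flag elsewhere

print_r($first[0]); // first@example.com
print_r($all[0]);   // array with both addresses
```

Note that PHP has no g modifier at all; passing one makes preg_match emit a warning and return false.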

Alternative strings in regular expression

Sadly I have to ask this question but after noodling on this problem the whole morning, I give up. Searching online, man pages, documents, none of it seems to give me a conclusive answer to what I try to do.
Looking for a regular expression for the PHP function preg_match to match a string against a pattern. Now that pattern is what gives me headaches.
The pattern should express the following: string starts with "_MG_" or "IMG_" or "DSC_", followed by four digits, followed by an optional "-N" where N is another digit. For example, "IMG_0123" or "DSC_9876-3" are valid. Everything else should be rejected.
I came up with various patterns, but none of them seems to work. For example, I tried
(_MG_|IMG_|DSC_)[0-9]{4}(-[0-9])?
and this in different variations with ( ) and apostrophes around various sub-expressions and using ? vs {0,1} and whatnot. (I experimented using grep, but got no matches still.) Yes, I know I need to add "/.../" for PHP, but here I left it out for readability's sake.
Can I even express this in a single expression, or will I have to call the matching function several times? If several matches are required, I might be better off writing a small parser for this particular string matching myself.
Thanks!
EDIT: Here is the code that I'm working with
// Iterate over all images in this gallery folder.
if ($h = opendir($dir)) {
while (($f = readdir($h)) !== false) {
// Skip images whose name doesn't match the requirement.
if (0 == preg_match("/(_MG_|IMG_|DSC_)[0-9]{4}(-[0-9]){0,1}/", $f)) {
continue;
}
...
}
}
And this also allows image names like "_MG_7020-1-2.jpg" or "_MG_7444-5-6.2.jpg" or "IMG_6543_2_4_tonemapped.jpg" but that's not what I want to allow.
<?php
$array = array('IMG_0123', 'DSC_9876-3', '_MG_1234', 'DSC_fail');
foreach($array as $arr) {
// Group the prefixes so the alternation does not swallow the rest of the pattern.
if(preg_match("/(?:_MG_|IMG_|DSC_)[0-9]{4}[-0-9]*/", $arr)) {
echo $arr . ' => TRUE <br />';
} else {
echo $arr . ' => FALSE <br />';
}
}
?>
The above works as expected for me.
I ran this as well:
<?php
$matches = array();
preg_match('/(_MG_|IMG_|DSC_)[0-9]{4}(-[0-9])?/','IMG_0123-3',$matches );
var_dump($matches);
Output:
array(3) {
[0]=>
string(10) "IMG_0123-3"
[1]=>
string(4) "IMG_"
[2]=>
string(2) "-3"
}
Seems ok, unless I'm missing something, or unless what you're referring to is that preg_match returns false if not all your matchers () match.
Note the return type for preg_match from the php doc:
preg_match() returns the number of times pattern matches. That will be either 0 times (no match) or 1 time because preg_match() will stop searching after the first match. preg_match_all() on the contrary will continue until it reaches the end of subject. preg_match() returns FALSE if an error occurred.
So you may in fact be looking to use preg_match_all().
According to this refiddle, you seem to have it solved just fine. You can use their "unit" test functionality to add additional "should" and "should not" match scenarios. Granted, that refiddle is using JavaScript's regex, but I find the two to be effectively identical until you get into backreferences and lookarounds.
Here is your original pattern with start and end of string anchor as well as some edits to reduce the pattern length.
Code: (Demo)
var_export(
preg_grep(
'/^(?:DSC|[_I]MG)_\d{4}(?:-\d)?$/',
$array
)
);
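Since the directory listing in the question yields names with an extension, an anchored pattern like the one above would reject them unless the extension is removed first. A minimal sketch, assuming the extension should simply be ignored (the function name is hypothetical):

```php
<?php
// Hypothetical helper: checks the file name without its extension against the anchored pattern.
function is_camera_name(string $file): bool
{
    // pathinfo() with PATHINFO_FILENAME drops the directory and the last extension.
    $base = pathinfo($file, PATHINFO_FILENAME);
    return preg_match('/^(?:DSC|[_I]MG)_\d{4}(?:-\d)?$/', $base) === 1;
}

$names = ['IMG_0123.jpg', 'DSC_9876-3.jpg', '_MG_7020-1-2.jpg', 'IMG_6543_2_4_tonemapped.jpg'];
foreach ($names as $name) {
    echo $name . ' => ' . (is_camera_name($name) ? 'TRUE' : 'FALSE') . PHP_EOL;
}
```

The ^ and $ anchors are what reject names such as _MG_7020-1-2; without them preg_match is satisfied by any matching substring.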

How do I find a url in a string closest to a substring(after using stripos)? (PHP)

I'm working with an html string and trying to find the closest url in position to the substring.
if (stripos($theemailmessage,'substring') !== FALSE )
{
$indicatornumber = '1';
}
So stripos() should give me the position of this substring inside the string. How would I go about searching for values within a url from here? I'm assuming it would be something traversing the string positions looking for http:// , but I'm really not sure which function I should be using.
There are many URLs in the document that I am searching for, I'm searching for the one closest to the string position. Actually, I want to search to see if the string is inside an anchor tag first, but I figured I'd start by learning how to search for the closest url, and then refine from there.
Something like this?
preg_match('/http\:\/\/[a-zA-Z0-9-.]+\.[a-zA-Z]{2,3}(\/\S*)?/',$str, $url);
echo $url[0];
You have several options to implement this: use the preg_replace function or the parse_url function.
$tempText = "hello.. how are you?? are you keeping well?? please checkout this url http://www.youtube.com/watch?v=ehuwoGVLyhg&feature=topvideos";
print_r(parse_url($tempText,PHP_URL_PATH));
Good luck finding a regexp to match URLs, but suppose you have one.
Then,
preg_match_all($url_regexp, $source, $matches, PREG_OFFSET_CAPTURE);
Take a look in $matches and you will have every URL plus its position. Iterate over these to find which is closest to your substring position. Make sure that you account for the length of the matches and your substring.
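The iteration described above might look like the following sketch. The function name and the deliberately simple URL pattern are assumptions; a stricter pattern can be swapped in without changing the distance logic.

```php
<?php
// Hypothetical helper: returns the URL whose span lies closest to the first
// case-insensitive occurrence of $needle, or null if either is missing.
function closest_url(string $source, string $needle): ?string
{
    $pos = stripos($source, $needle);
    if ($pos === false) {
        return null;
    }
    $needleEnd = $pos + strlen($needle);

    // PREG_OFFSET_CAPTURE turns each match into [text, byte offset].
    if (!preg_match_all('~https?://[^\s"\'<>]+~i', $source, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }

    $best = null;
    $bestGap = PHP_INT_MAX;
    foreach ($m[0] as [$url, $start]) {
        $end = $start + strlen($url);
        // Gap between the two spans; 0 when they touch or overlap.
        $gap = max(0, $start - $needleEnd, $pos - $end);
        if ($gap < $bestGap) {
            $bestGap = $gap;
            $best = $url;
        }
    }
    return $best;
}

$html = 'See http://a.example/one for details. Unsubscribe here: http://b.example/two';
echo closest_url($html, 'unsubscribe'); // http://b.example/two
```

Offsets from PREG_OFFSET_CAPTURE and stripos are both byte offsets, so they can be compared directly even in multibyte text.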

Google Style Regular Expression Search

It's been several years since I have used regular expressions, and I was hoping I could get some help on something I'm working on. You know how google's search is quite powerful and will take stuff inside quotes as a literal phrase and things with a minus sign in front of them as not included.
Example: "this is literal" -donotfindme site:examplesite.com
This example would search for the phrase "this is literal" in pages that don't include the word donotfindme on the website examplesite.com.
Obviously I'm not looking for something as complex as Google I just wanted to reference where my project is heading.
Anyway, I first wanted to start with the basics which is the literal phrases inside quotes. With the help of another question on this site I was able to do the following:
(this is php)
$search = 'hello "this" is regular expressions';
$pattern = '/".*"/';
$regex = preg_match($pattern, $search, $matches);
print_r($matches);
But this outputs "this" instead of the desired this, and doesn't work at all for multiple phrases in quotes. Could someone lead me in the right direction?
I don't necessarily need code even a real nice place with tutorials would probably do the job.
Thanks!
Well, for this example at least, if you want to match only the text inside the quotes you'll need to use a capturing group. Write it like this:
$pattern = '/"(.*)"/';
and then $matches will be an array of length 2 that contains the text between the quotes in element 1. (It'll still contain the full text matched in element 0) In general, you can have more than one set of these parentheses; they're numbered from the left starting at 1, and there will be a corresponding element in $matches for the text that each group matched. Example:
$pattern = '/"([a-z]+) ([a-z]+) (.*)"/';
will select all quoted strings which have two lowercase words separated by a single space, followed by anything. Then $matches[1] will be the first word, $matches[2] the second word, and $matches[3] the "anything".
For finding multiple phrases, you'll need to pick out one at a time with preg_match(). There's an optional "offset" parameter you can pass, which indicates where in the string it should start searching, and to find multiple matches you should give the position right after the previous match as the offset. See the documentation for details.
You could also try searching Google for "regular expression tutorial" or something like that, there are plenty of good ones out there.
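An alternative to stepping through the string with the offset parameter is preg_match_all, which collects every quoted phrase in one call. This is a swapped-in technique rather than the offset approach described above, and the search string is made up for illustration.

```php
<?php
// Made-up search string with two quoted phrases.
$search = 'hello "this is literal" and "another phrase" -donotfindme';

// [^"]+ cannot run past a closing quote, unlike the greedy .* which would
// span from the first quote to the last one.
preg_match_all('/"([^"]+)"/', $search, $matches);

print_r($matches[1]); // the two phrases, without their quotes
```

$matches[0] still holds the phrases including the quotes, and $matches[1] holds the contents of the capturing group for each match.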
Sorry, but my php is a bit rusty, but this code will probably do what you request:
$search = 'hello "this" is regular expressions';
$pattern = '/"(.*)"/';
$regex = preg_match($pattern, $search, $matches);
print_r($matches[1]);
$matches[1] will contain the first captured subexpression; $matches[0] contains the full matched pattern.
See preg_match in the PHP documentation for specifics about subexpressions.
I'm not quite sure what you mean by "multiple phrases in quotes", but if you're trying to match balanced quotes, it's a bit more involved and tricky to understand. I'd pick up a reference manual. I highly recommend Mastering Regular Expressions, by Jeffrey E. F. Friedl. It is, by far, the best aid to understanding and using regular expressions. It's also an excellent reference.
Here is a complete answer for all the kinds of search terms (literal, minus, quotes, ...) WITH replacements (for Google visitors at least).
But maybe it should not be done with regular expressions alone.
A single huge, complex regular expression would be hard for you or other developers to work on and extend, and
this approach might even be faster.
It may still need a lot of improvement, but here is a complete working solution in a class. There is a bit more in here than asked in the question, but it illustrates the reasons behind some choices.
class mySearchToSql extends mysqli {
    protected $W = array(); // collected WHERE fragments
    protected function filter($what) {
        if (isset($what)) {
            //echo '<pre>Search string: '.var_export($what,1).'</pre>';//debug
            //Split into different desires: plain words, "quoted phrases", -excluded
            preg_match_all('/([^"\-\s]+)|(?:"([^"]+)")|-(\S+)/i',$what,$split);
            //echo '<pre>'.var_export($split,1).'</pre>';//debug
            //Surround with SQL
            array_walk($split[1],array($this,'sur'),array('`Field` LIKE "%','%"'));
            array_walk($split[2],array($this,'sur'),array('`Desc` REGEXP "[[:<:]]','[[:>:]]"'));
            array_walk($split[3],array($this,'sur'),array('`Desc` NOT LIKE "%','%"'));
            //echo '<pre>'.var_export($split,1).'</pre>';//debug
            //Add AND or OR
            $this->where($split[3])
                 ->where(array_merge($split[1],$split[2]), true);
        }
    }
    //Escape a non-empty term and wrap it in the given SQL prefix/suffix
    protected function sur(&$v,$k,$sur) {
        if (!empty($v))
            $v=$sur[0].$this->real_escape_string($v).$sur[1];
    }
    function where($s,$OR=false) {
        if (empty($s)) return $this;
        if (is_array($s)) {
            $s=(array_filter($s));
            if (empty($s)) return $this;
            if($OR==true)
                $this->W[]='('.implode(' OR ',$s).')';
            else
                $this->W[]='('.implode(' AND ',$s).')';
        } else
            $this->W[]=$s;
        return $this;
    }
    function showSQL() {
        //L is assumed to be a line-break constant defined elsewhere in the author's code
        echo $this->W? 'WHERE '. implode(L.' AND ',$this->W).L:'';
    }
}
Thanks for all stackoverflow answers to get here!
You're in luck because I asked a similar question regarding string literals recently. You can find it here: Regex for managing escaped characters for items like string literals
I ended up using the following for searching for them and it worked perfectly:
(?<!\\)(?:\\\\)*(\"|')((?:\\.|(?!\1)[^\\])*)\1
This regex differs from the others as it properly handles escaped quotation marks inside the string.
