Count case-insensitive matches of a word with a regular expression - PHP

Suppose I have a string like this:
$res = "there are many restaurants in the city. Restaurants like xyz,abc. one restaurant like.....";
In the above example, the word "restaurant" appears in 3 places, so I need the count to be 3.
$pattern = '/Restaurant/';
preg_match($pattern, substr($res,10), $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
One more problem
It is related to the above question: I have text like Food & Drinks, and I need to match it against words such as food, drinks, or seafood. Can anyone please help me with this? Thanks in advance.

You can use a regex like this:
$pattern = '/restaurants?/i';
There are two changes that I made to your original regex:
Adding the i modifier - this is the case-insensitive flag.
Adding s? to the end of the search string. This makes the last s character optional. It matches zero or one occurrences of s.
Note that because we are using the case-insensitive flag, this regex will also match things like:
ResTaurants
rEstaurantS
RESTauRANTS
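A quick way to check this is the minimal sketch below. The $samples list is made up for illustration and is not from the question.

```php
<?php
// Check that /restaurants?/i accepts any letter case, with or without the final s.
$pattern = '/restaurants?/i';
$samples = ['ResTaurants', 'rEstaurantS', 'RESTauRANTS', 'restaurant', 'cafe']; // illustrative inputs

$results = [];
foreach ($samples as $sample) {
    // preg_match() returns 1 on a match and 0 when nothing matches
    $results[$sample] = preg_match($pattern, $sample);
}
print_r($results);
```

Only the last sample, which does not contain the word at all, should come back as 0.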

The i modifier is used for case-insensitive matching. The ? quantifier makes the preceding token optional; here it matches the preceding s zero or one time.
You are using preg_match() but want all matches; for that you need preg_match_all():
$pattern = '/restaurants?/i';
preg_match_all($pattern, substr($res,10), $matches, PREG_OFFSET_CAPTURE);
print_r($matches[0]);
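If only the count is needed, the return value of preg_match_all() can be used directly, since it is the number of full matches. A minimal sketch on the full string from the question (leaving out the substr() call, which is an assumption about what is wanted):

```php
<?php
$res = "there are many restaurants in the city. Restaurants like xyz,abc. one restaurant like.....";

// preg_match_all() returns the number of full-pattern matches (or false on error)
$count = preg_match_all('/restaurants?/i', $res, $matches);

echo $count, PHP_EOL;   // 3
print_r($matches[0]);   // the matched substrings, in order of appearance
```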

I suggest looking at a regex guide - this is a very simple request.
| in regex means or, a character class such as [Rr] matches any one of the listed characters, and ? means 0 or 1 of the previous character or group, so the following pattern should work for your specification:
$pattern = '/[Rr]estaurants?/';

As a solution to your problem, please try the following code snippet:
$url = "http://www.examplesite.com/";
$curl = new Curl(); // assumes a cURL wrapper class is available; PHP has no built-in Curl class
$res = $curl->get($url);
$pattern = '/Restaurant(s)*/i';
preg_match_all($pattern, substr($res,10), $matches, PREG_OFFSET_CAPTURE); // preg_match() would stop after the first match
print_r($matches);
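For the second part of the question (matching text like Food & Drinks against words such as food, drinks or seafood), one possible reading is "find every word that contains one of the terms in the label". A rough sketch under that assumption follows. The $label and $text values and the splitting rule are illustrative and not taken from the question.

```php
<?php
$label = 'Food & Drinks';                                // category text from the question
$text  = 'We serve seafood, cold drinks and fast food.'; // made-up sample text

// Split the label into its words: ['Food', 'Drinks']
$terms = preg_split('/\W+/', $label, -1, PREG_SPLIT_NO_EMPTY);

// Escape each term and join them into an alternation: food|drinks
$alternation = implode('|', array_map(function ($term) {
    return preg_quote(strtolower($term), '/');
}, $terms));

// \w* on both sides lets a term sit inside a longer word such as "seafood"
$pattern = '/\w*(?:' . $alternation . ')\w*/i';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]); // seafood, drinks, food
```

If whole words only are wanted, replacing the two \w* parts with \b would exclude "seafood".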

Related

Regex: Finding number by capturing but excluding

I'm new to regex and I am really bad at it.
I've been trying to solve this problem but still can't get the result, so I'm hoping that someone is able to assist me. Thanks!
$str = "/tqrfq_58533_13";
preg_match_all('/\d+(?>=_)*/', $str, $matches);
print_r($matches); // gets 58533, 13
But I only want '58533', not both numbers, so $matches should contain '58533' as the only number.
Use /(?<=_)(\d+)(?=_)/ as the pattern in preg_match(); it matches digits between two _ characters:
$str = "/tqrfq_58533_13";
preg_match('/(?<=_)(\d+)(?=_)/', $str, $matches);
echo $matches[0];
// 58533
You can also use preg_replace() if you don't want an array as the result:
echo preg_replace('/.*?_(\d+)_.*/', "$1", $str);
// 58533
preg_match_all('/\d+(?=_)/', $str, $matches);
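If you want only the number that is followed by _, drop the * and use a lookahead. (?>=_) is not a lookahead: PCRE reads it as an atomic group matching the literal text =_, and the * makes that group optional, so the original pattern matches every run of digits. I use (?=_) to assert that _ immediately follows the number.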
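To see the difference between the two patterns side by side, here is a small sketch using the string from the question:

```php
<?php
$str = "/tqrfq_58533_13";

// (?=_) is a lookahead: the digits must be followed by "_", which is not consumed.
// preg_match() stops at the first match, so only one number is returned.
preg_match('/\d+(?=_)/', $str, $matches);
echo $matches[0], PHP_EOL; // 58533

// The original pattern: (?>=_) is an atomic group for the literal text "=_",
// and * makes it optional, so every digit run matches.
preg_match_all('/\d+(?>=_)*/', $str, $all);
print_r($all[0]); // 58533 and 13
```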

How to optimize this regex

Can someone help me optimize my regex patterns so I don't have to run each of the regexes below? The result should match all of the strings in the example I provided.
$pattern = "/__\(\"(.*)\"/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
$pattern = "/__\(\"(.*)\",/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
$pattern = "/__\(\'(.*)\'/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
$pattern = "/__\(\'(.*)\',/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
$pattern = "/_e\(\"(.*)\"/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
$pattern = "/_e\(\"(.*)\",/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
$pattern = "/_e\(\'(.*)\'/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
$pattern = "/_e\(\'(.*)\',/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
Example:
_e('string');
_e("string");
_e('string', 'string2');
_e("string", 'string2');
__('string');
__("string");
__('string', 'string2');
__("string", 'string2');
Also, if possible, I would like to match the strings below.
"string"|trans
'string'|trans
"string"|trans({}, "string2")
'string'|trans({}, 'string2')
'string'|trans({}, "string2")
"string"|trans({}, 'string2')
If possible, I would like to get the value string2 too. In the worst case, single and double quotes are mixed within the same file.
As you can see in my preg_match_all code, I currently use 8 patterns for the first set and another 8 for the second just to get the first string.
Note:
I only run this script as a console command, not in a PHP application, so performance doesn't matter.
Thank you for your help!
Edited
Thank you for the responses. I tried both of your regexes and they are almost there. My question might be confusing, as I am not a native English speaker. I copied my examples to regex101, which might make it easier to understand what I am trying to achieve.
https://regex101.com/r/uX5nqR/2
and this one too
https://regex101.com/r/Fxs7yY/1
Please check these out. I am trying to extract translations from a WordPress project and from Twig files that use the "trans" filter. I know there are .mo/.po editors, but the editor doesn't recognize the file extension I use.
I took the liberty of writing this in JavaScript, but the regex will work the same.
My complete code looks like this:
const r = /^_[e_]\((\"(.*)\"|\'(.*)\')(, (\"(.*)\"|\'(.*)\'))?\);$/;
const xs = [
"_e('string');",
"_e(\"string\");",
"_e('string', 'string2');",
"_e(\"string\", 'string2');",
"__('string');",
"__(\"string\");",
"__('string', 'string2');",
"__(\"string\", 'string2');",
];
xs.forEach((x) => {
const matches = x.match(r);
if(matches){
console.log('matches are:\n ', matches.filter(m => m !== undefined).join('\n '));
}else{
console.log('no matches for', x);
}
});
Now let me explain how the regex works and how I arrived at it:
First I noticed that all your strings start with _ and end with );,
so I knew the regex had to look something like ^…\);$.
Here ^ and $ mark the beginning and end of the string, and you should leave them out if they're not required.
After the initial _ you've got either another _ or an e, so we put these into a character class followed by the opening parenthesis: [e_]\(.
Now we have a string that is either in " or in ', and we put it down as alternatives: (\"(.*)\"|\'(.*)\').
This string is repeated, but optionally, with a leading , in front.
So we get (, …)? for the optional part, with the same (\"(.*)\"|\'(.*)\') inside it for the second string.
For the second portion of your problem you can use the same strategy:
"string"|trans
'string'|trans
"string"|trans({}, "string2")
'string'|trans({}, 'string2')
'string'|trans({}, "string2")
"string"|trans({}, 'string2')
Start building up your regex from the similarities. We've got the same string pattern as before used twice, and the optional second part now looks like (\(\{\}, (\"(.*)\"|\'(.*)\')\))?.
This way we can end up with a regex like this:
^(\"(.*)\"|\'(.*)\')\|trans\(\{\}, (\"(.*)\"|\'(.*)\')\))?$
Please note that this regex is not tested, but just a guess from my side.
Upon further discussion it became apparent that we're looking at several matches in a larger bunch of text. To adapt to this we need to exclude the ' and " characters from the innermost groups, which leaves us with these regexes:
_[e_]\(("([^"]*)"|\'([^']*)\')(, ("([^"]*)"|\'([^']*)\'))?\);
("([^"]*)"|\'([^']*)\')\|trans(\(\{\}, ("([^"]*)"|\'([^']*)\')\))?
I also noticed that my earlier regex for the second case had an unmatched parenthesis in it; the version above is balanced.
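Since the question is about PHP, here is how the first regex might be used with preg_match_all(). This is only a sketch: the $content string is a made-up sample, and the way the groups are collapsed into "first" and "second" string is my own choice.

```php
<?php
// Nowdoc keeps the pattern free of PHP string escaping.
$pattern = <<<'REGEX'
/_[e_]\(("([^"]*)"|'([^']*)')(, ("([^"]*)"|'([^']*)'))?\);/
REGEX;

$content = "_e('Hello'); echo __(\"Welcome\", 'my-domain');"; // made-up sample input

preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$found = [];
foreach ($matches as $m) {
    // Groups 2/3 hold the first string (double/single quoted), groups 6/7 the second.
    // Unmatched groups are either missing or empty strings.
    $first  = ($m[2] ?? '') !== '' ? $m[2] : ($m[3] ?? '');
    $second = ($m[6] ?? '') !== '' ? $m[6] : ($m[7] ?? '');
    $found[] = [$first, $second];
}
print_r($found);
```

Note that the pattern expects exactly a comma and one space between the two arguments, as in the examples from the question.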
I tried to understand the purpose of these regexes - here's what I think. (Let me omit the slashes on both sides, also the string quotes belonging to the language instead of the regex itself.)
(__|_e)\(\"(.*)\"
(__|_e)\(\'(.*)\'
This way you get all the hits of your 8 regexes above; but that's probably not what you were trying to achieve.
As far as I understand, you want to list the I18N refs in your code, with one or more arguments between the brackets. I think the best way to do it is run a preg_match_all with the simplest form of the pattern:
(__|_e)\(.*\)
or maybe this one is better:
(__|_e)\(([^\)]+)\) // works for multiple calls in one line, ignores empties
...and then iterate over the results (fetched with PREG_SET_ORDER) one by one and split the arguments by comma:
foreach($matches as $m) {
$args = explode(",",$m[2]); // [2] = second subpattern, the argument list
// now you have the arguments of this function call
}
If this answer is not helping, let's refine the question :)
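A runnable sketch of the match-then-split idea, assuming the argument list is captured in its own group. The $content string and the shape of the $calls array are made up for illustration.

```php
<?php
$content = "_e('string', 'string2'); __(\"other\");"; // made-up sample input

// Group 1: function name, group 2: everything between the parentheses
preg_match_all('/(__|_e)\(([^)]+)\)/', $content, $matches, PREG_SET_ORDER);

$calls = [];
foreach ($matches as $m) {
    $args = [];
    foreach (explode(',', $m[2]) as $arg) {
        // Strip surrounding whitespace and quotes from each argument
        $args[] = trim($arg, " \t'\"");
    }
    $calls[] = ['function' => $m[1], 'args' => $args];
}
print_r($calls);
```

One limitation of this approach: a comma or closing parenthesis inside a translated string would split or cut the argument in the wrong place.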

PHP how to get substring between certain keywords

I have a (strange) string like:
EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR
The pattern I need to look for can only be defined by keywords: EREF+, MREF+, CRED+ and others. I know there are 19 keywords, but the string may contain different subsets of these 19 keywords. I don't know if the order stays the same, from what I can tell EREF+ will most likely be the first keyword, but the order may as well differ. I also don't know which of the 19 keywords might be the last one in the string as that may change case by case.
My first approach was to just use explode() twice, with keyword 1 and keyword 2 – but if the keywords change order (and I cannot guarantee they don't) I would have to go through all possible combinations.
Anyway, here's the first (working) code I used:
<?php
$string = "EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR";
function getBetween($content,$start,$end){
$r = explode($start, $content);
if (isset($r[1])){
$r = explode($end, $r[1]);
return $start.$r[0];
}
return '';
}
$start = "EREF+";
$end = "MREF+";
$output = getBetween($string,$start,$end);
echo $output;
?>
So now I am looking into regex to come up with a solution that extracts a substring between two keywords, where any of the keywords can be the start delimiter while any other keyword may be the end delimiter.
Since there are literally thousands of regex questions around, I took some time and tried to adapt from other solutions, but no success until now. I must confess regex is voodoo to me and I cannot seem to remember the patterns for more than a minute. I found this thread which is pretty close to what I am trying to achieve, and tried a few tweaks but I cannot get it to work properly.
Here's my code so far:
<?php
$string = "EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR";
$matches = array();
$keywords = ['EREF+', 'MREF+', 'CRED+', 'SVWZ+', 'ABWA+'];
$pattern = sprintf('/(?:%s):(.*?)/', join('|', array_map(function($keyword) {
return preg_quote($keyword, '/');
}, $keywords)));
preg_match_all($pattern, $string, $matches);
print_r($matches);
?>
... whereas the constructed pattern looks like this:
/(?:EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+):(.*?)/
Can anyone advise please? Any help appreciated!
Thanks
You can use this regex:
/(?<=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+)(.+?)(?=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+|$)/
It will match the strings between defined keywords.
(?<=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+) # look backward for a keyword
(.+?) # Match one or more characters, non-greedy
(?=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+|$) # Look forward for a keyword or end of string
Edit:
If you want to know what keyword caused the split you can use this regex:
/((?:EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+))(.+?)(?=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+|$)/
It will capture the first keyword and the text between keywords.
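Put together in PHP, this could look like the sketch below. It builds the alternation from the keyword list with preg_quote(), as in the question, and collects the result into a keyword => value array. The $fields name and that output shape are my own choices.

```php
<?php
$string = "EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR";
$keywords = ['EREF+', 'MREF+', 'CRED+', 'SVWZ+', 'ABWA+'];

// Escape the + in every keyword and join them: EREF\+|MREF\+|...
$alternation = implode('|', array_map(function ($keyword) {
    return preg_quote($keyword, '/');
}, $keywords));

// Group 1: the keyword, group 2: text up to the next keyword or the end of the string
$pattern = '/(' . $alternation . ')(.+?)(?=' . $alternation . '|$)/';

preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

$fields = [];
foreach ($matches as $m) {
    $fields[$m[1]] = $m[2];
}
print_r($fields);
```

Because the keywords are only used inside an alternation, their order in the input does not matter, and adding the remaining keywords only means extending the $keywords array.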

preg_replace with Regex - find number-sequence in URL

I'm a regex newbie, so sorry for this "simple" question:
I've got a URL like the following:
http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx
What I'm trying to achieve is getting the number sequence (aka Job-ID) right before the ".aspx" with preg_replace.
I've already figured out that the regex for finding it could be
(?!.*-).*(?=\.)
Now preg_replace needs the opposite of that regular expression. How can I achieve that? Also worth mentioning:
The URL can have multiple numbers in it. I only need the sequence right before ".aspx". Also, there could be some parameters after the ".aspx", like "&mobile=true".
Thank you for your answers!
You can use:
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);
//=> 146370543
This matches a run of characters that are neither hyphens nor dots and that is followed by .aspx, which is checked with the lookahead (?=\.aspx).
You can just use preg_match (you don't need preg_replace, as you don't want to change the original string) and capture the number before the .aspx. Assuming .aspx is always at the end, the simplest way I can think of is the following (if parameters such as &mobile=true can follow, drop the $ anchor):
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";
$regex = '/([0-9]+)\.aspx$/';
preg_match($regex, $string, $results);
print $results[1];
?>
A short explanation:
$results contains an array of results. The first element holds the text matched by the complete regex, which would be 146370543.aspx in this example. The second element contains the group captured by the parentheses around [0-9]+.
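Since the question mentions that something like "&mobile=true" may follow the ".aspx", here is a sketch of a variant that tolerates a trailing part. The second URL is simply the question's URL with that suffix appended, and the lookahead (?=[?&]|$) is my own assumption about what may follow.

```php
<?php
$urls = [
    'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx',
    'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx&mobile=true',
];

$ids = [];
foreach ($urls as $url) {
    // Digits directly before ".aspx", which must be followed by ?, & or the end of the string
    if (preg_match('/(\d+)\.aspx(?=[?&]|$)/i', $url, $m)) {
        $ids[] = $m[1];
    }
}
print_r($ids); // both entries are 146370543
```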
You can get the opposite by using this regex:
(\D*)\d+(.*)
MATCH 1
1. [0-100] `http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-`
2. [109-114] `.aspx`
Or, if you just want the number from that URL, you can use this regex:
(\d+)

using the letter h in regular expression not working?

I've been at this for hours...any help would be greatly appreciated.
I want to preg_match_all between the strings "Level:" and "<HR>".
Everything works except when I add the letter h.
Why?!?
<?php
include("courses.php");
preg_match_all('/(level:.*?)r>/i', $str, $matches);
// this works but picks up <br>, so I wanted to add in the letter h
preg_match_all('/(level:.*?)h/i', $str, $matches);
// I've tried changing it to `hr` but that fails; now *even only* `h` fails
print_r($matches[1]);
?>
I've tried escaping the h, but can't figure out what's wrong with this letter.
String is:
$str='Level: <B>Undergraduate</B><BR>
Information Literacy Course: <B>N</B><BR>
Special Restriction: <B>None</B><BR>
<HR>';
// this repeats a lot. I just wrote it out once, but it's all in the same variable like this.
I think you guys asking me for the string lit the lightbulb in my head. Am I not accounting for line breaks??
Not sure why you think it would fail, but if you want the string in between those two terms:
preg_match_all('/Level:(.*?)<hr>/i', $str, $matches);
// $matches[1] contains the matches
If that doesn't work, perhaps your string has newlines in it, in which case you need the /s modifier to let . match newlines as well:
preg_match_all('/Level:(.*?)<hr>/is', $str, $matches);
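With the string from the question, the effect of the s modifier can be checked directly. The first line of the string contains no h at all, so without s the dot can never reach one; that is also why even a single h failed. A minimal sketch:

```php
<?php
$str = 'Level: <B>Undergraduate</B><BR>
Information Literacy Course: <B>N</B><BR>
Special Restriction: <B>None</B><BR>
<HR>';

// Without /s the dot stops at the first newline, so <HR> on a later line is never reached
$withoutS = preg_match_all('/Level:(.*?)<hr>/i', $str, $m1);

// With /s the dot also matches newlines
$withS = preg_match_all('/Level:(.*?)<hr>/is', $str, $m2);

echo $withoutS, ' ', $withS, PHP_EOL; // 0 1
```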
