Regex using preg_match on a delimited string - PHP

I have a string that looks like this:
Text58||INPUT 6~~Text67||INPUT 7~~Text68||INPUT 8~~CR_Exp_Date||INPUT 9~~Text60||INPUT 10~~Text63||INPUT 14~~Combo_Box65||Ship~~Text66||INPUT 15~~First_Name||INPUT 18~~Middle_Name||INPUT 19~~Last_Name||INPUT 20~~Suffix||INPUT 21~~Country||INPUT 22~~Mailing_Address||INPUT 23~~City||INPUT 24~~State||INPUT 25~~Zip_Code||INPUT 26~~
I am trying to extract First_Name||INPUT 18.
I tried (?=First_Name[||]).*?(?<=[~~][$])
but I couldn't come up with anything else. Does anyone know what I am doing wrong?

Try this:
First_Name\|\|.*?(?=~~)
First_Name should not be in a lookahead, you want to include it in the match.
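To match a pair of |, include them in the regexp literally, each escaped with \. Your [||] is a character class, so it matches only a single |.
~~ should be in a positive lookahead, not a lookbehind, and it doesn't need brackets: [~~] likewise matches just one ~, and [$] matches a literal $ rather than the end of the string.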
If you don't want to include First_Name|| in the match (why did you say you did in the question?), you can use a positive lookbehind:
(?<=First_Name\|\|).*?(?=~~)
It seems like you got lookbehind and lookahead backwards in your attempt.
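Here is a rough PHP sketch of both patterns. The variable name $input and the shortened sample string are only illustrative, not taken from the question:

```php
<?php
// Shortened version of the delimited string from the question.
$input = 'Text66||INPUT 15~~First_Name||INPUT 18~~Middle_Name||INPUT 19~~';

// Match "First_Name||" plus everything up to (not including) the next "~~".
if (preg_match('/First_Name\|\|.*?(?=~~)/', $input, $m)) {
    echo $m[0], PHP_EOL; // First_Name||INPUT 18
}

// Lookbehind variant: only the value that follows "First_Name||".
if (preg_match('/(?<=First_Name\|\|).*?(?=~~)/', $input, $v)) {
    echo $v[0], PHP_EOL; // INPUT 18
}
```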

Related

Regex - Match characters but don't include within results

I have got the following Regex, which ALMOST works...
(?:^https?:\/\/)(?:www|[a-z]+)\.([^.]+)
I need the result to be the only result, or within the same position in the Array.
So for example http://m.facebook.com/ matches perfectly; there is only 1 group.
However, if I change it to http://facebook.com/ then I get com/ in place of where facebook should be. So I need to have (?:www|[a-z]+) as an optional check really.
Edit:
What I expect is just to match facebook, if ANY of the strings are as follows:
http://www.facebook.com
http://facebook.com
http://m.facebook.com
And obviously the https counterparts.
This is my Regex now
(?:^https?:\/\/)(?:www)?\.?([^.]+)
This is close; however, it matches the m when I try http://m.facebook.com
https://regex101.com/r/GDapY5/1
So I need to have (?:www|[a-z]+) as an optional check really.
A ? at the end of a pattern is generally used for "optional" bits -- it means "match zero or one" of that thing, so your subpattern would be something like this:
(?:www|[a-z]+)?
If you're simply trying to get the second level domain, I wouldn't bother with regex, because you'll be constantly adjusting it to handle special cases you come across. Just split on dots and take the penultimate value:
$domain = array_reverse(explode('.', parse_url($str)['host']))[1];
Or:
$domain = array_reverse(explode('.', parse_url($str, PHP_URL_HOST)))[1];
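A slightly more defensive version of that one-liner might look like the sketch below. The helper name secondLevelDomain is made up for illustration, and it assumes simple hosts such as m.facebook.com; multi-part suffixes like co.uk would need extra handling:

```php
<?php
// Hypothetical helper: returns the label just before the top-level domain.
// Assumes a simple host; suffixes such as "co.uk" are not handled.
function secondLevelDomain(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // malformed URL or no host component
    }
    $labels = array_reverse(explode('.', $host));
    return $labels[1] ?? null; // null when the host has only one label
}

$urls = ['http://www.facebook.com', 'http://facebook.com', 'https://m.facebook.com/'];
foreach ($urls as $url) {
    echo secondLevelDomain($url), PHP_EOL; // facebook (each time)
}
```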
Perhaps you could make the first m. part optional with (?:\w+\.)?.
Instead of a capturing group you could use \K to reset the starting point of the reported match.
Then match one or more word characters \w+ and use a positive lookahead to assert that what follows is a dot (?=\.)
For example:
^https?://(?:www)?(?:\w+\.)?\K\w+(?=\.)
Edit: Or you could match for m. or www. using an alternation:
^https?://(?:m\.|www\.)?\K\w+(?=\.)
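In PHP the alternation version could be used roughly like this. It is only a sketch; the variable names are illustrative, and ~ is chosen as the delimiter so the slashes don't need escaping:

```php
<?php
// "~" is the delimiter, so the slashes in "https://" need no escaping.
$pattern = '~^https?://(?:m\.|www\.)?\K\w+(?=\.)~';

$urls = ['http://www.facebook.com', 'http://facebook.com', 'https://m.facebook.com'];
$names = [];
foreach ($urls as $url) {
    // Because of \K, $m[0] holds only the text matched after the reset point.
    $names[] = preg_match($pattern, $url, $m) ? $m[0] : null;
}
print_r($names); // three entries, each "facebook"
```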

Non greedy match does not work

I want to implement a non-greedy match using the .*? pattern. However, I came across a sample string for which the non-greedy match does not seem to work. This is the code and the sample string:
preg_match_all('/\<w:t.*?\>\<w:p\>/', '<w:t xml:space="preserve"></w:t></w:r><w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">Text 1 </w:t></w:r><w:r><w:rPr><w:b/><w:u w:val="single"/><w:color w:val="ff0000"/></w:rPr><w:t xml:space="preserve"></w:t></w:r><w:r><w:rPr><w:b/><w:u w:val="single"/><w:color w:val="ff0000"/><w:i/></w:rPr><w:t xml:space="preserve">Text 2</w:t></w:r><w:r><w:t xml:space="preserve"></w:t></w:r><w:r><w:t xml:space="preserve"></w:t></w:r><w:r><w:t xml:space="preserve"></w:t></w:r></w:p></w:t></w:r></w:p><w:p w:rsidRDefault="004D3323" w:rsidP="003F03B1"><w:r><w:t><w:p>', $match);
But if I print_r the $match variable, I see that this pattern matches the whole string. What I want is to match only strings such as:
"<w:t><w:p>" and "<w:t any text may go here><w:p>"
So, what did I do wrong and how can I fix it? Thanks!
Use this regex instead:
<w:t[^>]*><w:p>
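[^>]* matches any characters except >, so the match cannot run past the end of the <w:t ...> tag. The lazy .*? does not help here because the regex engine still starts at the first <w:t it finds and only then expands as little as needed to reach ><w:p>; laziness shortens where a match ends, not where it starts.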
see https://regex101.com/r/nuMzTk/1
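To see the difference side by side, here is a small sketch using a shortened sample string (made up for illustration, in the spirit of the one in the question):

```php
<?php
// Shortened, illustrative sample string.
$xml = '<w:t xml:space="preserve">Text 1</w:t><w:r><w:t><w:p><w:t a="b"><w:p>';

// Lazy version: the first match starts at the very first "<w:t" and runs
// on until the first "><w:p>" it can find, crossing several tags.
preg_match_all('/<w:t.*?><w:p>/', $xml, $lazy);

// Negated class: [^>]* cannot cross a ">", so each match stays in one tag.
preg_match_all('/<w:t[^>]*><w:p>/', $xml, $strict);

print_r($lazy[0]);   // first entry spans from the first <w:t to the first <w:p>
print_r($strict[0]); // "<w:t><w:p>" and "<w:t a="b"><w:p>"
```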

PHP: Regex: Match if doesnt contain

I have two URLs (below). I need to match the one that doesn't contain the "story" part in it. I know I need to use a negative lookahead/lookbehind, but I can't for the life of me get it to work.
Urls
news/tech/story/2014/oct/28/apple-iphone/261736
news/tech/2014/oct/28/apple-iphone/261736
Current Regex
news\/([a-z0-9-\/]{1,255})(\d{4})\/(\w{3})\/(\d{2})\/([a-z0-9\-]{1,50})\/(\d{1,10})
Example:
http://regex101.com/r/jC7jC4/1
You can try this one:
news\/(([a-z0-9-\/](?!story)){1,255})(\d{4})\/(\w{3})\/(\d{2})\/([a-z0-9\-]{1,50})\/(\d{1,10})
You can use negative lookahead like this:
(?!.*\bstory\b)news\/([a-z0-9-\/]{1,255})(\d{4})\/(\w{3})\/(\d{2})\/([a-z0-9\-]{1,50})\/(\d{1,10})
(?!.*\bstory\b) is a negative lookahead that fails the match if the word story appears anywhere in the URL.
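A quick way to check the lookahead pattern against both URLs from the question; this is just a sketch, with ~ as the delimiter and illustrative variable names:

```php
<?php
// Lookahead pattern from the answer, wrapped in "~" delimiters.
$pattern = '~(?!.*\bstory\b)news\/([a-z0-9-\/]{1,255})(\d{4})\/(\w{3})\/(\d{2})\/([a-z0-9\-]{1,50})\/(\d{1,10})~';

$urls = [
    'news/tech/story/2014/oct/28/apple-iphone/261736',
    'news/tech/2014/oct/28/apple-iphone/261736',
];

$results = [];
foreach ($urls as $url) {
    // preg_match() returns 1 on a match and 0 when the pattern does not match.
    $results[$url] = preg_match($pattern, $url);
}
print_r($results); // 0 for the "story" URL, 1 for the other
```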
If you don't have to use regex, you can check with strpos():
if (strpos($url, 'story') === false) { /* URL has no "story" part */ }

php regexp: can't exclude one element

I am trying to set up a fairly complex regexp, but there is one element I can't keep out of the matches.
My regular expression is:
1234567-8_abc((?!_ABC|_DEFGHI)[\w]?)*(\.ios|\.and)
What I have to exclude is:
1234567-8_abc.ios
1234567-8_abc_DEFGHI.ios
1234567-8_abc_ABC.ios
Instead, what I have to include is:
1234567-8_abc_1UP.ios
1234567-8_abc_FI.ios
1234567-8_abc_gmg.ios
1234567-8_abc_1UP.and
1234567-8_abc_FI.and
1234567-8_abc_gmg.and
1234567-8_abc_ddd.and
1234567-8_abc_qwert.ios
1234567-8_abc_88.ios
Well, I can't exclude the first option (1234567-8_abc.ios).
I tried it here.
How can I achieve this?
Thank you!
You can use this pattern:
1234567-8_abc_[^_.]++(?<!_ABC|_DEFGHI)\.(?:ios|and)
Note: I assume that each substring between _ and .ios doesn't contain a dot or an underscore.
The possessive quantifier ++ is not required for correctness; it makes the pattern fail faster by preventing needless backtracking steps.
This regex matches your examples in PHP:
1234567-8_abc_((?!ABC|DEFGHI)[\w]?)*(\.ios|\.and)
Add a negative lookahead like below,
1234567-8_abc(?!_ABC|_DEFGHI)\w+(\.ios|\.and)
(?!_ABC|_DEFGHI) is a negative lookahead asserting that the text following _abc is not _ABC or _DEFGHI. The \w+ then requires one or more word characters before .ios or .and, so 1234567-8_abc.ios is not matched.
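As a sanity check, the pattern can be run over some of the file names from the question. This is a sketch; the array layout and variable names are my own:

```php
<?php
$pattern = '/1234567-8_abc(?!_ABC|_DEFGHI)\w+(\.ios|\.and)/';

$names = [
    '1234567-8_abc.ios',        // should be excluded
    '1234567-8_abc_DEFGHI.ios', // should be excluded
    '1234567-8_abc_ABC.ios',    // should be excluded
    '1234567-8_abc_1UP.ios',    // should be included
    '1234567-8_abc_FI.and',     // should be included
    '1234567-8_abc_88.ios',     // should be included
];

$accepted = [];
foreach ($names as $name) {
    if (preg_match($pattern, $name) === 1) {
        $accepted[] = $name;
    }
}
print_r($accepted); // only the three "included" names
```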
1234567-8_abc(?:(?!_ABC|_DEFGHI)\w)+(\.ios|\.and)
Try this. Your regex left the \w after 1234567-8_abc optional; I just made it compulsory. See the demo:
http://regex101.com/r/bB8jY7/1

match regex php between two string with string in middle

I would like to get a string made of one word with a delimiter word before and after it.
I tried this, but it doesn't work:
$stringData2 = file_get_contents('testtext3.txt');
$regular2=('/(?<=first del)*MAIN WORD(?=last del)*\s');
preg_match_all($regular2,
$stringData2,
$out, PREG_PATTERN_ORDER);
Thank you very much for any help.
No quantifier is needed; add the closing delimiter at the end and put \s inside the lookahead.
'/(?<=first del)MAIN WORD(?=last del\s)/'
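Plugged into preg_match_all, that pattern could be tried like this. The sample text is made up for illustration; the question reads it from testtext3.txt instead:

```php
<?php
// Hypothetical sample text standing in for the file contents.
$text = "first delMAIN WORDlast del\nother MAIN WORD here";

// Only the occurrence surrounded by both delimiters should be returned.
preg_match_all('/(?<=first del)MAIN WORD(?=last del\s)/', $text, $out, PREG_PATTERN_ORDER);

print_r($out[0]); // a single entry: "MAIN WORD"
```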
This regex
(?<=xx)[^\s]*(?=yy)
matches hello in:
xxhelloyy
but fails to match in:
xxhello worldyy
This is probably what you're looking for.
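The two cases above can be checked with a short sketch like this (variable names are illustrative):

```php
<?php
$pattern = '/(?<=xx)[^\s]*(?=yy)/';

// No whitespace between the delimiters: the word is captured.
$hit = preg_match($pattern, 'xxhelloyy', $m);

// A space between the delimiters: [^\s]* cannot reach "yy", so no match.
$miss = preg_match($pattern, 'xxhello worldyy');

echo $hit, ' ', $m[0], ' ', $miss, PHP_EOL; // 1 hello 0
```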
If you want the delimiter strings included in the match, then you should not be using lookahead or lookbehind. It should be something rather basic, like this:
/\s?first del MAIN WORD last del\s?/
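If you do want to return JUST the MAIN WORD part of the match, use lookarounds. The optional \s? is dropped here: an optional token at the outer edge of a lookaround never changes the result, and older PCRE versions reject a variable-length lookbehind outright.
/(?<=first del)MAIN WORD(?=last del)/
Put an 'i' at the very end of that (after the closing delimiter) to make it case-insensitive, if you want. I only mention this because the example you gave uses different case in the sample text and in the desired result.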
