I have two URLs (below). I need to match the one that doesn't contain the "story" part. I know I need to use a negative lookahead/lookbehind, but I can't for the life of me get it to work.
URLs
news/tech/story/2014/oct/28/apple-iphone/261736
news/tech/2014/oct/28/apple-iphone/261736
Current Regex
news\/([a-z0-9-\/]{1,255})(\d{4})\/(\w{3})\/(\d{2})\/([a-z0-9\-]{1,50})\/(\d{1,10})
Example:
http://regex101.com/r/jC7jC4/1
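You can try this one. The lookahead is placed before each character of the path, so "story" may not start at any position in it:
news\/(((?!story)[a-z0-9-\/]){1,255})(\d{4})\/(\w{3})\/(\d{2})\/([a-z0-9\-]{1,50})\/(\d{1,10})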
You can use a negative lookahead like this:
(?!.*\bstory\b)news\/([a-z0-9-\/]{1,255})(\d{4})\/(\w{3})\/(\d{2})\/([a-z0-9\-]{1,50})\/(\d{1,10})
(?!.*\bstory\b) is a negative lookahead that prevents a match if the whole word story appears anywhere in the rest of the URL.
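A minimal PHP sketch of the lookahead answer, run against the two URLs from the question. The loop and the output format are illustrative additions, not part of the original answer:

```php
<?php
// Pattern from the answer: the lookahead at the start rejects the URL if the
// whole word "story" appears anywhere after "news/".
$pattern = '/(?!.*\bstory\b)news\/([a-z0-9-\/]{1,255})(\d{4})\/(\w{3})\/(\d{2})\/([a-z0-9\-]{1,50})\/(\d{1,10})/';

$urls = [
    'news/tech/story/2014/oct/28/apple-iphone/261736',
    'news/tech/2014/oct/28/apple-iphone/261736',
];

foreach ($urls as $url) {
    if (preg_match($pattern, $url, $m)) {
        // Groups 2-4 hold year, month and day; group 6 holds the numeric ID.
        echo "match:    $url (date {$m[2]}/{$m[3]}/{$m[4]}, id {$m[6]})", PHP_EOL;
    } else {
        echo "no match: $url", PHP_EOL;
    }
}
```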
You can check with strpos() if you don't have to use a regex:
if (strpos($url, 'story') === false) { /* the URL does not contain "story" */ }
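The strict === comparison matters because strpos() returns 0 when the needle is at the very start of the string, and 0 == false is true in PHP. A small sketch (the helper name hasNoStory is hypothetical); note that, unlike \bstory\b, a plain substring check also rejects words such as "history":

```php
<?php
// Hypothetical helper: true when the URL does not contain "story" anywhere.
function hasNoStory(string $url): bool {
    // strpos() returns an int position (possibly 0) or false, so compare strictly.
    return strpos($url, 'story') === false;
}

var_dump(hasNoStory('news/tech/2014/oct/28/apple-iphone/261736'));
var_dump(hasNoStory('news/tech/story/2014/oct/28/apple-iphone/261736'));

// Pitfall: with a loose comparison, a match at position 0 looks like "not found".
var_dump(strpos('story/2014', 'story') == false);
```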
Related
I want to implement a non-greedy match using the .*? pattern. However, I came across a sample string which shows that the non-greedy match does not work. This is the code and the sample string:
preg_match_all('/\<w:t.*?\>\<w:p\>/', '<w:t xml:space="preserve"></w:t></w:r><w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">Text 1 </w:t></w:r><w:r><w:rPr><w:b/><w:u w:val="single"/><w:color w:val="ff0000"/></w:rPr><w:t xml:space="preserve"></w:t></w:r><w:r><w:rPr><w:b/><w:u w:val="single"/><w:color w:val="ff0000"/><w:i/></w:rPr><w:t xml:space="preserve">Text 2</w:t></w:r><w:r><w:t xml:space="preserve"></w:t></w:r><w:r><w:t xml:space="preserve"></w:t></w:r><w:r><w:t xml:space="preserve"></w:t></w:r></w:p></w:t></w:r></w:p><w:p w:rsidRDefault="004D3323" w:rsidP="003F03B1"><w:r><w:t><w:p>', $match);
But if I print_r the $match variable, I see that this pattern matches the whole string. What I want is to match only strings such as:
"<w:t><w:p>" and "<w:t any text may go here><w:p>"
So, what did I do wrong, and how can I fix it? Thanks!
Use this regex instead:
<w:t[^>]*><w:p>
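[^>]* allows any characters except >, so a match cannot run past the end of a single <w:t ...> tag. The lazy .*? did not help because laziness only affects where a match ends, not where it starts: the engine begins at the leftmost <w:t and then extends .*? across any other tags until it reaches ><w:p>.
See https://regex101.com/r/nuMzTk/1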
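A minimal sketch comparing the two patterns with preg_match_all(). The input is a shortened, made-up stand-in for the WordprocessingML string in the question, not the original data:

```php
<?php
// Shortened, hypothetical sample in the same style as the question's string.
$xml = '<w:t xml:space="preserve">Text 1</w:t></w:p><w:r><w:t><w:p><w:t foo="bar"><w:p>';

// Lazy version: the first match still starts at the FIRST "<w:t" and grows
// across other tags until it finds "><w:p>".
preg_match_all('/<w:t.*?><w:p>/', $xml, $lazy);

// Negated character class: the tag body may not contain ">", so a match
// can never extend beyond a single <w:t ...> tag.
preg_match_all('/<w:t[^>]*><w:p>/', $xml, $strict);

print_r($lazy[0]);
print_r($strict[0]);
```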
I'm a regex newbie, so sorry for this "simple" question:
I've got a URL like the following:
http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx
What I want to achieve is getting the number sequence (aka the job ID) right before the ".aspx" with preg_replace.
I've already figured out that the regex for finding it could be
(?!.*-).*(?=\.)
Now preg_replace needs the opposite of that regular expression. How can I achieve that? Also worth mentioning:
The URL can contain multiple numbers; I only need the sequence right before ".aspx". Also, there could be some parameters after the ".aspx", like "&mobile=true".
Thank you for your answers!
You can use:
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);
//=> 146370543
This matches a run of characters that are neither hyphens nor dots and that is immediately followed by .aspx; the lookahead (?=\.aspx) checks for .aspx without including it in the match.
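A short sketch wrapping the lookahead answer in a hypothetical helper jobId() and trying it on the URL from the question, the same URL with a trailing "&mobile=true", and a made-up URL that contains an extra number:

```php
<?php
// Returns the run of non-"-", non-"." characters that directly precedes ".aspx",
// or null when there is none. jobId() is a hypothetical helper name.
function jobId(string $url): ?string {
    return preg_match('/[^-.]+(?=\.aspx)/i', $url, $m) ? $m[0] : null;
}

$url = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx';

echo jobId($url), PHP_EOL;                  // plain URL
echo jobId($url . '&mobile=true'), PHP_EOL; // parameter after .aspx
echo jobId('http://example.de/Job-2014-Mainz-146370543.aspx'), PHP_EOL; // extra number earlier
```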
You can just use preg_match (you don't need preg_replace, since you don't want to change the original string) and capture the digits right before the .aspx. The simplest way I could think of is:
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";
// No "$" anchor after .aspx, so trailing parameters such as "&mobile=true" still match
$regex = '/([0-9]+)\.aspx/';
preg_match($regex, $string, $results);
print $results[1];
?>
A short explanation:
$results contains an array of matches. The first element holds the text matched by the complete regex, so in this example it would be 146370543.aspx. The second element contains the group captured by the parentheses around [0-9]+, i.e. 146370543.
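A minimal sketch that prints the array described above. It uses the regex without a trailing $ anchor (an assumption, so that a URL ending in "&mobile=true" still matches):

```php
<?php
$string = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx&mobile=true';

// Element 0 is the whole match, element 1 the digits inside the parentheses.
preg_match('/([0-9]+)\.aspx/', $string, $results);
print_r($results);
```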
You can get the opposite by using this regex:
(\D*)\d+(.*)
Working demo
MATCH 1
1. [0-100] `http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-`
2. [109-114] `.aspx`
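Since the question specifically asks for preg_replace, another option (a sketch, not taken from the answers above) is to replace the whole URL with a backreference to the digits. This also copes with several numbers in the URL and with a trailing "&mobile=true":

```php
<?php
// Hypothetical helper: rewrite the whole URL to just the digits before ".aspx".
// The lazy ".*?" means "(\d+)\.aspx" is tried at each position from the left,
// so the first digit run directly followed by ".aspx" wins; ".*$" then consumes
// anything after it, such as "&mobile=true".
// If nothing matches, preg_replace() returns the input unchanged.
function jobIdViaReplace(string $url): string {
    return preg_replace('/^.*?(\d+)\.aspx.*$/i', '$1', $url);
}

$url = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx&mobile=true';
echo jobIdViaReplace($url), PHP_EOL;
```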
If you just want the number from that URL, you can simply use this regex (it only works because this URL contains a single number):
(\d+)
Any idea why this regex is not working?
'#((http://|www.)(.*^(youtube)\.(com|org|co.il|net|us|ws|info|tv|me|tk|co.uk).*))#'
I want to find links that don't belong to YouTube.
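You have this segment in your regex: ^(youtube)
This does not negate the string youtube. Outside a character class, ^ is an anchor for the start of the string, so after (http://|www.) has consumed characters it can never match; ^ only means "not" inside a character class such as [^...].
To exclude text like this, you need a negative lookahead or lookbehind: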
(?!youtube)
OR
(?<!youtube)
For your regex it can be:
#((http://|www.)((?!youtube)[^.]+\.(com|org|co.il|net|us|ws|info|tv|me|tk|co.uk).*))#
However, you should include sample input strings in your question.
Read about lookarounds here: http://www.regular-expressions.info/lookaround.html
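A quick PHP check of the lookahead version against a few made-up links (the question gave no sample input). The pattern is unanchored, so preg_match() only reports whether some part of each link matches:

```php
<?php
// Negative-lookahead pattern from the answer above.
$notYoutube = '#((http://|www.)((?!youtube)[^.]+\.(com|org|co.il|net|us|ws|info|tv|me|tk|co.uk).*))#';

// Hypothetical sample links.
$links = [
    'http://www.google.com/search',
    'http://www.youtube.com/watch?v=abc',
    'http://youtube.com/watch?v=abc',
    'www.example.org/page',
];

foreach ($links as $link) {
    $isMatch = preg_match($notYoutube, $link) === 1;
    echo $link, ' => ', $isMatch ? 'not YouTube' : 'skipped', PHP_EOL;
}
```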
I'd go the opposite way: match YouTube links and negate the result:
$youtubePattern = '%(https?:\/\/(www\.)?youtube\.([a-z]{2,3}).*)%six';
if (!preg_match($youtubePattern, $subject)) {
// not youtube, do whatever...
}
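The same "opposite" idea also works for a whole list at once: preg_grep() with the PREG_GREP_INVERT flag returns the entries that do not match the pattern. A sketch with made-up links:

```php
<?php
$youtubePattern = '%(https?:\/\/(www\.)?youtube\.([a-z]{2,3}).*)%six';

// Hypothetical sample links.
$links = [
    'https://www.youtube.com/watch?v=abc',
    'http://youtube.de/foo',
    'https://example.com/video',
];

// PREG_GREP_INVERT keeps the entries that do NOT match; original keys are preserved.
$nonYoutube = preg_grep($youtubePattern, $links, PREG_GREP_INVERT);
print_r($nonYoutube);
```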
I have a regex that's matching urls and converting them into html links.
If the URL is already part of a link, I don't want it to match. For example:
http://stackoverflow.com/questions/ask
Should match, but:
<a href="http://stackoverflow.com/questions/ask">Stackoverflow</a>
Shouldn't match
How can I create a regex to do this?
If your URL-matching regular expression is $URL, you can use the following pattern (the lookbehind must include the = that precedes the quote):
(?<!href=[\"'])$URL
In PHP you'd write:
preg_match("/(?<!href=[\"'])$URL/", $text, $matches);
You can use a negative lookbehind to assert that the url is not preceded by href="
(?<!href=")
(Your url-matching pattern should go immediately after that.)
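A minimal sketch of the lookbehind in a link-converting preg_replace(). The URL pattern https?://[^\s"<]+ is a deliberately simple placeholder (an assumption), not a complete URL matcher:

```php
<?php
// Placeholder URL pattern (assumption): scheme plus anything up to whitespace, '"' or '<'.
$urlPattern = 'https?://[^\s"<]+';

// Negative lookbehind: do not start a match right after href=".
$pattern = '~(?<!href=")' . $urlPattern . '~';

$text = 'Visit http://stackoverflow.com/questions/ask or <a href="http://example.com/">x</a>';

// $0 refers to the whole matched URL.
$linked = preg_replace($pattern, '<a href="$0">$0</a>', $text);
echo $linked, PHP_EOL;
```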
An existing answer to a similar question covers this; its accepted solution is like so:
<a\s
(?:(?!href=|target=|>).)*
href="http://
(?:(?!target=|>).)*
If you remove the references to target=, this should work for you.
Try this
/(?:(([^">']+|^)https?\:\/\/[^\s]+))/m
I have a PHP regular expression I'm using to get the YouTube video code out of a URL.
I'd love to match this with a client-side regular expression in JavaScript. Can anyone tell me how to convert the following PHP regex to JavaScript?
preg_match("#(?<=v=)[a-zA-Z0-9-]+(?=&)|(?<=v\/)[^&\n]+(?=\?)|(?<=embed/)[^&\n]+|(?<=v=)[^&\n]+|(?<=youtu.be/)[^&\n]+#", $url, $matches);
Much appreciated, thanks!
I think the only problem is getting rid of the lookbehind assertions (?<=...), since they are not supported in older JavaScript engines (lookbehind was only added to JavaScript in ES2018).
Their advantage is that they ensure a pattern comes before the match without being included in the match.
So you need to remove them, i.e. change (?<=v=)[a-zA-Z0-9-]+(?=&) to v=[a-zA-Z0-9-]+(?=&), but now your match starts with "v=".
If you only need to validate and don't need the matched part, that's fine and you are done.
But if you need the part after v=, put the needed pattern into a capturing group instead and work with the captured values:
v=([a-zA-Z0-9-]+)(?=&)
You will then find the matched substring in $1 for the first group, $2 for the second, and so on.
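The rewrite can be checked in PHP before porting it. This sketch applies the same change to all five alternatives (each lookbehind becomes a literal prefix and the ID moves into its own capturing group) and compares the result with the original lookbehind pattern. The helper names and sample URLs are hypothetical; in JavaScript the captured value would be read from the match array (e.g. m[1]) instead of $m:

```php
<?php
// Original PHP pattern from the question, using lookbehind.
const LOOKBEHIND = '#(?<=v=)[a-zA-Z0-9-]+(?=&)|(?<=v\/)[^&\n]+(?=\?)|(?<=embed/)[^&\n]+|(?<=v=)[^&\n]+|(?<=youtu.be/)[^&\n]+#';

// Same alternatives with each lookbehind turned into a literal prefix and the
// ID moved into its own capturing group (groups 1-5).
const CAPTURING = '#v=([a-zA-Z0-9-]+)(?=&)|v\/([^&\n]+)(?=\?)|embed/([^&\n]+)|v=([^&\n]+)|youtu.be/([^&\n]+)#';

function idViaLookbehind(string $url): ?string {
    return preg_match(LOOKBEHIND, $url, $m) ? $m[0] : null;
}

function idViaCapture(string $url): ?string {
    if (!preg_match(CAPTURING, $url, $m)) {
        return null;
    }
    // Only the alternative that matched has a non-empty group.
    for ($i = 1; $i < count($m); $i++) {
        if ($m[$i] !== '') {
            return $m[$i];
        }
    }
    return null;
}

echo idViaCapture('http://youtu.be/dQw4w9WgXcQ'), PHP_EOL;
```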
You can replace your lookbehind assertions using the approaches in this post:
Javascript: negative lookbehind equivalent?