I have a regex that's matching urls and converting them into html links.
If the URL is already part of a link I don't want it to match, for example:
http://stackoverflow.com/questions/ask
Should match, but:
<a href="http://stackoverflow.com/questions/ask">Stackoverflow</a>
Shouldn't match
How can I create a regex to do this?
If your URL-matching regular expression is $URL, you can use the following pattern:
(?<!href=["'])$URL
In PHP you'd write:
preg_match("/(?<!href=[\"'])$URL/", $text, $matches);
You can use a negative lookbehind to assert that the url is not preceded by href="
(?<!href=")
(Your url-matching pattern should go immediately after that.)
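A minimal sketch of how the lookbehind could be combined with a URL pattern in preg_replace. The URL pattern and the function name linkifyBareUrls are assumptions made for illustration, not part of the answers above. Note that a URL used as the visible text of an existing link would still be matched, since only the href attribute is checked.

```php
<?php
// Hypothetical URL pattern for illustration; substitute your own.
$urlPattern = 'https?://[^\s"\'<>]+';

// Wrap bare URLs in <a> tags, skipping any URL directly preceded by href=" or href='.
function linkifyBareUrls(string $text, string $urlPattern): string
{
    // "~" is used as the delimiter so the slashes in the URL pattern need no escaping.
    $regex = '~(?<!href=["\'])\b' . $urlPattern . '~i';
    return preg_replace($regex, '<a href="$0">$0</a>', $text);
}

$text = 'Ask at http://stackoverflow.com/questions/ask or '
      . '<a href="http://stackoverflow.com/questions/ask">Stackoverflow</a>';
echo linkifyBareUrls($text, $urlPattern), "\n";
```

Only the first, bare URL should be wrapped; the existing link should come back unchanged.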
This link provides information. The accepted solution is like so:
<a\s
(?:(?!href=|target=|>).)*
href="http://
(?:(?!target=|>).)*
By removing the references to "target" this should work for you.
Try this
/(?:(([^">']+|^)https?\:\/\/[^\s]+))/m
I'm a regex-noobie, so sorry for this "simple" question:
I've got a URL like the following:
http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx
What I'm trying to achieve is getting the number sequence (aka Job-ID) right before the ".aspx" with preg_replace.
I've already figured out that the regex for finding it could be
(?!.*-).*(?=\.)
Now preg_replace needs the opposite of that regular expression. How can I achieve that? Also worth mentioning:
The URL can have multiple numbers in it. I only need the sequence right before ".aspx". Also, there could be some parameters after the ".aspx", like "&mobile=true".
Thank you for your answers!
You can use:
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);
//=> 146370543
This matches a run of characters that are neither hyphens nor dots and that is followed by .aspx; the lookahead (?=\.aspx) checks for the extension without including it in the match.
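As a rough check of the lookahead approach, here is a sketch that wraps it in a helper and runs it on a URL with an extra number and a trailing parameter. The helper name extractJobIdByLookahead and the example.com URL are made up for illustration. Keep in mind that [^-.]+ accepts any characters other than hyphens and dots, not only digits.

```php
<?php
// Return the run of non-hyphen, non-dot characters directly before ".aspx",
// or null when the URL contains no such run.
function extractJobIdByLookahead(string $url): ?string
{
    if (preg_match('/[^-.]+(?=\.aspx)/i', $url, $matches) === 1) {
        return $matches[0]; // the lookahead text ".aspx" is not part of the match
    }
    return null;
}

// Hypothetical URL: contains a second number and a parameter after ".aspx".
$url = 'http://example.com/Job-2014-Mainz-146370543.aspx&mobile=true';
echo extractJobIdByLookahead($url), "\n";
```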
You can just use preg_match (you don't need preg_replace, as you don't want to change the original string) and capture the number before the .aspx. Assuming the .aspx is always at the end of the string, the simplest way I could think of is:
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";
$regex = '/([0-9]+)\.aspx$/';
preg_match($regex, $string, $results);
print $results[1];
?>
A short explanation:
$results contains an array of results. The first element holds the text matched by the complete regex, which would be 146370543.aspx in this example. The second element holds the group captured by the parentheses around [0-9]+.
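The question mentions that parameters such as "&mobile=true" may follow the ".aspx". A possible variation, sketched below, drops the $ anchor so the capture still works in that case. The function name extractJobId is an assumption for illustration.

```php
<?php
// Capture the digits directly before ".aspx", whether or not anything follows it.
function extractJobId(string $url): ?string
{
    if (preg_match('/(\d+)\.aspx/', $url, $results) === 1) {
        // $results[0] is the whole match ("146370543.aspx"),
        // $results[1] is the first capturing group (the digits only).
        return $results[1];
    }
    return null;
}

$url = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-'
     . 'Rheinland-Pfalz-Deutschland-146370543.aspx&mobile=true';
echo extractJobId($url), "\n";
```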
You can get the opposite by using this regex:
(\D*)\d+(.*)
Match information for the example URL:
MATCH 1
1. [0-100] `http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-`
2. [109-114] `.aspx`
If you just want the number for that URL, you can use this regex (it matches the first run of digits, so it only works when the ID is the only number in the URL):
(\d+)
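If preg_replace really is required, one way to sketch it is to match the whole string and replace it with the captured ID. The pattern and the function name jobIdViaReplace below are assumptions, not taken from the answers. When nothing matches, preg_replace returns the input unchanged, so the result may need a separate check.

```php
<?php
// Replace the entire URL with the digits found directly before ".aspx".
function jobIdViaReplace(string $url): string
{
    // ^.*?      everything before the ID (lazy, so the digit run is captured whole)
    // (\d+)     the ID itself
    // \.aspx.*$ the extension plus any trailing parameters
    return preg_replace('/^.*?(\d+)\.aspx.*$/s', '$1', $url);
}

// Hypothetical URL with a second number and a trailing parameter.
echo jobIdViaReplace('http://example.com/Job-2014-Mainz-146370543.aspx&mobile=true'), "\n";
```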
I have two URLs (below). I need to match the one that doesn't contain the "story" part in it. I know I need to use a negative lookahead/lookbehind, but I can't for the life of me get it to work.
Urls
news/tech/story/2014/oct/28/apple-iphone/261736
news/tech/2014/oct/28/apple-iphone/261736
Current Regex
news\/([a-z0-9-\/]{1,255})(\d{4})\/(\w{3})\/(\d{2})\/([a-z0-9\-]{1,50})\/(\d{1,10})
Example:
http://regex101.com/r/jC7jC4/1
You can try this one:
news\/(([a-z0-9-\/](?!story)){1,255})(\d{4})\/(\w{3})\/(\d{2})\/([a-z0-9\-]{1,50})\/(\d{1,10})
You can use negative lookahead like this:
(?!.*\bstory\b)news\/([a-z0-9-\/]{1,255})(\d{4})\/(\w{3})\/(\d{2})\/([a-z0-9\-]{1,50})\/(\d{1,10})
(?!.*\bstory\b) is a negative lookahead that makes the match fail if the word story appears anywhere after the point where the match starts.
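A small sketch of the negative lookahead applied to the two URLs from the question. The ^ and $ anchors and the function name isNonStoryNewsUrl are assumptions added for illustration, and the hyphen in the first character class is moved to the end, which should be equivalent.

```php
<?php
// True when $url has the news/.../YYYY/mon/DD/slug/id shape and no "story" segment.
function isNonStoryNewsUrl(string $url): bool
{
    $regex = '/^(?!.*\bstory\b)news\/([a-z0-9\/-]{1,255})(\d{4})\/(\w{3})\/(\d{2})'
           . '\/([a-z0-9\-]{1,50})\/(\d{1,10})$/';
    return preg_match($regex, $url) === 1;
}

var_dump(isNonStoryNewsUrl('news/tech/story/2014/oct/28/apple-iphone/261736'));
var_dump(isNonStoryNewsUrl('news/tech/2014/oct/28/apple-iphone/261736'));
```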
If you don't have to use a regex, you can check with strpos() instead:
if (strpos($url, 'story') === false) { /* $url has no "story" part */ }
Any idea why this regex is not working?
'#((http://|www.)(.*^(youtube)\.(com|org|co.il|net|us|ws|info|tv|me|tk|co.uk).*))#'
I want to find links that don't belong to YouTube.
You have this segment in your regex: ^(youtube)
Please understand that this will not negate string youtube.
You will need negative lookahead or lookbehind to negate matching a text like this:
(?!youtube)
OR
(?<!youtube)
For your regex it can be:
#((http://|www.)((?!youtube)[^.]+\.(com|org|co.il|net|us|ws|info|tv|me|tk|co.uk).*))#
However, you need to provide a sample of your input strings in your question.
Read about lookarounds here: http://www.regular-expressions.info/lookaround.html
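To illustrate the negative lookahead on a list of links, here is a hedged sketch. The pattern, the function name isNonYoutubeLink, and the sample links are assumptions. It only rejects hosts that start with youtube. or www.youtube., so other subdomains or short-link hosts would need extra alternatives.

```php
<?php
// True when $link starts with http://, https:// or www. and the host is not youtube.*
function isNonYoutubeLink(string $link): bool
{
    // (?:https?://|(?=www\.))      require a scheme, or a leading "www." (not consumed)
    // (?!(?:www\.)?youtube\.)      then refuse hosts beginning with [www.]youtube.
    $regex = '#^(?:https?://|(?=www\.))(?!(?:www\.)?youtube\.)#i';
    return preg_match($regex, $link) === 1;
}

// Hypothetical sample input.
$links = [
    'http://www.youtube.com/watch?v=abc123',
    'http://example.org/page',
    'www.youtube.co.uk',
    'www.example.net',
];
print_r(array_values(array_filter($links, 'isNonYoutubeLink')));
```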
I'd go the opposite way
$youtubePattern = '%(https?:\/\/(www\.)?youtube\.([a-z]{2,3}).*)%six';
if (!preg_match($youtubePattern, $subject)) {
// not youtube, do whatever...
}
I'm having a little problem with my Regex
I've made a custom BBcode for my website, however I also want URLs to be parsed too.
I'm using preg_replace and this is the pattern used to identify URLS:
/([\w]+:\/\/[\w-?&;#~=\.\/\#]+[\w\/])/is
Which works great, however if a URL is within a [img][/img] block, the above pattern also picks it up and produces a result like this:
//[img]http://url.com/toimg.jeg[/img] will produce this result:
<img src="<a href="http://url.com/toimg.jeg" target="_blank">/>
//When it should produce:
<img src="http://url.com/toimg.jeg"/>
I tried using this:
/([^"][\w]+:\/\/[\w-?&;#~=\.\/\#]+[\w\/][^"])/is
With no luck.
Any help will be appreciated.
Edit:
For the solution, see the 2nd comment on stema's answer.
Try this
(?<!href=")(\b[\w]+:\/\/[\w-?&;#~=\.\/\#]+[\w\/])
To make it more general you can simplify your lookbehind to check only for "=""
(?<!=")(\b[\w]+:\/\/[\w-?&;#~=\.\/\#]+[\w\/])
(?<!href=") is a negative lookbehind assertion; it ensures that there is no "href="" before your pattern.
\b is a word boundary that anchors the start of your link to a change from a non-word to a word character. Without it the lookbehind would be useless, because the regex would simply match from the "ttp://..." on.
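The effect of the word boundary can be checked with a short sketch like the one below. The helper firstUrlMatch is hypothetical, and the hyphen inside the character class is escaped here as an assumption: newer PCRE versions may reject an unescaped hyphen placed directly after \w.

```php
<?php
// Return the first substring matched by $regex in $text, or null if there is none.
function firstUrlMatch(string $regex, string $text): ?string
{
    return preg_match($regex, $text, $m) === 1 ? $m[0] : null;
}

$withBoundary    = '/(?<!=")(\b\w+:\/\/[\w\-?&;#~=.\/]+[\w\/])/is';
$withoutBoundary = '/(?<!=")(\w+:\/\/[\w\-?&;#~=.\/]+[\w\/])/is';

$html = '<img src="http://url.com/toimg.jeg"/>';

// Without \b the engine skips the "h" and starts the match one character later.
var_dump(firstUrlMatch($withoutBoundary, $html));
// With \b there is no valid starting point inside the attribute value.
var_dump(firstUrlMatch($withBoundary, $html));
```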
I have a PHP regular expression I'm using to get the YouTube video code out of a URL.
I'd love to match this with a client-side regular expression in JavaScript. Can anyone tell me how to convert the following PHP regex to JavaScript?
preg_match("#(?<=v=)[a-zA-Z0-9-]+(?=&)|(?<=v\/)[^&\n]+(?=\?)|(?<=embed/)[^&\n]+|(?<=v=)[^&\n]+|(?<=youtu.be/)[^&\n]+#", $url, $matches);
Much appreciated, thanks!
I think the only problem is getting rid of the lookbehind assertions (?<=...); they are not supported in JavaScript engines that predate ES2018.
Their advantage is that you can use them to ensure that a pattern comes before something without including that pattern in the match.
So you need to remove them, which means changing (?<=v=)[a-zA-Z0-9-]+(?=&) to v=[a-zA-Z0-9-]+(?=&), but now your match starts with "v=".
If you just need to validate and don't need the matched part, then that's fine and you are done.
But if you need the part after v=, put the needed pattern into a capturing group instead and continue working with the captured values.
v=([a-zA-Z0-9-]+)(?=&)
You will then find the matched substring in $1 for the first group, $2 for the second, $3 ...
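The capturing-group idea can be sketched as follows, shown in PHP to stay consistent with the rest of the thread. The pattern is a simplified, hypothetical merge of the original alternatives and uses no lookbehind, so it should carry over to a JavaScript regex literal once the slashes are escaped. The function name youtubeVideoId and the sample URLs are made up.

```php
<?php
// Return the video ID captured after v=, v/, embed/ or youtu.be/, or null.
function youtubeVideoId(string $url): ?string
{
    // The prefix is matched but not captured; group 1 holds the ID,
    // which ends at the first "&", "?" or newline.
    if (preg_match('#(?:v=|v/|embed/|youtu\.be/)([^&?\n]+)#', $url, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Hypothetical sample URLs.
echo youtubeVideoId('http://www.youtube.com/watch?v=abc123&feature=related'), "\n";
echo youtubeVideoId('http://youtu.be/abc123'), "\n";
```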
You can replace your lookbehind assertion using this post:
Javascript: negative lookbehind equivalent?