find url with regex on text - php

There are a lot of topics like this one, but I can't figure out what the error is, and I have tried a lot.
This is the original text:
onclick="NewWindow('http://google.com','name','800','600','yes');return false">
This is my code:
$re1='(onclick)';
$re2='(=)';
$re3='(.)';
$re4='(NewWindow)';
$re5='(\\()';
$re6='(.)';
$re7='((?:http|https)(?::\\/{2}[\\w]+)(?:[\\/|\\.]?)(?:[^\\s"]*))';
$c=preg_match_all ("/".$re1.$re2.$re3.$re4.$re5.$re6.$re7."/is", $txt, $matches);
print_r($matches);
Can anyone help me get the URL using a regular expression in PHP?
What is wrong with this code?
Regards

preg_match("/NewWindow\('([^']*)'/",$txt, $matches);
$matches[1] contains the URL.
Is that what you need?
(edit: put in a code block because a parenthesis was not escaped correctly)

This should work:
preg_match("/onclick=\"NewWindow\('(.*)','n/",$txt,$matches);

I'd use non-greedy matching for this:
preg_match("/onclick=\"NewWindow\('(.*?)'/", $txt, $matches);
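To see how the suggestions above differ, here is a minimal sketch that runs three variants against the sample text from the question. The greedy third pattern is not one of the posted answers; it is included only as an assumption-free contrast to show what happens when nothing stops the capture at the first quote.

```php
<?php
// Sample input taken from the question.
$txt = "onclick=\"NewWindow('http://google.com','name','800','600','yes');return false\">";

// Negated character class: the capture stops at the first single quote.
preg_match("/NewWindow\\('([^']*)'/", $txt, $a);

// Non-greedy quantifier: expands lazily until the first single quote.
preg_match("/onclick=\"NewWindow\\('(.*?)'/", $txt, $b);

// Greedy quantifier with nothing else to stop it: runs to the last single quote.
preg_match("/NewWindow\\('(.*)'/", $txt, $c);

echo $a[1], "\n"; // http://google.com
echo $b[1], "\n"; // http://google.com
echo $c[1], "\n"; // http://google.com','name','800','600','yes
```

The first two give the same result here; the negated class is usually the easier one to reason about because it cannot backtrack past a quote.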

Based on your description, the regex I would use is:
#(?<=NewWindow\(\').*(http://|https://)[^\'\"]*#i
or
#(?<=onclick=\"NewWindow\(\').*(http://|https://)[^\'\"]*#i
(With PHP's preg functions, a / delimiter would require escaping the slashes in http://, so # is used as the delimiter here.)
A great tool for testing your regex is: http://gskinner.com/RegExr/
The match is just the URL, and only if it is preceded by "NewWindow('" in the first pattern or "onclick="NewWindow('" in the second, which in your case gives http://google.com.
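As a rough PHP sketch of the lookbehind idea, assuming the URL starts immediately after the opening quote: the pattern below is a simplified variant (no `.*` before the scheme, `#` as delimiter), not the answer's exact regex. Because the prefix sits in a lookbehind, the whole match `$m[0]` is the URL and no capture group is needed.

```php
<?php
// Sample input taken from the question.
$txt = "onclick=\"NewWindow('http://google.com','name','800','600','yes');return false\">";

// (?<=...) asserts the prefix without including it in the match.
// "#" is the delimiter, so the slashes in "://" need no escaping.
$pattern = "#(?<=NewWindow\\(')https?://[^'\"]*#i";

if (preg_match($pattern, $txt, $m)) {
    echo $m[0], "\n"; // http://google.com
}
```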

Related

preg_replace with Regex - find number-sequence in URL

I'm a regex newbie, so sorry for this "simple" question.
I've got a URL like the following:
http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx
What I'm trying to achieve is getting the number sequence (the job ID) right before the ".aspx" with preg_replace.
I've already figured out that the regex for finding it could be
(?!.*-).*(?=\.)
Now preg_replace needs the opposite of that regular expression. How can I achieve that? Also worth mentioning:
The URL can have multiple numbers in it. I only need the sequence right before ".aspx". Also, there could be some query parameters after the ".aspx", like "&mobile=true".
Thank you for your answers!
You can use:
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);
//=> 146370543
This matches a run of characters that are neither hyphens nor dots and that is followed by .aspx, using the lookahead (?=\.aspx).
You can just use preg_match (you don't need preg_replace, since you don't want to change the original string) and capture the number before the .aspx. If .aspx is always at the end of the URL, the simplest way I can think of is the following; if query parameters can follow .aspx, drop the $ anchor:
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";
$regex = '/([0-9]+)\.aspx$/';
preg_match($regex, $string, $results);
print $results[1];
?>
A short explanation:
$results contains an array of matches. The first element holds the text matched by the whole regex, which is 146370543.aspx in this example. The second element holds the group captured by the parentheses around [0-9]+.
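Since the question mentions that parameters such as "&mobile=true" may follow ".aspx", here is a small sketch of the capture-group approach without the end anchor. The function name `extractJobId` and the `example.com` URL are made up for illustration.

```php
<?php
// Hypothetical helper: returns the digits right before ".aspx", or null.
function extractJobId(string $url): ?string
{
    // No "$" anchor, so anything may follow ".aspx".
    if (preg_match('/(\d+)\.aspx/i', $url, $m)) {
        return $m[1];
    }
    return null;
}

$base = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx';

echo extractJobId($base), "\n";                  // 146370543
echo extractJobId($base . '&mobile=true'), "\n"; // 146370543
var_dump(extractJobId('http://example.com/no-id-here')); // NULL
```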
You can get the opposite by using this regex:
(\D*)\d+(.*)
Its two groups capture everything around the number:
MATCH 1
1. [0-100] `http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-`
2. [109-114] `.aspx`
And if you just want the number for that URL, you can use this regex:
(\d+)
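If preg_replace really is required, one possible sketch is to match the entire string and replace it with the captured ID. The pattern below is an assumption of mine rather than one of the posted answers; the lazy `.*?` lets it skip earlier numbers in the URL and stop at the digits that directly precede ".aspx".

```php
<?php
$url = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx&mobile=true';

// Match the whole string, keep only the digits right before ".aspx".
// If the pattern does not match, preg_replace returns the input unchanged.
$id = preg_replace('/^.*?(\d+)\.aspx.*$/', '$1', $url);

echo $id, "\n"; // 146370543
```

Because an unmatched input comes back unchanged, preg_match is the safer choice when you need to know whether an ID was found at all.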

Small modification to regex to make it work?

I am trying to get the 6 or 7 number sequence and put it in the urls array.
<a href="/product/view/4539922/" class="raw_clafd">
However I am having a problem with the regex below.
preg_match_all('/<a\s+href="\.\/view\/(\d{6,7})\/" class="raw_clafd">/', $str, $urls);
What am I missing? Thank you
Your pattern has \. where the URL has /product, so it cannot match.
You can use:
preg_match_all('#<a\s+href="/product/view/(\d{6,7})/"\s+class="raw_clafd">#', $str, $urls);
But I really believe you should consider using a DOM parser.
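A rough sketch of the DOM parser route, using PHP's bundled DOMDocument: the regex is then applied only to the href value, not to raw HTML. The link texts and the two extra anchors in `$str` are invented test data, not part of the question.

```php
<?php
// First anchor is from the question; the other two are made-up counterexamples.
$str = '<a href="/product/view/4539922/" class="raw_clafd">Item</a>'
     . '<a href="/product/view/12/" class="raw_clafd">Too short</a>'
     . '<a href="/product/view/7654321/" class="other">Wrong class</a>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings for fragments quiet
$doc->loadHTML($str);
libxml_clear_errors();

$ids = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    if ($link->getAttribute('class') !== 'raw_clafd') {
        continue; // only links with the expected class
    }
    // 6 or 7 digits after /product/view/
    if (preg_match('#^/product/view/(\d{6,7})/$#', $link->getAttribute('href'), $m)) {
        $ids[] = $m[1];
    }
}

print_r($ids); // only 4539922 qualifies
```

This way attribute order and extra whitespace inside the tag no longer matter.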
You can get the value after /view/ just by using
/\/view\/(\d{6,7})/

Regex match if not after word

I have a regex that's matching urls and converting them into html links.
If the URL is already part of a link, I don't want it to match. For example:
http://stackoverflow.com/questions/ask
Should match, but:
Stackoverflow
Shouldn't match
How can I create a regex to do this?
If your URL-matching regular expression is $URL, then you can use the following pattern:
(?<!href=[\"'])$URL
In PHP you'd write:
preg_match("/(?<!href=[\"'])$URL/", $text, $matches);
You can use a negative lookbehind to assert that the url is not preceded by href="
(?<!href=")
(Your url-matching pattern should go immediately after that.)
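Here is a minimal sketch of the negative lookbehind in use. The URL sub-pattern `https?://[^\s"'<>]+` and the `example.com` sample text are deliberately simple assumptions; substitute your own URL regex.

```php
<?php
$text = 'Visit http://example.com/a or <a href="http://example.com/b">b</a>';

// (?<!href=["']) rejects a URL that directly follows href=" or href='.
$pattern = '~(?<!href=["\'])https?://[^\s"\'<>]+~i';

// $0 is the whole match, i.e. the bare URL.
$out = preg_replace($pattern, '<a href="$0">$0</a>', $text);

echo $out, "\n";
// Visit <a href="http://example.com/a">http://example.com/a</a> or <a href="http://example.com/b">b</a>
```

Note that this only guards against the href attribute: a URL used as the visible text of an existing link would still be wrapped again.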
This link provides information. The accepted solution is like so:
<a\s
(?:(?!href=|target=|>).)*
href="http://
(?:(?!target=|>).)*
By removing the references to "target" this should work for you.
Try this
/(?:(([^">']+|^)https?\:\/\/[^\s]+))/m

Regular expression - need to get string from html comment

I need to get a string from a comment in an HTML file. I tried to do it with DOM, but I didn't find a good solution with that method.
So I want to try it with regular expressions, but I can't find a satisfactory solution. Could you please help me?
This is what I need:
<!--adress-"String here I need to get"-->
Thanks in advance for your answer.
Look at $matches after running this code:
preg_match('~<!--adress-"(.*?)"-->~msi', $string, $matches);
HTML comments are regular; you can just match <!--adress-"([^">]+)"--> and get the first group.
This assumes that the comments are always well-formed and always have a quoted string containing no quotes.
This will be more accurate:
// ~ delimiters, so that < and > are not taken as the delimiter pair
$regex = '~<!--(.+?)-"{0,1}(.+?)"{0,1}-->~';
preg_match_all($regex, $html, $matches_array);
Then just var_dump($matches_array) and look at the results.
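For reference, a short runnable sketch of the first suggestion, applied to the comment from the question. The surrounding `<p>` elements are invented so that the comment sits inside some HTML.

```php
<?php
// The comment is from the question; the <p> tags around it are made up.
$html = '<p>before</p><!--adress-"String here I need to get"--><p>after</p>';

// (.*?) lazily captures everything between the quotes;
// the s flag lets "." cross line breaks inside the comment.
if (preg_match('~<!--adress-"(.*?)"-->~s', $html, $matches)) {
    echo $matches[1], "\n"; // String here I need to get
}
```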

Simple RegEx PHP

Since I am completely useless at regex and this has been bugging me for the past half an hour, I think I'll post this up here as it's probably quite simple.
hey.exe
hey2.dll
pomp.jpg
In PHP I need to extract what's between the <a> tags, for example:
hey.exe
hey2.dll
pomp.jpg
Avoid using '.*' even if you make it ungreedy, until you have some more practice with RegEx. I think a good solution for you would be:
'/<a[^>]+>([^<]+)<\/a>/i'
Note the '/' delimiters - you must use the preg suite of regex functions in PHP. It would look like this:
preg_match_all($pattern, $string, $matches);
// matches get stored in '$matches' variable as an array
// matches in between the <a></a> tags will be in $matches[1]
print_r($matches);
This appears to work:
$pattern = '/<a.*?>(.*?)<\/a>/';
([^<]*)
I found this regular expression tester to be helpful.
Here is a very simple one:
<a.*>(.*)</a>
However, you should be careful if you have several matches in the same line, e.g.
<a href="...">hey.exe</a><a href="...">hey2.dll</a>
In this case, the correct regex would be:
<a.*?>(.*?)</a>
Note the '?' after the '*' quantifier. By default, quantifiers are greedy, which means they consume as many characters as they can (so the greedy pattern would return only "hey2.dll" in this example). By appending a question mark, you make them ungreedy, which should better fit your needs.
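The difference is easy to check with preg_match_all. In this sketch the href values `a.php` and `b.php` are placeholders of mine, and `~` is used as the delimiter so that `</a>` needs no escaping.

```php
<?php
// Two links on one line; the href values are made up.
$html = '<a href="a.php">hey.exe</a><a href="b.php">hey2.dll</a>';

// Greedy: "<a.*>" swallows the first link entirely.
preg_match_all('~<a.*>(.*)</a>~', $html, $greedy);

// Ungreedy: each quantifier stops as early as possible.
preg_match_all('~<a.*?>(.*?)</a>~', $html, $lazy);

// Negated character classes: cannot run past ">" or "<" at all.
preg_match_all('~<a[^>]+>([^<]+)</a>~i', $html, $negated);

print_r($greedy[1]);  // one entry: hey2.dll
print_r($lazy[1]);    // two entries: hey.exe, hey2.dll
print_r($negated[1]); // two entries: hey.exe, hey2.dll
```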
