Small modification to regex to make it work? - php

I am trying to get the 6- or 7-digit number sequence and put it in the urls array.
<a href="/product/view/4539922/" class="raw_clafd">
However I am having a problem with the regex below.
preg_match_all('/<a\s+href="\.\/view\/(\d{6,7})\/" class="raw_clafd">/', $str, $urls);
What am I missing? Thank you

\. matches a literal dot, so it cannot match /product.
You can use:
preg_match_all('#<a\s+href="/product/view/(\d{6,7})/"\s+class="raw_clafd">#', $str, $urls);
But I really believe you should consider using a DOM parser.
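As a rough sketch of the DOM parser suggestion, assuming PHP's bundled DOMDocument extension is available: the sample markup in $str and the second, non-matching link are made up for illustration.

```php
<?php
// Sample markup modelled on the question; the second link is a made-up non-matching case.
$str = '<a href="/product/view/4539922/" class="raw_clafd">A</a>'
     . '<a href="/product/view/12/" class="other">B</a>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them for imperfect HTML.
libxml_use_internal_errors(true);
$doc->loadHTML($str);
libxml_clear_errors();

$ids = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // Only look at anchors carrying the class from the question.
    if ($a->getAttribute('class') !== 'raw_clafd') {
        continue;
    }
    // The regex now only has to validate a single attribute value.
    if (preg_match('#^/product/view/(\d{6,7})/$#', $a->getAttribute('href'), $m)) {
        $ids[] = $m[1];
    }
}
print_r($ids);
```

The parser deals with attribute order and whitespace, so the regex no longer depends on how the tag happens to be written.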

You can get the value after /view/ just by using
/\/view\/(\d{6,7})/
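To see where the captured digits end up, here is a small sketch using that shorter pattern. The second link in $str is invented for the example.

```php
<?php
// Two sample anchors; the second one is hypothetical test data.
$str = '<a href="/product/view/4539922/" class="raw_clafd">'
     . '<a href="/product/view/123456/" class="raw_clafd">';

preg_match_all('/\/view\/(\d{6,7})/', $str, $urls);

// $urls[0] holds the full matches, $urls[1] the captured digit sequences.
print_r($urls[1]);
```

Note that without a trailing `\/` the pattern would also accept the first 7 digits of a longer number, so keep the closing slash if that matters.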

Related

preg_replace with Regex - find number-sequence in URL

I'm a regex newbie, so sorry for this "simple" question:
I've got a URL like the following:
http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx
What I'm trying to achieve is getting the number sequence (aka Job-ID) right before the ".aspx" with preg_replace.
I've already figured out that the regex for finding it could be
(?!.*-).*(?=\.)
Now preg_replace needs the opposite of that regular expression. How can I achieve that? Also worth mentioning:
The URL can have multiple numbers in it. I only need the sequence right before ".aspx". Also, there could be some parameters after the ".aspx", like "&mobile=true".
Thank you for your answers!
You can use:
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);
//=> 146370543
This matches a run of characters that are neither hyphens nor dots and that is followed by .aspx, using the lookahead (?=\.aspx).
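A quick check of that lookahead against the case mentioned in the question, where parameters follow the ".aspx". The "&mobile=true" suffix is taken from the question; the variable names are just for this sketch.

```php
<?php
$input = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx&mobile=true';

// The lookahead only checks that ".aspx" follows; it does not consume it,
// so whatever comes after ".aspx" is irrelevant to the match.
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);

echo $matches[0], "\n";
```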
You can just use preg_match (you don't need preg_replace, as you don't want to change the original string) and capture the number before the .aspx. If .aspx is always at the end of the string, the simplest way I can think of is:
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";
$regex = '/([0-9]+)\.aspx$/';
preg_match($regex, $string, $results);
print $results[1];
?>
A short explanation:
$results contains an array of results. Its first element holds the text matched by the complete regex, which is 146370543.aspx in this example. The second element holds the group captured by the parentheses around [0-9]+.
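The sketch below prints both elements and also shows what happens when parameters follow the ".aspx", as the question mentions. Dropping the $ anchor is my own adjustment for that case, not part of the answer above.

```php
<?php
$string = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx';

preg_match('/([0-9]+)\.aspx$/', $string, $results);
echo $results[0], "\n"; // the whole match
echo $results[1], "\n"; // the first capture group

// With trailing parameters the "$" anchor no longer fits ...
$withParams = $string . '&mobile=true';
$anchored = preg_match('/([0-9]+)\.aspx$/', $withParams);

// ... so this variant simply leaves the anchor out.
preg_match('/([0-9]+)\.aspx/', $withParams, $loose);
echo $loose[1], "\n";
```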
You can get the opposite by using this regex:
(\D*)\d+(.*)
MATCH 1
1. [0-100] `http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-`
2. [109-114] `.aspx`
And if you just want the number from that URL, you can use this regex (provided the URL contains no other digits):
(\d+)
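Since the question asked about preg_replace, here is a minimal sketch of both directions: removing the ID from the URL, and reducing the URL to just the ID. The patterns are my own variations that tie the number to ".aspx", so other numbers in the URL should not interfere.

```php
<?php
$url = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx';

// Remove the digits that sit directly before ".aspx".
$without = preg_replace('/\d+(?=\.aspx)/', '', $url);

// Replace the whole URL by the captured digits.
$id = preg_replace('/^.*?(\d+)\.aspx.*$/', '$1', $url);

echo $without, "\n";
echo $id, "\n";
```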

How to extract hrefs from HTML with PHP

Assume I have a valid HTML file which I save into a string. Now I want to extract the links of the anchor elements (hrefs), and I want to use pure regular expressions for that.
preg_match_all('/<a [^>]*href="(.+)">/', $html, $match);
I expect to receive a string like this:
http://www.thisIsAHrefLinkIWantToHave.de
But instead I also receive the following string, logically caused by (.+) in the regex:
index?a=f">Link</a> Link 2 Link 3 Link 4 Link 5 Link 6 <a href="http://mysite.org
I found solutions like XPath or DOMDocument (PHP String Manipulation: Extract hrefs), but I'd like to have a solution without those or any other libraries, just with regex. What do I have to do to fix my regex?
I thought about matching from the first " to the next ". But how do I create that pattern, or another pattern that solves the problem?
[EDIT:] Solution
preg_match_all('/<a [^>]*href="([A-Za-z0-9\/?=:&_.]+)?"/', $html, $match);
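Try preg_match_all('/<a [^>]*href="(.+?)">/', $html, $match); the ? directly after the + makes the quantifier non-greedy.
Musa is correct in that .+ is greedy. Try a restricted character class such as [A-Za-z0-9_]+ instead of .+ (extended with whatever other characters your URLs contain, such as / : . ? = &).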
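To make the difference concrete, here is a small comparison of a greedy group, a lazy group, and a negated character class. The sample $html is invented to resemble the output in the question, and [^"]* is one common way to express "from the first quote to the next".

```php
<?php
// Hypothetical markup with two links on one line.
$html = '<a href="index?a=f">Link</a> Link 2 <a href="http://mysite.org">Site</a>';

// Greedy: (.+) runs to the LAST '">' in the string.
preg_match_all('/<a [^>]*href="(.+)">/', $html, $greedy);

// Lazy: (.+?) stops at the first closing quote.
preg_match_all('/<a [^>]*href="(.+?)"/', $html, $lazy);

// Negated class: anything except a double quote.
preg_match_all('/<a [^>]*href="([^"]*)"/', $html, $negated);

print_r($greedy[1]);
print_r($lazy[1]);
print_r($negated[1]);
```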

find url with regex on text

There are a lot of topics like this one, but I don't know what the error is, and I have tried a lot.
So this is the original text:
onclick="NewWindow('http://google.com','name','800','600','yes');return false">
this is my code
$re1='(onclick)';
$re2='(=)';
$re3='(.)';
$re4='(NewWindow)';
$re5='(\\()';
$re6='(.)';
$re7='((?:http|https)(?::\\/{2}[\\w]+)(?:[\\/|\\.]?)(?:[^\\s"]*))';
$c=preg_match_all ("/".$re1.$re2.$re3.$re4.$re5.$re6.$re7."/is", $txt, $matches);
print_r($matches);
Can anyone help me get the URL using a regular expression and PHP?
What is wrong with this code?
Regards
preg_match("/NewWindow\('([^']*)'/",$txt, $matches);
$matches[1] contains the URL.
Is that what you need?
(edit: put in a code block because a parenthesis was not escaped correctly)
This should work:
preg_match("/onclick=\"NewWindow\('(.*)','n/",$txt,$matches);
I'd use non-greedy matching for this:
preg_match("/onclick=\"NewWindow\('(.*?)'/", $txt, $matches);
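A short sketch of why the non-greedy version is the safer choice here, run against the text from the question. The variable names $lazy and $greedy are only for this illustration.

```php
<?php
// The original text from the question, kept verbatim via nowdoc.
$txt = <<<'TXT'
onclick="NewWindow('http://google.com','name','800','600','yes');return false">
TXT;

// Non-greedy: stops at the first single quote after the URL.
preg_match("/onclick=\"NewWindow\('(.*?)'/", $txt, $lazy);

// Greedy: runs on to the LAST single quote on the line.
preg_match("/onclick=\"NewWindow\('(.*)'/", $txt, $greedy);

echo $lazy[1], "\n";
echo $greedy[1], "\n";
```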
Based on your description, the regex I would use is:
#(?<=NewWindow\(\').*(http://|https://)[^\'\"]*#i
or
#(?<=onclick=\"NewWindow\(\').*(http://|https://)[^\'\"]*#i
(# is used as the delimiter so that the slashes in http:// do not need escaping.)
Each one outputs just the URL, and only if it is preceded by "NewWindow('" in the first pattern or by "onclick="NewWindow('" in the second, which in your case gives http://google.com.
A great tool for testing your regex is: http://gskinner.com/RegExr/
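A simplified sketch of the lookbehind idea, assuming # as the delimiter so the slashes in the URL need no escaping. The pattern is my own trimmed-down variant, not the exact one from the answer.

```php
<?php
$txt = <<<'TXT'
onclick="NewWindow('http://google.com','name','800','600','yes');return false">
TXT;

// (?<=...) requires "NewWindow('" directly before the match without including it,
// so the whole match ($m[0]) is just the URL.
preg_match('#(?<=NewWindow\(\')https?://[^\'"]*#i', $txt, $m);

echo $m[0], "\n";
```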

Regular expressions PHP preg_match_all

Hi I am trying to use preg_match_all() to extract the number in bold out of an image URL...
http://profile.ak.fbcdn.net/hprofile-ak-snc4/174844_39677118233_8277870_t.jpg
Could someone please help me with the regular expression needed as I am stumped.
I've used this so far:
preg_match_all("(http://profile.ak.fbcdn.net/hprofile-ak-snc4/.*_t.jpg)siU", $this->html, $matching_data);
return $matching_data[0];
}
Which is just giving me an array of the full links.
Hope someone can help, thanks!!!
This will give you all occurrences:
preg_match_all('!/hprofile-ak-snc4/[0-9]+_([0-9]+)[^/]+?\.jpg!i', $txt, $matches);
print_r($matches);
The number you have bolded should be contained in $matches[1][$n]...
preg_match_all("#http://profile\.ak\.fbcdn\.net/(.*?)/([0-9]+)_([0-9]+)_([0-9]+)_t\.jpg#is", $string, $matches);
print_r($matches);
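To show which group holds which number, here is a sketch using that pattern with PREG_SET_ORDER, so each match comes back as one row. I am assuming the middle number is the one wanted, since the bold formatting did not survive.

```php
<?php
$string = 'http://profile.ak.fbcdn.net/hprofile-ak-snc4/174844_39677118233_8277870_t.jpg';

preg_match_all(
    '#http://profile\.ak\.fbcdn\.net/(.*?)/([0-9]+)_([0-9]+)_([0-9]+)_t\.jpg#is',
    $string,
    $matches,
    PREG_SET_ORDER // one array per match: [full, group1, group2, ...]
);

foreach ($matches as $m) {
    // $m[2], $m[3], $m[4] are the three numbers in order.
    echo $m[3], "\n";
}
```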
Try this:
([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?
([a-z0-9\-._~%!$&'()*+,;=:#/]*)
(?:(?:\d+_)(\d+)(?:_\d+))\3
I've separated it out onto multiple lines for easier reading. You will want to use capture group 4
Or (just minimized it a bit)
(?:[a-z][a-z0-9+\-.]*:(?://[^/?#]+)?)?
([a-z0-9\-._~%!$&'()*+,;=:#/]*)
(?:(?:\d+_)(\d+)(?:_\d+))\1
and use capture group 2

PHP Regex to grab {tag}something{/tag}

I'm trying to come up with a regex string to use with the PHP preg functions (preg_match, etc.) and am stumped on this:
How do you match this string?:
{area-1}some text and maybe a link.{/area-1}
I want to replace it with a different string using preg_replace.
So far I've been able to identify the first tag with preg_match like this:
preg_match("/\{(area-[0-9]*)\}/", $mystring);
Thanks if you can help!
If you don't have nested tags, something this simple should work:
preg_match_all("~{.+?}(.*?){/.+?}~", $mystring, $matches);
Your results can be then found in $matches[1].
I would suggest
preg_match_all('~\{(area-[0-9]*)\}(.*?)\{/\1\}~s', $mystring, $matches);
This will even work if other tags are nested inside the area tag you're looking at.
If you have several area tags nested within each other, it will still work, but you'll need to apply the regex several times (once for each level of nesting).
And of course, the contents of the matches will be in $matches[2], not $matches[1] as in Tatu's answer.
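Since the goal was a replacement, here is a minimal sketch combining the backreference pattern with preg_replace. The surrounding text and the 'REPLACEMENT' string are placeholders. The pattern is written in single quotes because in a double-quoted PHP string \1 would be read as an octal character escape rather than reach the regex engine as a backreference.

```php
<?php
// Placeholder input around the tag pair from the question.
$mystring = 'before {area-1}some text and maybe a link.{/area-1} after';

// \1 must match the same tag name that group 1 captured in the opening tag.
$pattern = '~\{(area-[0-9]+)\}(.*?)\{/\1\}~s';

preg_match_all($pattern, $mystring, $matches);
print_r($matches[2]); // the contents between the tags

$replaced = preg_replace($pattern, 'REPLACEMENT', $mystring);
echo $replaced, "\n";
```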
