Regular expression - delimiter problem in PHP - php

I'm trying to extract src attributes from: [attname src="http://example.org"] somecontent [attname src="http://www.example.com"]
What I have now:
preg_match_all('#attname src=".*[^"]#', $buffer, $bufferarr);
However, it doesn't work: the match doesn't stop after the second ", which results in: attname src="http://example.org"] somecontent [attname src="http://www.example.com

By default, + and * are "greedy" - they gobble up as many characters as they can. That's why you get more than you want. If you add ? to them (+? and *?) they will be non-greedy and will stop as soon as they can.
Your regexp also looks wrong. It should be something like #attname src="[^"]*?"#.
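As a minimal sketch of the difference, the snippet below runs a greedy, a lazy, and a negated-class pattern over the sample string from the question. The variable names ($greedy, $lazy, $negated) are only illustrative.

```php
<?php
// Sample input taken from the question above.
$buffer = '[attname src="http://example.org"] somecontent [attname src="http://www.example.com"]';

// Greedy: .* runs to the last possible quote, so both tags end up in one match.
preg_match_all('#attname src="(.*)"#', $buffer, $greedy);

// Lazy: .*? stops at the first closing quote.
preg_match_all('#attname src="(.*?)"#', $buffer, $lazy);

// Negated class: [^"]* cannot cross a quote at all, so greediness is harmless.
preg_match_all('#attname src="([^"]*)"#', $buffer, $negated);

print_r($greedy[1]);   // one element spanning both tags
print_r($lazy[1]);     // two URLs
print_r($negated[1]);  // the same two URLs
```

The negated class is usually the safer choice here, since it does not depend on backtracking to find the closing quote.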

preg_match_all('#attname src="([^"]*)"#', $buffer, $bufferarr);

Not the best solution, but it gets the job done:
$str = '[attname src="http://example.org"] somecontent [attname src="http://www.example.com"]';
preg_match_all('/attname src=\"(.*?)\"/', $str, $match);
var_dump($match);

Related

preg_replace with Regex - find number-sequence in URL

I'm a regex-noobie, so sorry for this "simple" question:
I've got a URL like the following:
http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx
What I'm trying to achieve is getting the number sequence (aka Job-ID) right before the ".aspx" with preg_replace.
I've already figured out that the regex for finding it could be
(?!.*-).*(?=\.)
Now preg_replace needs the opposite of that regular expression. How can I achieve that? Also worth mentioning:
The URL can have multiple numbers in it. I only need the sequence right before ".aspx". Also, there could be some parameters after the ".aspx", like "&mobile=true".
Thank you for your answers!
You can use:
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);
//=> 146370543
This matches a run of characters that are neither hyphens nor dots and that is followed by .aspx, using the lookahead (?=\.aspx).
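A short sketch of how the lookahead behaves when the URL has extra digits in the path and parameters after ".aspx". The URL below is a made-up example, not one from the question.

```php
<?php
// Hypothetical URL: a stray number in the path and a query string after ".aspx".
$input = 'http://example.com/Job-2-Mainz-146370543.aspx?mobile=true';

// [^-.]+ is a run without hyphens or dots; (?=\.aspx) requires ".aspx" right after it
// without making ".aspx" part of the match.
if (preg_match('/[^-.]+(?=\.aspx)/i', $input, $matches)) {
    echo $matches[0], "\n"; // 146370543
}
```

Note that [^-.]+ does not insist on digits. If the ID is always numeric, \d+ in its place would be stricter.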
You can just use preg_match (you don't need preg_replace, as you don't want to change the original string) and capture the number before the .aspx. Assuming .aspx is always at the end, the simplest way I could think of is:
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";
$regex = '/([0-9]+)\.aspx$/';
preg_match($regex, $string, $results);
print $results[1];
?>
A short explanation:
$results contains an array of matches. The first element holds the text matched by the complete regex, which is 146370543.aspx in this example. The second element holds the group captured by the parentheses around [0-9]+.
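To make the array layout concrete, here is a small sketch that dumps it. The second call is an assumption on my part: it drops the "$" anchor to cope with the trailing parameters the question mentions.

```php
<?php
$string = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx';

preg_match('/([0-9]+)\.aspx$/', $string, $results);
// $results[0] is the full match, $results[1] the first capture group.
print_r($results);

// Assumption: parameters may follow ".aspx", in which case "$" would prevent a match.
preg_match('/([0-9]+)\.aspx/', $string . '&mobile=true', $withParams);
echo $withParams[1], "\n"; // 146370543
```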
You can get the opposite by using this regex:
(\D*)\d+(.*)
Working demo
MATCH 1
1. [0-100] `http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-`
2. [109-114] `.aspx`
If you just want the number from that URL, you can use this regex:
(\d+)

Find and replace "[" and,or "]" using regular expression in PHP

I'm extracting domains from email addresses inside texts. The email "format" is like [mailto:name#domain.com], and I'm finding them using this basic regular expression:
$r = '/mailto:.*\]/';
Then I'm applying this:
substr(strrchr($matches[1][0], "#"), 1);
and final result is something like
domain.com]
So the question is: how do I get rid of the "]", or is there a better way to get only the domain from an email inside [mailto:name#domain.com]? Any suggestions? Thanks in advance!
You can try lookahead in the regex:
$r = '/mailto:.*(?=\])/';
or just remove it from the result using trim:
$final = trim(substr(strrchr($matches[1][0], "#"), 1), "]");
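And btw, a lookbehind would let you skip the substr, but PCRE does not accept a variable-length lookbehind such as (?<=\[mailto:[^#]*#). Using \K to reset the start of the match has the same effect:
$r = '/\[mailto:[^#]*#\K[^\]]*/';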
Change your expression to this.
$r = '/mailto:[^#]+#[^]]+/';
You can do this without using substr and a basic regular expression.
preg_match_all('/\[mailto:[^#]+#([^#]+)\]/', $str, $matches);
print_r($matches[1]);
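A runnable sketch of the capture-group approach follows. The sample text is hypothetical, and the capture uses [^#\]]+ rather than [^#]+ (my variation) so it cannot run past the closing bracket.

```php
<?php
// Hypothetical text using the question's "name#domain.com" address format.
$str = 'Contact [mailto:alice#example.com] or [mailto:bob#example.org] for details.';

// The parentheses capture only the part between "#" and the closing "]".
preg_match_all('/\[mailto:[^#]+#([^#\]]+)\]/', $str, $matches);

print_r($matches[1]); // example.com, example.org
```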
Use rtrim:
rtrim('domain.com]', "]");
Or you can just try to extract the possible elements from the string:
([\\w-+]+(?:\\.[\\w-+]+)*#(?:[\\w-]+\\.)+[a-zA-Z]{2,7})

find url with regex on text

There are a lot of topics like this one, but I can't find the error, and I've tried a lot.
This is the original text:
onclick="NewWindow('http://google.com','name','800','600','yes');return false">
this is my code
$re1='(onclick)';
$re2='(=)';
$re3='(.)';
$re4='(NewWindow)';
$re5='(\\()';
$re6='(.)';
$re7='((?:http|https)(?::\\/{2}[\\w]+)(?:[\\/|\\.]?)(?:[^\\s"]*))';
$c=preg_match_all ("/".$re1.$re2.$re3.$re4.$re5.$re6.$re7."/is", $txt, $matches);
print_r($matches);
Can anyone help me get the URL using a regular expression in PHP?
What is wrong with this code?
Regards
preg_match("/NewWindow\('([^']*)'/",$txt, $matches);
$matches[1] contains the URL.
Is that what you need?
(edit: put in a code block because a parenthesis was not escaped correctly)
This should work:
preg_match("/onclick=\"NewWindow\('(.*)','n/",$txt,$matches);
I'd use non-greedy matching for this:
preg_match("/onclick=\"NewWindow\('(.*?)'/", $txt, $matches);
Based on your description, the regex I would use is (with # as the delimiter, so the slashes in http:// need no escaping):
#(?<=NewWindow\(\')(http://|https://)[^\'\"]*#i
or
#(?<=onclick=\"NewWindow\(\')(http://|https://)[^\'\"]*#i
A great tool for testing your regex is: http://gskinner.com/RegExr/
It matches just the URL, and only if it is preceded by "NewWindow('" (first example) or "onclick="NewWindow('" (second example). In your case, the match is http://google.com.
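A minimal sketch of the lookbehind idea in PHP, assuming "#" as the pattern delimiter so that the slashes in the URL scheme need no escaping. The nowdoc simply holds the attribute from the question.

```php
<?php
// The onclick attribute from the question, kept verbatim in a nowdoc.
$txt = <<<'HTML'
onclick="NewWindow('http://google.com','name','800','600','yes');return false">
HTML;

// (?<=NewWindow\(') is a fixed-length lookbehind, which PCRE requires.
// [^'"]* then takes everything up to the next quote.
if (preg_match("#(?<=NewWindow\(')https?://[^'\"]*#i", $txt, $matches)) {
    echo $matches[0], "\n"; // http://google.com
}
```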

PHP Regex: Select all except last occurrence

I'm trying to replace all \n's sans that final one with \n\t in order to nicely indent for a recursive function.
This
that
then
thar
these
them
should become:
This
    that
    then
    thar
    these
them
This is what I have: preg_replace('/\n(.+?)\n/','\n\t$1\n',$var);
It currently spits this out:
This
that
then
thar
these
them
Quick Overview:
I need to indent every line except the first and the last using regex. How can I accomplish this?
You can use a lookahead:
$var = preg_replace('/\n(?=.*?\n)/', "\n\t", $var);
See it working here: ideone
After fixing a quotes issue, your output is actually like this:
This
    that
then
    thar
these
them
Use a positive lookahead to stop that trailing \n from getting eaten by the search regex. Your "cursor" was already set beyond it so only every other line was being rewritten; your match "zones" overlapped.
echo preg_replace('/\n(.+?)(?=\n)/', "\n\t$1", $input);
// \n matches the newline, (.+?) captures the text, (?=\n) is the lookahead, "\n\t$1" is the replacement
preg_replace('/\n(.+?)(?=\n)/',"\n\t$1",$var);
Modified the second \n to be the lookahead (?=\n), otherwise you'd run into issues with regex not recognizing overlapping matches.
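The overlap problem can be seen side by side in this sketch, which uses the six sample lines from the question. The variable names are illustrative.

```php
<?php
$input = "This\nthat\nthen\nthar\nthese\nthem";

// Consuming the trailing \n: the next match can only start after it,
// so every other line is skipped.
$consuming = preg_replace('/\n(.+?)\n/', "\n\t$1\n", $input);

// Lookahead: the trailing \n is checked but not consumed,
// so it is still available as the start of the next match.
$lookahead = preg_replace('/\n(.+?)(?=\n)/', "\n\t$1", $input);

echo $consuming, "\n---\n", $lookahead, "\n";
```

The last line stays unindented in both cases because no newline follows it, so neither pattern can match there.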
http://ideone.com/1JHGY
Let the downvoting begin, but why use regex for this?
<?php
$e = explode("\n",$oldstr);
$str = $e[count($e) - 1];
unset($e[count($e) - 1]);
$str = implode("\n\t",$e)."\n".$str;
echo $str;
?>
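Actually, str_replace has a "count" parameter, but it only reports how many replacements were made; it cannot limit them (and passing &$count at call time is an error in current PHP). preg_replace does accept a limit, so this should work:
<?php
// Indent after every newline except the last one.
$limit = substr_count($oldstr, "\n") - 1;
$newstr = $limit > 0 ? preg_replace("/\n/", "\n\t", $oldstr, $limit) : $oldstr;
?>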

Simple RegEx PHP

Since I am completely useless at regex and this has been bugging me for the past half an hour, I think I'll post this up here as it's probably quite simple.
hey.exe
hey2.dll
pomp.jpg
In PHP I need to extract what's between the <a> tags, for example:
hey.exe
hey2.dll
pomp.jpg
Avoid using '.*' even if you make it ungreedy, until you have some more practice with RegEx. I think a good solution for you would be:
'/<a[^>]+>([^<]+)<\/a>/i'
Note the '/' delimiters - you must use the preg suite of regex functions in PHP. It would look like this:
preg_match_all($pattern, $string, $matches);
// matches get stored in '$matches' variable as an array
// matches in between the <a></a> tags will be in $matches[1]
print_r($matches);
This appears to work:
$pattern = '/<a.*?>(.*?)<\/a>/';
([^<]*)
I found this regular expression tester to be helpful.
Here is a very simple one:
<a.*>(.*)</a>
However, you should be careful if you have several matches in the same line, e.g.
hey.exehey2.dll
In this case, the correct regex would be:
<a.*?>(.*?)</a>
Note the '?' after the '*' quantifier. By default, quantifiers are greedy, which means they eat as many characters as they can (so the greedy version would return only "hey2.dll" in this example). By appending a question mark, you make them ungreedy, which should better fit your needs.
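A small sketch comparing the two patterns on two links in one line. The href values are invented for illustration, since the original markup is not shown.

```php
<?php
// Hypothetical markup: the href values are made up for this example.
$html = '<a href="/files/hey.exe">hey.exe</a><a href="/files/hey2.dll">hey2.dll</a>';

// Greedy: the first .* swallows everything up to the last usable ">".
preg_match_all('#<a.*>(.*)</a>#', $html, $greedy);

// Ungreedy: each quantifier stops at the earliest point that still lets the pattern match.
preg_match_all('#<a.*?>(.*?)</a>#', $html, $lazy);

print_r($greedy[1]); // hey2.dll only
print_r($lazy[1]);   // hey.exe, hey2.dll
```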
