I have a function that parses some content to find a homemade link tag and convert it to a normal link tag.
Possible input:
<p>blabalblahhh <moolinkx pageid="121">text to click</moolinkx> blablabah</p>
Output :
<p>blabalblahhh text to click blablabah</p>
Here is my code:
$regex = '/\<moolinkx pageid="(.{1,})"\>(.{1,})\<\/moolinkx\>/';
preg_match_all( $regex, $string, $matches );
It works perfectly well if there is only one such tag in the string. But as soon as there is a second one, it doesn't work.
Input:
<p>blabalblahhh <moolinkx pageid="121">text to click</moolinkx> blablabah.</p>
<p>Another <moolinkx pageid="128">text to clickclick</moolinkx> again blablablah.</p>
That's what I got when I print_r($matches):
Array
(
[0] => Array
(
[0] => <moolinkx pageid="121">text to click</moolinkx> blablabah.</p><p>Another <moolinkx pageid="128">text to clickclick</moolinkx>
)
[1] => Array
(
[0] => 121">text to click</moolinkx> blablabah.</p><p>Another <moolinkx pageid="128
)
[2] => Array
(
[0] => text to clickclick
)
)
I'm not at ease with regex, so it must be something very trivial... but I can't pinpoint what it is :(
Thank you very much in advance!
NB: This is my first post here, though I've been using this terrific Q&A for ages!
Use a negated character class:
$regex = '/<moolinkx pageid="([^"]+)">([^<]+)<\/moolinkx>/';
Explained demo here: http://regex101.com/r/sI3wK5
You are using a greedy quantifier, which treats everything between the first opening tag and the last closing tag as the content between the tags. Change your regex to:
$regex = '/\<moolinkx pageid="(.+?)"\>(.+?)\<\/moolinkx\>/';
preg_match_all( $regex, $string, $matches );
Notice the .{1,} has changed to .+?. The + means one or more instances, and the ? tells the regex to select the fewest characters it can to fulfil the expression.
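As a minimal sketch of the lazy version applied to the two-tag input from the question: the target link format (page.php?id=...) is a hypothetical placeholder, since the question does not show the final markup.

```php
<?php
// Sample input from the question: two custom tags in one string.
$string = '<p>blabalblahhh <moolinkx pageid="121">text to click</moolinkx> blablabah.</p>'
        . '<p>Another <moolinkx pageid="128">text to clickclick</moolinkx> again blablablah.</p>';

// Lazy quantifiers stop at the first closing quote and the first closing tag.
$regex = '/<moolinkx pageid="(.+?)">(.+?)<\/moolinkx>/';
preg_match_all($regex, $string, $matches);

$pageIds = $matches[1]; // one entry per tag
$labels  = $matches[2]; // the clickable text of each tag

// Hypothetical target format: the href below is an assumption for illustration.
$converted = preg_replace($regex, '<a href="page.php?id=$1">$2</a>', $string);
echo $converted, "\n";
```

With the lazy pattern each tag yields its own match, so the same pattern can be reused directly in preg_replace.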
Related
I have several source codes that I'm applying preg_match_all on.
this is what I tried:
$lazy = file_get_contents("Some_Source_code.txt");
if(!preg_match("#method_(.*)\(int var0, int var1, int var2\)#", $lazy, $function_name))
die("nothing here");
preg_match_all("#method_".$function_name[1]."\(.*\){1}#", $lazy, $matches);
print_r($matches);
but the output comes like this:
Array
(
[0] => Array
(
[0] => method_2393(int var0, int var1, int var2)
[1] => method_2393(0, 0, 0)).equals(this.field_1351.getText().toString()))
)
)
OK, what I want is $matches[0][1], but how can I make the match stop at the first closing parenthesis ')', as it does in the first result?
I can process the line after I extract it, but how can I do it with regex?
I searched the answers of similar problems but they were too specific.
Modify the regex to:
#method_".$function_name[1]."\([^)]*\){1}#
Where it went wrong:
#method_".$function_name[1]."\(.*\){1}#
Here you used \(.*\), where .* matches anything, including the ).
Changes made:
In \([^)]*\), the [^)]* part matches anything other than ), so the match ends at the first occurrence of ).
You can also use lazy matching with .*? instead of .*, which is greedy and consumes as many characters as it can.
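The difference between the three variants can be sketched as follows. The sample text is modeled on the output in the question, and $name is a stand-in for $function_name[1]; the preg_quote() call is an extra precaution not present in the original code.

```php
<?php
// Sample text modeled on the output shown in the question (an assumption).
$lazy = "void method_2393(int var0, int var1, int var2) {}\n"
      . "if (String.valueOf(method_2393(0, 0, 0)).equals(this.field_1351.getText().toString())) {}";

$name = '2393'; // stands in for $function_name[1]

// preg_quote() guards against regex metacharacters in the captured name.
$prefix = '#method_' . preg_quote($name, '#');

preg_match_all($prefix . '\(.*\)#', $lazy, $greedy);      // runs to the last ")" on each line
preg_match_all($prefix . '\([^)]*\)#', $lazy, $negated);  // stops at the first ")"
preg_match_all($prefix . '\(.*?\)#', $lazy, $lazyMatch);  // also stops at the first ")"

print_r($negated[0]);
```

Neither the negated class nor the lazy quantifier handles nested parentheses inside the argument list; both simply stop at the first ")".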
I need help extracting the strings from a text that start with # and run until the next space, using preg_match in PHP.
Ex : I want to get #string from this line as separate.
In this example, I need to extract "#string" alone from this line.
Could anybody help me find a solution for this?
Thanks in advance!
PHP and Python are not the same in regard to searches. If you've already used a function like strip_tags on your capture, then something like this might work better than the Python example provided in one of the other answers since we can also use look-around assertions.
<?php
$string = <<<EOT
I want to get #string from this line as separate.
In this example, I need to extract "#string" alone from this line.
#maybe the username is at the front.
Or it could be at the end #whynot, right!
dog#cat.com would be an e-mail address and should not match.
EOT;
echo $string."<br>";
preg_match_all('~(?<=[\s])#[^\s.,!?]+~',$string,$matches);
print_r($matches);
?>
Output results
Array
(
[0] => Array
(
[0] => #string
[1] => #maybe
[2] => #whynot
)
)
Update
If you're pulling straight from the HTML stream itself, however, note that the Twitter HTML is formatted like this:
<s>#</s><b>UserName</b>
So to match a username from the html stream you would match with the following:
<?php
$string = <<<EOT
<s>#</s><b>Nancy</b> what are you on about?
I want to get <s>#</s><b>string</b> from this line as separate. In this example, I need to extract "#string" alone from this line.
<s>#</s><b>maybe</b> the username is at the front.
Or it could be at the end <s>#</s><b>WhyNot</b>, right!
dog#cat.com would be an e-mail address and should not match.
EOT;
$matchpattern = '~(<s>(#)</s><b\>([^<]+)</b>)~';
preg_match_all($matchpattern,$string,$matches);
$users = array();
foreach ($matches[0] as $username){
$cleanUsername = strip_tags($username);
$users[]=$cleanUsername;
}
print_r($users);
Output
Array
(
[0] => #Nancy
[1] => #string
[2] => #maybe
[3] => #WhyNot
)
Simply do:
preg_match('/#\S+/', $string, $matches);
The first match is in $matches[0].
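To show what the short pattern picks up across several lines, here is a small sketch using preg_match_all. The stricter second pattern is an assumption about the intent (a tag must not follow a non-space character), not something stated in the question.

```php
<?php
// Sample lines from the question, plus the e-mail-like case from the other answer.
$string = "I want to get #string from this line as separate.\n"
        . "In this example, I need to extract \"#string\" alone from this line.\n"
        . "dog#cat.com would be an e-mail address and should not match.";

// Plain version: "#" followed by every non-whitespace character.
preg_match_all('/#\S+/', $string, $loose);

// Stricter variant (assumed intent): "#" must not follow a non-space
// character, and the tag consists of word characters only.
preg_match_all('/(?<!\S)#\w+/', $string, $strict);

print_r($loose[0]);
print_r($strict[0]);
```

The plain pattern also picks up trailing punctuation and the part after # in dog#cat.com, so which variant fits depends on how clean the input is.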
I'm trying to get the 2 values in this string using regex:
a:2:{i:45;s:29:"Program Name 1";i:590;s:19:"Program Name 2";}
There are 2 variables that start with "s:" and end with ":" which I am attempting to get from this string (and similar strings).
$string = 'a:2:{i:45;s:29:"Program Name 1";i:590;s:19:"Program Name 2";}';
preg_match_all("/s:(\d+):/si", $page['perfarray'], $match);
print_r($match);
I have tried numerous things but this is the first time I've attempted to use regex to get multiple values from a string.
This is the current result: Array ( [0] => Array ( ) [1] => Array ( ) )
Any constructive help is greatly appreciated. I have already read the functions on php.net and I can't find a similar question on stack overflow that matches my needs closely enough. Thanks in advance.
That looks like a serialized string. Instead of using a regular expression, use unserialize() to retrieve the required value.
Update: It looks like your string is not a valid serialized string. In that case, you can use a regular expression to get the job done:
$string = 'a:2:{i:45;s:29:"Program Name 1";i:590;s:19:"Program Name 2";}';
if(preg_match_all("/s:(\d+):/si", $string, $matches)) {
print_r($matches[1]);
}
Output:
Array
(
[0] => 29
[1] => 19
)
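For comparison, a minimal sketch of the unserialize() route on a valid string. The array contents mirror the question; the valid string is built with serialize() so that the declared lengths match the data.

```php
<?php
// Build a valid serialized string so the declared lengths match the data.
$programs = [45 => 'Program Name 1', 590 => 'Program Name 2'];
$serialized = serialize($programs);
echo $serialized, "\n";

// allowed_classes => false keeps unserialize() from instantiating objects.
$decoded = unserialize($serialized, ['allowed_classes' => false]);

// The string in the question declares s:29 for a 14-character value,
// so unserialize() cannot parse it and returns false.
$broken = 'a:2:{i:45;s:29:"Program Name 1";i:590;s:19:"Program Name 2";}';
$result = @unserialize($broken, ['allowed_classes' => false]);

var_dump($result);
```

The mismatch between the declared length and the actual value is why the regular expression fallback is needed for the string as posted.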
That should work:
preg_match_all("/s:([0-9]+):/si", $page['perfarray'], $match);
How can I match both (http://[^"]+)'s?:
(I know it's an illegal URL, but same idea)
I want the regex to give me these two matches:
1 http://yoursite.com/goto/http://aredirectURL.com/extraqueries
2 http://aredirectURL.com/extraqueries
without running preg_match_all multiple times.
Really stumped, thanks for any light you can shed.
This regular expression will get you the output you want: ((?:http://[^"]+)(http://[^"]+)). Note the usage of the non-capturing group (?:regex). To read more about non-capturing groups, see Regular Expression Advanced Syntax Reference.
<?php
preg_match_all(
'((?:http://[^"]+)(http://[^"]+))',
'',
$out);
echo "<pre>";
print_r($out);
echo "</pre>";
?>
The above code outputs the following:
Array
(
[0] => Array
(
[0] => http://yoursite.com/goto/http://aredirectURL.com/extraqueries
)
[1] => Array
(
[0] => http://aredirectURL.com/extraqueries
)
)
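A self-contained sketch of the same idea with explicit ~ delimiters: the input string here is hypothetical, since the sample markup from the question is not preserved in the post.

```php
<?php
// Hypothetical input: the original sample string is not preserved in the post.
$html = '<a href="http://yoursite.com/goto/http://aredirectURL.com/extraqueries">link</a>';

// Explicit "~" delimiters avoid having to escape the slashes in "http://".
// The first group is non-capturing; only the nested URL is captured.
preg_match_all('~(?:http://[^"]+)(http://[^"]+)~', $html, $out);

$outer = $out[0][0]; // the whole match: outer URL including the nested one
$inner = $out[1][0]; // capture group 1: the nested URL only

echo $outer, "\n", $inner, "\n";
```

In the code above this sketch, the outermost parentheses of the pattern string act as PHP's regex delimiters, which is why index 0 holds the full URL and index 1 the nested one.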
You can split the string with this function:
http://de.php.net/preg_split
Each part of the resulting array can then contain, e.g., one of the URLs.
If there is more content, consider doing the split inside a callback while the full text is being processed.
$str = '';
preg_match("/\"(http:\/\/.*?)(http:\/\/.*?)\"/i", $str, $match);
echo "{$match[0]}{$match[1]}\n";
echo "{$match[1]}\n";
Content of 1.txt:
Image" href="images/product_images/original_images/9961_1.jpg" rel="disable-zoom:false; disable-expand: false"><img src="im
Code that does not work:
<?php
$pattern = '/(images\/product_images\/original_images\/)(.*)(\.jpg)/i';
$result = file_get_contents("1.txt");
preg_match($pattern,$result,$match);
echo "<h3>Preg_match Pattern test:</h3><br><br><pre>";
print_r($match);
echo "</pre>";
?>
I expect this result:
Array
(
[0] => images/product_images/original_images/9961_1.jpg
[1] => images/product_images/original_images/
[2] => 9961_1
[3] => .jpg
)
But I get something like this:
Array
(
[0] => images/product_images/original_images/9961_1.jpg" rel="disable-zoom:false; disable-expand: false">
[1] => images/product_images/original_images/
[2] => 9961_1.jpg" rel="disable-zoom:false; disable-expand: false">
)
I'm tired of trying a million combinations of this regexp. I don't know what's wrong. Please help, and thanks a lot!
Make it ungreedy:
$pattern = '/(images\/product_images\/original_images\/)(.*?)(\.jpg)/i';
Remember that quantifiers in regular expressions are greedy by default. Your second capture (.*) says to match any character except a newline (unless the s modifier, i.e. DOTALL mode, is set), as many times as possible. So it is probably capturing the rest of the line.
You can make it ungreedy as suggested by Wrikken. But I like to ensure I am capturing what I want. In your case, it looks like the value of the href attribute. So really I want at least 1 character, can't be a quote, followed by the jpg extension:
$pattern = '/(images\/product_images\/original_images\/)([^\'"]+)(\.jpg)/i';
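The three variants can be compared side by side in a short sketch. The input line is hypothetical: it is modeled on the 1.txt excerpt but given a second .jpg so that the greedy behaviour is visible.

```php
<?php
// Hypothetical line with two ".jpg" occurrences, modeled on the 1.txt excerpt.
$line = 'href="images/product_images/original_images/9961_1.jpg" rel="x"><img src="images/thumb/9961_1.jpg">';

$prefix = '(images\/product_images\/original_images\/)';

preg_match('/' . $prefix . '(.*)(\.jpg)/i', $line, $greedy);       // runs to the last ".jpg"
preg_match('/' . $prefix . '(.*?)(\.jpg)/i', $line, $lazy);        // stops at the first ".jpg"
preg_match('/' . $prefix . '([^\'"]+)(\.jpg)/i', $line, $negated); // cannot cross a quote

print_r($negated);
```

The negated class has the side benefit of documenting the expectation that the file name never contains a quote character.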
Here's the basic regex:
href="((.*/)(.*?)(\.jpg))"
Do not parse HTML with regex.
Do not parse HTML with regex.
Do not parse HTML with regex.
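One way to follow that advice is a DOM parser. The sketch below assumes the DOM extension (DOMDocument) is available and uses a hypothetical fragment modeled on 1.txt, since the real file is only partly shown.

```php
<?php
// Hypothetical fragment modeled on 1.txt; the real file is only partly shown.
$html = '<a title="Image" href="images/product_images/original_images/9961_1.jpg" '
      . 'rel="disable-zoom:false; disable-expand: false"><img src="thumb.jpg"></a>';

$doc = new DOMDocument();
// Collect libxml warnings about imperfect markup instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$parts = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    // pathinfo() splits the path into directory, file name and extension.
    $info = pathinfo($href);
    if (strtolower($info['extension'] ?? '') === 'jpg') {
        $parts[] = [$href, $info['dirname'] . '/', $info['filename'], '.' . $info['extension']];
    }
}
print_r($parts);
```

This yields the same four pieces the question expects, without depending on how the attributes are ordered or quoted.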