how to fix preg_match_all in php? - php

This is the code I want to get text from:
<tr><td valign="top"><img src="/icons/back.gif" alt="[PARENTDIR]"></td><td>Parent Directory</td><td> </td><td align="right"> - </td><td> </td></tr>
I used it like this:
preg_match_all("/<td><a href=\"(.*)\">/", $text, $ress);
but what I get is this:
/administrator/components/com_simplephotogallery/x/">Parent Directory</a></td><td> </td><td align="right

Instead of the greedy (.*), use a negated character class [^"]:
preg_match_all('/<td><a href="([^"]+)">/', $text, $ress);
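To see the difference between the two patterns side by side, here is a minimal sketch. The sample row and its href value are assumptions made up for illustration, since the original markup is not fully shown.

```php
<?php
// Hypothetical sample row; the href value is an assumption for illustration.
$text = '<tr><td><a href="/some/dir/">Parent Directory</a></td><td align="right"> - </td></tr>';

// Greedy: (.*) runs on to the LAST '">' in the string.
preg_match_all('/<td><a href="(.*)">/', $text, $greedy);

// Negated class: ([^"]+) stops at the first closing quote.
preg_match_all('/<td><a href="([^"]+)">/', $text, $ress);

var_dump($greedy[1][0]); // overshoots past the link into the next cell
var_dump($ress[1][0]);   // just the href value
```

The greedy capture ends inside align="right" because that is the last place a quote is followed by >, which is the same overshoot the question describes.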

Try this regex instead:
preg_match_all("/<td><a href=\"(([\/\w \.-]*)*\/?)\">/", $text, $ress);

I just used
$a = explode('<a href="', $text);
$a = explode('">', $a[1]);
or anyone can just use the DOMDocument class.
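For the DOMDocument route, a minimal sketch might look like the following. The input string and its href are hypothetical, and it assumes the dom extension is enabled.

```php
<?php
// Hypothetical input; the href is made up for illustration.
$text = '<table><tr><td><a href="/some/dir/">Parent Directory</a></td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep warnings about imperfect HTML quiet
$dom->loadHTML($text);
libxml_clear_errors();

// Collect the href attribute of every <a> element.
$hrefs = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

print_r($hrefs);
```

This avoids depending on the exact attribute order or quoting that a regex or explode would need.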

Related

preg_replace doesn't work as expected

I have problem with preg_replace, I need it to replace <td class="td_supltrid_3" width="11%"><p> 4A8</p> with only 4A8. When I use this pattern:
'/\<td class\=\"td_supltrid_3\" width\=\"11%\"\>\<p\> ...\<\/p\>/'
it doesn't find it. However, when I use preg_match, it finds the searched expression without a problem. Can you tell me where the problem is? Whole code:
preg_replace('/\<td class\=\"td_supltrid_3\" width\=\"11%\"\>\<p\> (...)\<\/p\>/', '$1', $str)
You need to change (...) to (.*?), which will grab everything up to the trailing </p>
<?php echo preg_replace('/<td class\=\"td_supltrid_3\" width\=\"11%\"><p>(.*?)<\/p>/', '$1', '<td class="td_supltrid_3" width="11%"><p> 4A8</p>'); ?>

Can I use conditional logic in PHP EOD?

Is it possible to put conditional logic inside an EOD string?
$str = <<<EOD
<table>
<tr>
<td>
if ( !empty($var1) ) {
{$var1}
} else {
{$var2}
}
</td>
</tr>
</table>
This doesn't work for me, and it sort of looks like it wouldn't work, but I thought I'd take a stab.
Also, is it EOD or EOT? Both seem to work.
No. You cannot use conditionals in heredoc.
Also, is it EOD or EOT?
As long as your beginning and ending strings match you can use anything:
$x = <<<THOMAS
Pick a string, any string
THOMAS;
The doc contains several examples demonstrating this
As to how best to achieve the example you provided, this would be my first inclination:
$td = !empty($var1) ? $var1 : $var2;
$str = <<<EOD
<table>
<tr>
<td>
{$td}
</td>
</tr>
</table>
EOD;

Preg Pattern Finding

I want to parse HTML content that has something like this:
<td class="s3_40">12,909</td>
I tried many regexes to find the string in between (12,909),
like: %<td class=\"s3_40\">(.*)</td>%
but none of them found it.
Hope this can help:
$str = '<td class="s3_40">12,909</td>';
if (preg_match('#<td class="s3_40">(.*)</td>#s', $str, $matches)) {
var_dump($matches[1]);
}
If you must do it with a regex, you can try a pattern like this:
/<td[^>]*>(.*?)<\/td>/
Try this
#(?<=(s3_40">))([^<]+)(?=(</td>))#
Try doing it like this; just keep adding s3_41, s3_42, etc. to the alternation for the classes you need.
<?php
$s = '<td class="s3_40">12,909</td>';
$pat = '#<td class=".*(s3_40|s3_41).*".*?>(.*?)</td>#isU';
preg_match_all($pat,$s,$ms);
var_dump($ms);
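If the class name varies, one option is to build the pattern from it. The helper below, td_text_by_class, is a hypothetical name invented for this sketch, and it assumes the class attribute is the only attribute on the cell, as in the question.

```php
<?php
// Hypothetical helper: returns the inner text of every <td> whose
// class attribute is exactly $class (and has no other attributes).
function td_text_by_class(string $html, string $class): array
{
    // preg_quote escapes regex metacharacters, including the '#' delimiter.
    $pattern = '#<td class="' . preg_quote($class, '#') . '">(.*?)</td>#s';
    preg_match_all($pattern, $html, $m);
    return $m[1];
}

$html   = '<td class="s3_40">12,909</td><td class="s3_41">7</td>';
$values = td_text_by_class($html, 's3_40');

// Strip the thousands separator before converting to an integer.
$number = (int) str_replace(',', '', $values[0]);

var_dump($values, $number);
```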

Extract content from each first TD in a Table

I've got some HTML that looks like this:
<tr class="row-even">
<td align="center">abcde</td>
<td align="center"><img src="../images/delete_x.gif" alt="Delete User" border="none" /></td>
</tr>
<tr class="row-odd">
<td align="center">efgh</td>
<td align="center"><img src="../images/delete_x.gif" alt="Delete User" border="none" /></td>
</tr>
<tr class="row-even">
<td align="center">ijkl</td>
<td align="center"><img src="../images/delete_x.gif" alt="Delete User" border="none" /></td>
</tr>
And I need to retrieve the values, abcde, efgh, and ijkl
This is the regex I'm currently using:
preg_match_all('/(<tr class="row-even">|<tr class="row-odd">)<td align="center">(.*)<\/td><\/tr>/xs', $html, $matches);
Yes, I'm not very good at them. As with most of my regex attempts, this is not working. Can anyone tell me why?
Also, I know about html/xml parsers, but it would require a significant code revisit to make that happen. So that's for later. We need to stick with regex for now.
EDIT: To clarify, I need the values between the first <td align="center"></td> tag after either <tr class="row-even"> or <tr class="row-odd">
~<tr class="row-(even|odd)">\s*<td align="center">(.*?)</td>~m
Notice the m modifier and the use of \s*.
Also, you can make the first group non-capturing via ?:. I.e., (?:even|odd) as you're probably not interested in the class attribute :)
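As a quick check, the pattern with the non-capturing group can be run against the sample rows from the question. This is only a sketch; the m modifier is left off here because the pattern has no ^ or $ anchors for it to affect.

```php
<?php
// Sample rows copied from the question.
$html = <<<HTML
<tr class="row-even">
<td align="center">abcde</td>
<td align="center"><img src="../images/delete_x.gif" alt="Delete User" border="none" /></td>
</tr>
<tr class="row-odd">
<td align="center">efgh</td>
<td align="center"><img src="../images/delete_x.gif" alt="Delete User" border="none" /></td>
</tr>
<tr class="row-even">
<td align="center">ijkl</td>
<td align="center"><img src="../images/delete_x.gif" alt="Delete User" border="none" /></td>
</tr>
HTML;

// \s* absorbs the newline between <tr> and the first <td>;
// (?:even|odd) groups the alternatives without capturing them.
preg_match_all('~<tr class="row-(?:even|odd)">\s*<td align="center">(.*?)</td>~', $html, $matches);

print_r($matches[1]);
```

With the group made non-capturing, the wanted values end up in $matches[1] rather than $matches[2].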
Try this:
preg_match_all('/(?:<tr class="row-even">|<tr class="row-odd">).<td align="center">(.*?)<\/td>/s', $html, $matches);
Changes made:
You've not accounted for the newline between the tags.
You don't need the x modifier, as it discards the whitespace in the regex.
Make the matching non-greedy by using .*? in place of .*.
Actually, you don't need a big change in your codebase. Fetching text nodes always works the same way with DOM and XPath. All that changes is the XPath query, so you could wrap the DOM code in a function that replaces your preg_match_all. That would be just a tiny change, e.g.
include_once "dom.php";
$matches = dom_match_all('//tr/td[1]', $html);
where dom.php just contains:
// dom.php
function dom_match_all($query, $html, array $matches = array()) {
    $dom = new DOMDocument;
    libxml_use_internal_errors(TRUE);
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xPath = new DOMXPath($dom);
    foreach( $xPath->query($query) as $node ) {
        $matches[] = $node->nodeValue;
    }
    return $matches;
}
and would return
Array
(
[0] => abcde
[1] => efgh
[2] => ijkl
)
But if you want a Regex, use a Regex. I am just giving ideas.
This is what I came up with
<td align="center">([^<]+)</td>
I'll explain. One of the challenges here is that what's between the tags could be either the text you're looking for or an <img> tag. In the regex, [^<]+ says to match one or more characters that are not the < character. That's great, because it means the <img> tag won't match, and the group will only match until the closing </td> tag is found.
Disclaimer: Using regexps to parse HTML is dangerous.
To get the innerhtml of the first TD in each TR, use this regexp:
/<tr[^>]*>\s*<td[^>]*>(.+?)<\/td>/si
This is just a quick and dirty regex to meet your needs. It could easily be cleaned up and optimized, but it's a start.
<tr[^>]+>[^\n]*\n #Match the opening <tr> tag
\s*<td[^>]+>([^<]+)[^\n]+\n #Group the wanted data
[^\n]+\n #Match next line
</tr> #Match closing tag
Here is an alternative way, which may be more robust:
deluserconfirm.html\?user=([^"]+)

explode glitch in delimiter

I'm having some trouble with the delimiter for explode. I have a rather chunky string as a delimiter, and it seems to break down when I add another letter (the start of a word), but it doesn't get fixed when I remove the first letter, which would indicate it isn't about length.
To wit, the (working) code is:
$boom = htmlspecialchars("<td width=25 align=\"center\" ");
$arr[1] = explode($boom, $arr[1]);
The full string I'd like to use is <td width=25 align=\"center\" class=\", and when I start adding in class, explode breaks down and nothing gets done. That happens as soon as I add c, and it doesn't go away if I remove <, which it would if it were just a matter of string length.
Basically, the problem isn't dire, since I can just replace class=" with "" after the explode and get the same result, but this has given me headaches to diagnose, and it seems like a really weird problem. For what it's worth, I'm using PHP 5.3.0 in XAMPP 1.7.2.
Thanks in advance!
You could try converting every occurrence of the delimiter in the original string
"<td width=25 align=\"center\" "
in something more manageable like:
"banana"
and then explode on that word
Have you tried adding htmlspecialchars to the explode?
$arr[1] = explode($boom, htmlspecialchars($arr[1]));
I get unexpected results without it, but with it it works perfectly.
$s = '<td width=25 align="center" class="asdjasd">sdadasd</td><td width=25 align="center" >asdasD</td>';
$boom = htmlspecialchars("<td width=25 align=\"center\" class=");
$sex = explode($boom, $s);
print_r($sex);
Outputs:
Array
(
[0] => <td width=25 align="center" class="asdjasd">sdadasd</td><td width=25 align="center" >asdasD</td>
)
Whereas
$s = '<td width=25 align="center" class="asdjasd">sdadasd</td><td width=25 align="center" >asdasD</td>';
$boom = htmlspecialchars("<td width=25 align=\"center\" class=");
$sex = explode($boom, htmlspecialchars($s));
print_r($sex);
Outputs
Array
(
[0] =>
[1] => "asdjasd">sdadasd</td><td width=25 align="center" >asdasD</td>
)
This is because $boom is encoded with htmlspecialchars: < and > get transformed into &lt; and &gt;, which explode cannot find in the unencoded string, so it just returns the whole string.
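The mismatch can be reproduced in a few lines. This is a sketch with a made-up input string; the point is only that the delimiter and the subject must be in the same form, either both raw or both encoded.

```php
<?php
// Hypothetical input with two cells; the contents are made up.
$s   = '<td width=25 align="center" class="a">one</td><td width=25 align="center" class="b">two</td>';
$raw = '<td width=25 align="center" class="';

// htmlspecialchars turns < into &lt; and " into &quot;
$encoded = htmlspecialchars($raw);

// The raw delimiter occurs in the raw string, so explode splits on it.
$withRaw = explode($raw, $s);

// The encoded delimiter does not occur in the raw string,
// so explode returns a single element holding the whole string.
$withEncoded = explode($encoded, $s);

print_r($withRaw);
print_r($withEncoded);
```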
