This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 6 years ago.
How may I preg_match this in PHP?
$item[0] = <tr><td rowspan=2>07/07/2016 14:55</td><td>AC MENDES - Mendes/RJ</td><td><font color="000000">Postado depois do horário limite da agência</font></td></tr>
I've tried the code below, but it didn't work...
if(preg_match("#<td rowspan=[2]>(.*)</td><td>(.*)</td><td><FONT COLOR=\"[0-9A-F]{6}\">(.*)</font></td>#", $item[0], $d))
{
echo 'OK';
}
You may use a DOM with XPath to get all TD texts:
$html = <<<DATA
<tr><td rowspan=2>07/07/2016 14:55</td><td>AC MENDES - Mendes/RJ</td><td><font color="000000">Postado depois do horário limite da agência</font></td></tr>
DATA;
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$tds = $xpath->query('//td');
$res = array();
foreach($tds as $td) {
array_push($res, $td->nodeValue);
}
print_r($res);
See the PHP demo
The //td XPath selects all td nodes. You could also use the '//text()' XPath to grab all text nodes instead.
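As a rough sketch of the '//text()' variant, the snippet below runs that query over the row from the question. The `<?xml encoding="UTF-8">` prefix is a commonly used workaround to make libxml read the input as UTF-8; it is an assumption added here and is not part of the answer above.

```php
<?php
// Sample row copied from the question.
$html = '<tr><td rowspan=2>07/07/2016 14:55</td><td>AC MENDES - Mendes/RJ</td>'
      . '<td><font color="000000">Postado depois do horário limite da agência</font></td></tr>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
// Assumption: the XML prolog is only an encoding hint for libxml.
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$texts = [];
foreach ($xpath->query('//text()') as $node) {
    $texts[] = $node->nodeValue;    // one entry per text node
}
print_r($texts);
```

Unlike //td, this returns one entry per text node, so a cell that contains several inline elements would yield several entries.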
Otherwise, if you know what you are doing, you may add a temporary marker string after each <td> tag, then strip the tags and explode on that marker:
explode("###", strip_tags(str_replace("<td>", "<td>###", $s)))
See this demo
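A minimal sketch of the marker approach, using a simplified row as an assumption: the cells here have no attributes, because a plain str_replace on "<td>" would not match a tag such as <td rowspan=2>.

```php
<?php
// Simplified input (assumption): every cell opens with a bare <td>.
$s = '<tr><td>07/07/2016 14:55</td><td>AC MENDES - Mendes/RJ</td></tr>';

// Put a marker after each <td>, drop all tags, then split on the marker.
$parts = explode('###', strip_tags(str_replace('<td>', '<td>###', $s)));

// The text before the first marker is empty, so filter out empty strings.
$parts = array_values(array_filter($parts, 'strlen'));
print_r($parts);
```

The marker must be a string that cannot occur in the cell text, which is one reason the DOM approach is usually the safer choice.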
This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 7 years ago.
I need to get the href value from an element with a class name like that.
Can someone give me a regex to find this class name and get the href from there?
Thanks a lot.
You can do this using DomDocument and XPath, see here
$str = '';
$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXPath($doc);
$links = $xpath->query('//a[@class="boldOrange"]');
foreach ($links as $link) {
$url = $link->getAttribute('href');
print $url . PHP_EOL;
}
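An exact comparison on the class attribute only matches when the attribute holds that single class name. If the element may carry several classes, a token-style match is a common XPath 1.0 idiom. The markup below is hypothetical, since the question's original HTML was not preserved.

```php
<?php
// Hypothetical markup for illustration only.
$str = '<p><a class="boldOrange" href="/first">One</a>'
     . '<a class="link boldOrange" href="/second">Two</a>'
     . '<a class="plain" href="/third">Three</a></p>';

$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXPath($doc);

// Pad the class list with spaces so " boldOrange " matches as a whole token.
$query = '//a[contains(concat(" ", normalize-space(@class), " "), " boldOrange ")]';

$urls = [];
foreach ($xpath->query($query) as $link) {
    $urls[] = $link->getAttribute('href');
}
print_r($urls);
```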
This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 3 years ago.
I have the following PHP regex code. I want to extract the stock symbol from some HTML output.
The stock symbol I want to extract appears as /q?s=XXXX, where XXXX (the stock symbol) can be 1 to 5 characters long.
if(preg_match_all('~(?<=q\?s=)[-A-Z.]{1,5}~', $html, $out))
{
$out[0] = array_unique($out[0]);
} else {
echo "FAIL";
}
HTML code below (the two cases that I applied this to)
case #1 (does *not* work)
Bellicum Pharmaceuticals, Inc.
case #2 (does work correctly)
NYLD
I am looking for suggestions on how to update my PHP regex code so that it works for both case 1 and case 2. Thanks.
Instead of using regex, make effective use of DOM and XPath to do this for you.
$doc = new DOMDocument;
@$doc->loadHTML($html); // load the HTML data; @ suppresses parser warnings
$xpath = new DOMXPath($doc);
// select anchors whose href starts with "/q?s="
$links = $xpath->query('//a[substring(@href, 1, 5) = "/q?s="]');
$results = array();
foreach ($links as $link) {
$results[] = str_replace('/q?s=', '', $link->getAttribute('href'));
}
print_r($results);
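For reference, the same idea can be sketched with the XPath starts-with() function plus array_unique, which the question's own code also used. The anchors below are hypothetical stand-ins, since the question's HTML was not preserved, and the symbols are placeholders.

```php
<?php
// Hypothetical anchors for illustration only.
$html = '<p><a href="/q?s=BLCM">Bellicum Pharmaceuticals, Inc.</a>'
      . '<a href="/q?s=NYLD">NYLD</a>'
      . '<a href="/q?s=NYLD">NYLD again</a>'
      . '<a href="/news">News</a></p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$symbols = [];
foreach ($xpath->query('//a[starts-with(@href, "/q?s=")]') as $link) {
    // Drop the 5-character "/q?s=" prefix to keep only the symbol.
    $symbols[] = substr($link->getAttribute('href'), 5);
}
// Remove duplicates and reindex the array.
$symbols = array_values(array_unique($symbols));
print_r($symbols);
```

Because the symbol is read from the href attribute, the link text (a company name in case 1, the symbol itself in case 2) no longer matters.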
The answer seems nice, but it seems like a lot of work and code to maintain, no?
if (preg_match_all('/q\?s=(\S{1,5})\"/', $html, $match)) {
$symbols = array_unique($match[1]);
}
or even shorter... '/q\?s=(\S+)\"/'
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to parse and process HTML/XML with PHP?
Grabbing the href attribute of an A element
How can you get everything that is between the tags <td class="detail"></td>? I use PHP.
Why don't you use DOMXpath?
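$doc = new DOMDocument();
$doc->loadHTML($yourHtml); // $yourHtml holds the HTML string
$xpath = new DOMXPath($doc); // DOMXPath needs a DOMDocument, not a string
$nodeList = $xpath->query("//td[@class='detail']");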
foreach($nodeList as $node)
{
echo $node->nodeValue;
}
Parsing HTML with regex isn't a good choice
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Best methods to parse HTML with PHP
I understand I should be using a html parser like php domdocument (http://docs.php.net/manual/en/domdocument.loadhtml.php) or tagsoup.
How would I use PHP DOMDocument to extract the text between specific tags, for example the text between h1, h2, h3, p, table? It seems I can only do this for one tag at a time with getElementsByTagName.
Is there a better html parser for such task? Or how would I loop through the php domdocument?
You are correct, use DomDocument (since regex is NOT a good idea for parsing HTML. Why? See here and here for reasons why).
getElementsByTagName gives you a DOMNodeList that you can iterate over to get the text of all the found elements. So, your code could look something like:
$document = new \DOMDocument();
$document->loadHTML($html);
$tags = array ('h1', 'h2', 'h3', 'h4', 'p');
$texts = array ();
foreach($tags as $tag)
{
$elementList = $document->getElementsByTagName($tag);
foreach($elementList as $element)
{
$texts[$element->tagName][] = $element->textContent;
}
}
return $texts;
Note that you should probably have some error handling in there, and you will also lose the context of the texts, but you can probably edit this code as you see fit.
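If the order of the texts matters, one option is a single XPath union query, which with PHP's libxml-based DOMXPath should return the matched nodes in document order. This is a sketch on a made-up snippet; `$ordered` and the sample HTML are assumptions for illustration.

```php
<?php
// Made-up input for illustration only.
$html = '<h1>Title</h1><p>Intro</p><h2>Section</h2><p>Body</p>';

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// One query for all wanted tags instead of one lookup per tag name.
$ordered = [];
foreach ($xpath->query('//h1 | //h2 | //h3 | //h4 | //p') as $element) {
    $ordered[] = [$element->tagName, $element->textContent];
}
print_r($ordered);
```

Each entry keeps the tag name next to its text, so headings and the paragraphs that follow them stay together.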
You can do so with a regex.
preg_match_all('#<h1>([^<]*)</h1>#Usi', $html_string, $matches);
// $matches[1] holds the captured text of each <h1>
foreach ($matches[1] as $match)
{
// do something with $match
}
I am not sure what your source is, so I added a call to fetch the content via the URL.
$file = file_get_contents($url);
$doc = new DOMDocument();
$doc->loadHTML($file);
// getElementsByTagName returns a list, so take the first (only) body element
$body = $doc->getElementsByTagName('body')->item(0);
$items = $body->getElementsByTagName('h1');
I am not sure of this part:
for ($i = 0; $i < $items->length; $i++) {
echo $items->item($i)->nodeValue . "\n";
}
Or:
foreach ($items as $item) {
echo $item->nodeValue . "\n";
}
Here is more info on nodeValue: http://docs.php.net/manual/en/function.domnode-node-value.php
Hope it helps!
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Grabbing the href attribute of an A element
Hi, I have this string in PHP:
<iframe frameborder="0" width="320" height="179" src="http://www.dailymotion.com/embed/video/xinpy5?width=320&wmode=transparent"></iframe><br />Le buzz Pippa Middleton agace la Reine ! <i>par direct8</i>
I would like to extract the URL from the anchor href attribute using preg_match or other PHP functions.
Don't use regexes to parse HTML. Use the PHP DOM:
$DOM = new DOMDocument;
$DOM->loadHTML($str); // Your string
//get all anchors
$anchors = $DOM->getElementsByTagName('a');
//display all hrefs
for ($i = 0; $i < $anchors->length; $i++)
echo $anchors->item($i)->getAttribute('href') . "<br />";
You can check if the node has a href using hasAttribute() first if necessary.
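A short sketch of that hasAttribute() check, on a hypothetical string in which one anchor has no href:

```php
<?php
// Hypothetical input for illustration only.
$str = '<p><a name="top">Anchor without href</a>'
     . '<a href="http://example.com/video">Video</a></p>';

$DOM = new DOMDocument();
$DOM->loadHTML($str);

$hrefs = [];
foreach ($DOM->getElementsByTagName('a') as $anchor) {
    // Skip anchors that have no href attribute at all.
    if ($anchor->hasAttribute('href')) {
        $hrefs[] = $anchor->getAttribute('href');
    }
}
print_r($hrefs);
```

Without the check, getAttribute('href') returns an empty string for the first anchor, so the loop would print an empty line for it.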
You can use
if (preg_match('#<a\s*[^>]*href="([^"]+)"#i', $string, $matches))
echo $matches[1]; // group 1 is the URL; $matches[0] is the whole matched tag prefix
try this regex
(?<=href=\")[\w://\.\-]+