How to extract a value from an HTML section in PHP

I need to extract data from an HTML page that looks like this:
<li>
<h2>
<span>rss</span>AC Ajaccio</h2>
<div class="club-left">
<img src="http://medias.lequipe.fr/logo-football/35/60?CCH-13-40" width="60" height="60">
</div>
<div class="club-right">
<ul class="club-links">
<li><span class="plus"></span>
Fiche club
</li>
<li><span class="plus"></span>
Calendrier
</li>
<li><span class="plus"></span>Effectif
</li>
<li><span class="plus"></span>
Stats joueurs
</li>
<li><span class="plus"></span>
Stats club
</li>
</ul>
</div>
<div class="clubt hidden">35</div>
<div class="clear"></div>
</li>
I would like to extract in PHP the href value and the text of this part:
**Stats joueurs**
I use the following code, but there is something missing:
$elements = $xpath->query("//div[@id='Base']/ul/li");
if (!is_null($elements)) {
    foreach ($elements as $element) {
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            // Skip whitespace-only text nodes between the tags.
            if ($node->nodeName != '#text') {
                echo $node->nodeValue . ";<br/>";
                $stringData = trim($node->nodeValue) . ";";
            }
        }
    }
}

UPDATE:
Try:
$elements = $xpath->query("//ul[@class='club-links']//a");
foreach ($elements as $element) {
    echo $element->nodeValue . " - " . $element->getAttribute("href") . "<br/>";
}
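For reference, here is a minimal self-contained sketch of that query. The `<a href="...">` tags and their URLs are assumptions, since the markup quoted in the question does not show the links; only the `club-links` class and the link labels come from the question.

```php
<?php
// Assumed markup: each <li> in ul.club-links wraps its label in an <a href="...">.
// The href values are placeholders, not the real site's links.
$html = <<<HTML
<div class="club-right">
<ul class="club-links">
<li><span class="plus"></span><a href="/club/35/calendrier">Calendrier</a></li>
<li><span class="plus"></span><a href="/club/35/stats-joueurs">Stats joueurs</a></li>
</ul>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$links = [];
// Select every <a> below the list, whatever its nesting depth.
foreach ($xpath->query("//ul[@class='club-links']//a") as $a) {
    // Index the href by the trimmed link text.
    $links[trim($a->nodeValue)] = $a->getAttribute('href');
}

echo $links['Stats joueurs'] . PHP_EOL;
```

Indexing by link text makes it easy to pick out a single entry such as "Stats joueurs" without relying on its position in the list.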

Related

PHP: How to turn a UL / LI list into text

I am trying to convert the UL / LI items in this alert list to plain text.
From this:
example
To this:
06.09.19 23:30 אור הנר 06.09.19 23:30 ~ שדרות, איבים
<h5 id="ctl00_ContentPlaceHolder1_ucAlarmsGrid_alertsTitle">רשימת ההתרעות</h5>
<ul id="ctl00_ContentPlaceHolder1_ucAlarmsGrid_ulAlertsList" class="more_result">
<li id="ctl00_ContentPlaceHolder1_ucAlarmsGrid_rpt_ctl00_liArea" areaCode="0" time="">
<span><strong>06.09.19</strong></span>
<span class="border"><strong>23:30</strong></span>
<span class="span_area">אור הנר</span>
</li>
<li id="ctl00_ContentPlaceHolder1_ucAlarmsGrid_rpt_ctl01_liArea" areaCode="0" time="">
<span><strong>06.09.19</strong></span>
<span class="border"><strong>23:30</strong></span>
<span class="span_area">שדרות, איבים</span>
</li>
</ul>
I am trying to do this in PHP.
Thanks!
<?php
$html = '<h5 id="ctl00_ContentPlaceHolder1_ucAlarmsGrid_alertsTitle">רשימת ההתרעות</h5>
<ul id="ctl00_ContentPlaceHolder1_ucAlarmsGrid_ulAlertsList" class="more_result">
<li id="ctl00_ContentPlaceHolder1_ucAlarmsGrid_rpt_ctl00_liArea" areaCode="0" time="">
<span><strong>06.09.19</strong></span>
<span class="border"><strong>23:30</strong></span>
<span class="span_area">אור הנר</span>
</li>
<li id="ctl00_ContentPlaceHolder1_ucAlarmsGrid_rpt_ctl01_liArea" areaCode="0" time="">
<span><strong>06.09.19</strong></span>
<span class="border"><strong>23:30</strong></span>
<span class="span_area">שדרות, איבים</span>
</li>
</ul>';
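$dom = new DOMDocument;
// Prefix an XML encoding declaration so libxml reads the Hebrew text as UTF-8.
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
foreach ($dom->getElementsByTagName('li') as $node) {
    // Collapse the whitespace between the spans into single spaces.
    echo trim(preg_replace('/\s+/u', ' ', $node->textContent)) . PHP_EOL;
}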

Unable to pull out the value of src using getAttribute

I am trying to echo the href and the image src using getAttribute. The href is echoed correctly, but I am unable to retrieve the image src. Please advise. Below is my
HTML mockup:
<div id="hot-deals">
<div class="all-deals">
<ul>
<li><a href="http://url1.com">
<img src="http://imagelink1.com"></a>
</li>
<li><a href="http://url2.com">
<img src="http://imagelink2.com"></a>
</li>
<li><a href="http://url3.com">
<img src="http://imagelink3.com"></a>
</li>
</ul>
</div>
</div>
My code:
$nodes = $my_xpath->query( '//div[@id="hot-deals"]/div[@class="all-deals"]/ul/li/a' );
foreach( $nodes as $node )
{
$title=$node->getAttribute('href');
$img=$node->getAttribute('img/src');
echo $title.",".$img."<br>";
}
src is not an attribute of the a tag, so you need one more step: get the inner img tag and then read its attribute.
foreach( $nodes as $node ) {
$title = $node->getAttribute('href');
$imgTags = $node->getElementsByTagName('img');
$img = $imgTags->item(0)->getAttribute('src');
echo $title . "," . $img . "<br>";
}
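A slightly more defensive variant of the same idea, as a sketch: it looks up the `img` with an XPath query relative to each `a` and falls back to an empty string when no image is present. The second list item (a link without an image) is a hypothetical addition to show the guard; it is not part of the original mockup.

```php
<?php
$html = '<div id="hot-deals"><div class="all-deals"><ul>
<li><a href="http://url1.com"><img src="http://imagelink1.com"></a></li>
<li><a href="http://url2.com">no image here</a></li>
</ul></div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$rows = [];
foreach ($xpath->query('//div[@id="hot-deals"]//li/a') as $a) {
    // Evaluate "img" relative to the current <a>; item(0) is null when no <img> exists.
    $img = $xpath->query('img', $a)->item(0);
    $rows[] = [$a->getAttribute('href'), $img ? $img->getAttribute('src') : ''];
}

foreach ($rows as [$href, $src]) {
    echo $href . ',' . $src . PHP_EOL;
}
```

Without the null check, calling `getAttribute()` on a missing image would raise an error as soon as one link has no `img` child.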
You can try this code.
<?php
$str = '<div id="hot-deals">
<div class="all-deals">
<ul>
<li><a href="http://url1.com">
<img src="http://imagelink1.com"></a>
</li>
<li><a href="http://url2.com">
<img src="http://imagelink2.com"></a>
</li>
<li><a href="http://url3.com">
<img src="http://imagelink3.com"></a>
</li>
</ul>
</div>
</div>';
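// loadHTML() is an instance method; calling it statically fails on PHP 8.
$doc = new DOMDocument();
$doc->loadHTML($str);
$nodes = simplexml_import_dom($doc)->xpath('//div[@id="hot-deals"]/div[@class="all-deals"]/ul/li/a');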
foreach( $nodes as $node )
{
$title = $node['href'];
$src = $node->img['src'];
echo $title ." " . $src . '<br>';
}

PHP DOMXPath query to return value

I want to get and show the title, tel, fax and address if they exist.
HTML code could be like :
<div id="id1">
<div class="AB">
<ul>
<li class="title"> AGENCE X </li>
<li class="tel"> 060000000</li>
<li class="fax"> 06000000</li>
<li class="address"> this is the address </li>
</ul>
</div>
<div class="AB"> <!-- the same class name -->
<ul>
<li class="titre"> AGENCE X </li>
<li class="tel"> 060000000</li>
<li class="fax"> 06000000</li>
</ul>
</div>
<div>...</div>
</div>
I wrote this code, but I don't know how to write the condition "if a node with class name 'address', 'fax' or 'tel' exists, then do X".
Here is my code:
$doc = new DOMDocument();
@$doc->loadHTMLFile('http://website.com');
$node = $doc->getElementsByTagName('div') ;
$xpath=new DOMXPath($doc);
$titre=$xpath->query('//div/ul/li[@class="titre"]');
$adresse=$xpath->query('//div/ul/li[@class="adresse"]');
$phone=$xpath->query('//div/ul/li[@class="phone"]');
$fax=$xpath->query('//div/ul/li[@class="fax"]');
$a=0;$b=0;$c=0;
for($i=0;$i<$titre->length;$i++)
{
echo $titre->item($i)->nodeValue.'<br/>' ;
if(a "li" has classe="adresse" existe)
{ for($a=0;$a<$adresse->length;$a++)
{
echo $adresse->item($a)->nodeValue.'<br/>' ;
$a++;
}
}
if(a "li" has classe="titre" existe){ for($b=0;$b<$phone->length;$b++)
{
echo $titre->item($b)->nodeValue.'<br/>' ;
$b++;
}
}
if(a "li" has classe="fax" existe) { for($c=0;$c<$fax->length;$c++)
{
echo $fax->item($c)->nodeValue.'<br/>' ;
$c++;
}
}
}
Could someone tell me how I could write this condition, or suggest another solution?
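One possible approach, sketched under assumptions: loop over each `div.AB` block and run a relative XPath query per field, treating `length > 0` as the "exists" test. The question mixes class names (`title`/`titre`, `address`/`adresse`, `tel`/`phone`), so this sketch assumes `titre`, `tel`, `fax` and `adresse`; the second agency, "AGENCE Y", is made-up sample data.

```php
<?php
$html = '<div id="id1">
<div class="AB"><ul>
<li class="titre"> AGENCE X </li><li class="tel"> 060000000</li>
<li class="fax"> 06000000</li><li class="adresse"> this is the address </li>
</ul></div>
<div class="AB"><ul>
<li class="titre"> AGENCE Y </li><li class="tel"> 070000000</li>
</ul></div>
</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$agencies = [];
foreach ($xpath->query('//div[@class="AB"]') as $block) {
    $fields = [];
    foreach (['titre', 'tel', 'fax', 'adresse'] as $class) {
        // Relative query: only looks inside the current div.AB block.
        $li = $xpath->query('ul/li[@class="' . $class . '"]', $block);
        if ($li->length > 0) { // the "does this node exist" condition
            $fields[$class] = trim($li->item(0)->nodeValue);
        }
    }
    $agencies[] = $fields;
}

print_r($agencies);
```

Querying relative to each block keeps an agency's fields together, so a missing fax or address in one block cannot shift values from another block into it.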

Get DIV Element contents thru DOMDocument PHP

I have to recover some news from a div of a site. The div is structured as follows:
The HTML Markup:
<ul id="news-accordion" class="rounded" style="padding: 2px;">
<li class="o">
<h3>
<span>TITLE ARTICLE</span>
<span>30/10/2014</span>
</h3>
<div style="display: none;">
<p>text of article</p>
</div>
</li>
<li class="e">
<h3>
<span>TITLE ARTICLE</span>
<span>28/10/2014</span>
</h3>
<div style="display: none;">
<p>text of article</p>
</div>
</li>
<li class="o">
<h3>
<span>TITLE ARTICLE</span>
<span>29/10/2014</span>
</h3>
<div style="display: none;">
<p>text of article</p>
</div>
</li>
</ul>
PHP
<?php
$doc = new DomDocument;
$doc->validateOnParse = true;
$doc->loadHtml(file_get_contents('http://www.xxxxxxxxx/news.php'));
$news = $doc->getElementById('news-accordion');
$li = $news->getElementsByTagName('li');
foreach ($li as $row){
$title = $row->getElementsByTagName('h3');
echo $title->item(0)->nodeValue."<br><br>";
/*foreach ($title as $row2){
echo $row2->nodeValue."<br><br>";
//echo $row2->item(0)->nodeValue."<br><br>";
}*/
$text = $row->getElementsByTagName('p');
echo utf8_decode($text->item(0)->nodeValue)."<br><br><br>";
}
?>
The code works, but when I print the h3 content with echo $title->item(0)->nodeValue;,
the text of the two spans is printed together.
How can I get the contents of the two spans separately? Thanks.
Yes, you can: just adjust the ->item() index. As you already did for the other elements, first select the h3 element, then explicitly select each of its span children:
foreach ($li as $row){
$h3 = $row->getElementsByTagName('h3')->item(0);
$title = $h3->getElementsByTagName('span')->item(0); // first span
$date = $h3->getElementsByTagName('span')->item(1); // second span
echo $title->nodeValue . '<br/>';
echo $date->nodeValue . '<br/>';
$text = $row->getElementsByTagName('p');
echo utf8_decode($text->item(0)->nodeValue)."<br><br><br>";
}
$title = $row->getElementsByTagName('h3');
echo $title->item(0)->nodeValue."<br><br>";
Replace the two lines above with the lines below (select the span tags instead of the h3 tag):
$title = $row->getElementsByTagName('span');
echo $title->item(0)->nodeValue."<br><br>";
echo $title->item(1)->nodeValue."<br><br>";
It's working for me.
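As an alternative sketch, the same values can be read with XPath expressions evaluated relative to each `li`. Positional predicates such as `span[1]` and `span[2]` pick the first and second span of the heading. The markup is a trimmed copy of the question's example; nothing is fetched from a real URL.

```php
<?php
$html = '<ul id="news-accordion">
<li class="o"><h3><span>TITLE ARTICLE</span><span>30/10/2014</span></h3>
<div style="display: none;"><p>text of article</p></div></li>
</ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$items = [];
foreach ($xpath->query('//ul[@id="news-accordion"]/li') as $li) {
    // string(...) returns the text of the first matching node, or '' if none matches.
    $items[] = [
        'title' => $xpath->evaluate('string(h3/span[1])', $li),
        'date'  => $xpath->evaluate('string(h3/span[2])', $li),
        'text'  => $xpath->evaluate('string(div/p)', $li),
    ];
}

print_r($items);
```

Because `string()` yields an empty string for a missing node, this version does not fail when an item lacks a date or a paragraph.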

xPath insert before and after - With DOM and PHP

I need to wrap part of an HTML structure in a div with a class.
The class is called "container". The wrapper should start right after the </h4> inside <div><ul><li> (the direct children of the ul and their siblings, not the grandchildren) and should end right before that li closes.
My whole code looks like this:
<?php
$content = '
<div class="sidebar-1">
<ul>
<li>
<h4>Title</h4>
<ul>
<li>Test</li>
<li>Test</li>
</ul>
</li>
<li>
<p>Paragraf</p>
</li>
<li>
<h4>New title</h4>
<ul>
<li>Some text</li>
<li>Some text åäö</li>
</ul>
</li>
</ul>
</div>
';
$doc = new DOMDocument();
$doc->loadHTML($content);
$x = new DOMXPath($doc);
$start_text = '<div class="container">';
$end_text = '</div>';
foreach($x->query('//div/ul/li') as $anchor)
{
$anchor->insertBefore(new DOMText($start_text),$anchor->firstChild);
}
echo $doc->saveXML($doc->getElementsByTagName('ul')->item(0));
?>
This works insofar as I can add the opening tag, but not the closing tag. I also get strange encoding in the output. I want the output to use the same encoding as the input.
The result should be
<div class="sidebar-1">
<ul>
<li>
<h4>Title</h4>
<div class="container">
<ul>
<li>Test</li>
<li>Test</li>
</ul>
</div>
</li>
<li>
<div class="container">
<p>Paragraf</p>
</div>
</li>
<li>
<h4>New title</h4>
<div class="container">
<ul>
<li>Some text</li>
<li>Some text åäö</li>
</ul>
</div>
</li>
</ul>
</div>
I couldn't find a more elegant way to reassign all children, so I guess this will do. I think it gets what you're after, though.
(NOTE: Code updated to reflect additional requirements in the comments.)
$doc = new DOMDocument();
$doc->loadHTML($content);
$x = new DOMXPath($doc);
foreach($x->query('//div/ul/li') as $anchor)
{
$container = $doc->importNode(new DOMElement('div'));
$container->setAttribute('class', 'container');
$next = $anchor->firstChild;
while ($next !== NULL) {
$curr = $next;
$next = $curr->nextSibling;
if (($curr->nodeName != 'h4')
|| ($curr->attributes === NULL)
|| ($curr->attributes->getNamedItem('class') === NULL)
|| !preg_match('#(^| )title( |$)#', $curr->attributes->getNamedItem('class')->nodeValue)
) {
$container->appendChild($anchor->removeChild($curr));
}
}
$anchor->appendChild($container);
}
As for character encoding, I've been experimenting with it for a while and it's a tricky issue. The characters display correctly when you load with loadXML() but not with loadHTML(). There's a workaround in the comments, but it isn't pretty. Hopefully some of the user comments will help you find a usable solution.
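One commonly used workaround, shown here as a sketch rather than a definitive fix: prefix the markup with an XML encoding declaration so that loadHTML() reads the input as UTF-8, then serialise only the subtree you need with saveHTML($node). This assumes the input really is UTF-8. The wrapping loop is a simplified version of the code above that keeps every h4 outside the container.

```php
<?php
$content = '<div class="sidebar-1"><ul>
<li><h4>New title</h4><ul><li>Some text åäö</li></ul></li>
</ul></div>';

$doc = new DOMDocument();
// Assumption: the input is UTF-8. The declaration hints that to libxml,
// which otherwise treats the bytes as ISO-8859-1.
$doc->loadHTML('<?xml encoding="UTF-8">' . $content);
$x = new DOMXPath($doc);

foreach ($x->query('//div/ul/li') as $li) {
    $container = $doc->createElement('div');
    $container->setAttribute('class', 'container');
    // Snapshot the children first: moving nodes out of a live list while iterating skips items.
    foreach (iterator_to_array($li->childNodes) as $child) {
        if ($child->nodeName !== 'h4') {
            $container->appendChild($child); // appendChild moves the node
        }
    }
    $li->appendChild($container);
}

// Serialise only the sidebar subtree, without the html/body wrapper loadHTML() adds.
$out = $doc->saveHTML($x->query('//div[@class="sidebar-1"]')->item(0));
echo $out;
```

Whether non-ASCII characters come out as raw UTF-8 or as entities can depend on the libxml version, so it is worth checking the output on your own setup.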
