How to return a DOMXPath object in PHP? - php

Now found query if '$NotXP->query' = query return string?!
How can I make the following code work?
$xp = new \DOMXPath(@\DOMDocument::loadHTMLFile($url));
$list = $xp->query('//table[@class="table-list quality series"] tbody');
$link = $list->query('//tr[@class="item"]');
$arr_links = [];
foreach ($link as $link_in_cycle) {
$link_quality = $link_in_cycle->query('//td[@class="column first video"]');
$link_audio = $link_in_cycle->query('//td[@class="column audio"]');
$link_size = $link_in_cycle->query('//td[@class="column size"]');
$link_seed = $link_in_cycle->query('//td[@class="column seed-leech"] span[@class="seed"]');
$link_download_url = $link_in_cycle->query('//td[@class="column last download"] a')->getAttribute("data-default");
HTML source, as requested by @nigel-ren
This is the markup I need to extract the information from:
<tbody>
<tr class="item">
<td class="column first video">720x400</td>
<td class="column audio">mp3</td>
<td class="column size">5.70 Gb</td>
<td class="column seed-leech">
<span class="seed">15</span>
<span class="leech">26</span>
</td>
<td class="column updated">07.07.2017</td>
<td class="column consistence"></td>
<td class="column last download">
<a class="button middle rounded download zona-link"
data-type="download"
data-zona="0"
data-torrent=""
data-default="url_data"
data-not-installed=""
data-installed=""
data-metriks="{'eventType': 'click', 'data' : { 'type': 'show_download', 'id': '84358'}}"
title="text in title" href="javascript:void(0);" >Download</a> </td>

I've made a few changes to help me debug the code. The main thing is that your XPath expressions were invalid; you can always try a site like FreeFormatter, which lets you check your expressions against some example source.
$doc = new \DOMDocument();
$doc->loadHTMLFile($url);
$xp = new \DOMXPath($doc);
$list = $xp->query('//table[@class="table-list quality series"]//tr[@class="item"]');
$arr_links = [];
foreach ($list as $link_in_cycle) {
// The leading "." keeps each query relative to the current row
$link_quality = $xp->query('.//td[@class="column first video"]/text()', $link_in_cycle)[0]->wholeText;
$link_audio = $xp->query('.//td[@class="column audio"]/text()', $link_in_cycle)[0]->wholeText;
$link_size = $xp->query('.//td[@class="column size"]/text()', $link_in_cycle)[0]->wholeText;
$link_seed = $xp->query('.//td[@class="column seed-leech"]//span[@class="seed"]/text()', $link_in_cycle)[0]->wholeText;
$link_download_url = $xp->query('.//td[@class="column last download"]//a/@data-default', $link_in_cycle)[0]->value;
echo $link_quality.PHP_EOL;
echo $link_audio.PHP_EOL;
echo $link_size.PHP_EOL;
echo $link_seed.PHP_EOL;
echo $link_download_url.PHP_EOL;
}
Each XPath expression retrieves the text node inside the element. query() returns a list of all matching nodes, so the code uses [0] to fetch the first element of that list; it assumes there isn't any whitespace around the actual content. wholeText is just the actual content of the DOMText node.
With the sample content you gave (plus the surrounding bits I had to invent) it gives...
720x400
mp3
5.70 Gb
15
url_data
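One detail worth a quick illustration: when a context node is passed to query(), an expression that starts with // still searches from the document root, whereas .// searches only below the context node. The sketch below uses a made-up two-row table (the markup and class names are assumptions, not taken from the question) to show the difference.

```php
<?php
// Hypothetical two-row table, invented for this illustration.
$html = '<table class="t"><tr class="item"><td class="size">1 Gb</td></tr>'
      . '<tr class="item"><td class="size">2 Gb</td></tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

$absolute = [];
$relative = [];
foreach ($xp->query('//tr[@class="item"]') as $row) {
    // "//td" ignores the context node and always starts at the document root.
    $absolute[] = $xp->query('//td[@class="size"]', $row)->item(0)->textContent;
    // ".//td" only looks at descendants of $row.
    $relative[] = $xp->query('.//td[@class="size"]', $row)->item(0)->textContent;
}

echo implode(', ', $absolute), PHP_EOL; // 1 Gb, 1 Gb
echo implode(', ', $relative), PHP_EOL; // 1 Gb, 2 Gb
```

With a single row both forms give the same result, which is why the problem only shows up once the table has several rows.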

Related

firstChild in DOMDocument not working

Here is the markup; I need to fetch the first child of the div with class u-Row-6:
<div class="u-Row-6">
<div class='article_details_price2'>
<strong >
855,90 € *
</strong>
<div class="PseudoPrice">
<em>EVP: 999,00 € *</em>
<span>
(14.32 % <span class="frontend_detail_data">gespart</span>)
</span>
</div>
</div>
</div>
For this I have used the following code:
foreach($dom->getElementsByTagName('div') as $p) {
if ($p->getAttribute('class') == 'u-Row-6') {
if ($first) {
$name = $p->firstChild-nodeValue;
$name = str_replace('€', '', $name);
$name = str_replace(chr(194), " ", $name);
$first = false;
}
}
}
But mysteriously this code is not working for me
There are a number of problems with your code:
$first is not initialized to a true value, which prevents the string replacement code from running even once.
$p->firstChild-nodeValue lacks a > before nodeValue.
$p->firstChild will actually resolve to a text node (any text between <div class="u-Row-6"> and <div class='article_details_price2'>, currently just whitespace), not the strong you are looking for, and not <div class='article_details_price2'> either, as one might have expected.
You may want to use an XPath query instead, to get all the strong tags within a div of class "u-Row-6", and then loop through the found tags:
$src = <<<EOS
<div class="u-Row-6">
<div class='article_details_price2'>
<strong >
855,90 € *
</strong>
<div class="PseudoPrice">
<em>EVP: 999,00 € *</em>
<span>
(14.32 % <span class="frontend_detail_data">gespart</span>)
</span>
</div>
</div>
</div>
EOS;
$dom = new DOMDocument();
$dom->loadHTML($src);
$xpath = new DOMXPath($dom);
$strongTags = $xpath->query('//div[@class="u-Row-6"]//strong');
foreach ($strongTags as $tag) {
echo "The strong tag contents: " . $tag->nodeValue, PHP_EOL;
// Replacement code goes here ...
}
Output:
The strong tag contents:
855,90 € *
XPaths are actually quite handy. Read more about them here.
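If you would rather stay with plain DOM traversal, a small loop can skip text nodes and stop at the first element child. This is only a sketch: firstElementChildOf is a hypothetical helper name, not part of the DOM extension, and the markup is a shortened version of the snippet from the question.

```php
<?php
// Hypothetical helper (not part of the DOM extension): returns the first
// child that is an element, skipping text and comment nodes.
function firstElementChildOf(DOMNode $node): ?DOMElement
{
    for ($child = $node->firstChild; $child !== null; $child = $child->nextSibling) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            return $child;
        }
    }
    return null;
}

$dom = new DOMDocument();
$dom->loadHTML("<div class=\"u-Row-6\">\n  <div class=\"article_details_price2\"><strong> 855,90 </strong></div>\n</div>");

$xpath = new DOMXPath($dom);
$outer = $xpath->query('//div[@class="u-Row-6"]')->item(0);

// Walk past any leading text node to reach the inner div.
$inner = firstElementChildOf($outer);
$innerClass = $inner->getAttribute('class');
$price = trim($inner->textContent);

echo $innerClass, PHP_EOL;
echo $price, PHP_EOL;
```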

Fetching Image from particular div Only via DOMDocument in PHP

I have a website where I have posted a few images inside a particular div:
<div class="posts">
<div class="separator">
<img src="http://www.example.com/image.jpg" />
<p>Be, where I am today, and i will be one where you will search me tomorrow</p>
</div>
<div class="separator">
<img src="http://www.example.com/imagesda.jpg" />
<p>Be, where I am today, and i will be one where you will search me tomorrow</p>
</div>
.... few more images
</div>
From my second website, I want to fetch all images in that particular div. I have the code below.
<?php
$htmlget = new DOMDocument();
@$htmlget->loadHtmlFile('http://www.example.com');
$xpath = new DOMXPath( $htmlget);
$nodelist = $xpath->query( "//img/@src" );
foreach ($nodelist as $images){
$value = $images->nodeValue;
echo "<img src='".$value."' /><br />";
}
?>
But this fetches all images from my website, not just the ones in that div. It also prints out my RSS image, social icon images, etc.
Can I specify a particular div in my PHP code, so that it only fetches images from the div with class posts?
First, give the outer div container an id. Then get it by its id, and then get its child nodes.
An example:
$container = $dom->getElementById('node_id');
// number of child nodes in the container
echo $container->childNodes->length;
// content of each child
foreach ($container->childNodes as $child)
{
echo $child->ownerDocument->saveHTML($child);
}
Maybe this link will help you. It has a good tutorial.
http://www.binarytides.com/php-tutorial-parsing-html-with-domdocument/
With PHP Simple HTML DOM Parser, this will be:
include('simple_html_dom.php');
$html=file_get_html("http://your_web_site.com");
foreach($html->find('div.posts img') as $img_posts){
echo $img_posts->src . '<br>'; // to show the source attribute
}
I'm still reading about PHP Simple HTML DOM Parser, and so far it's faster (to implement) than regex.
Here is another piece of code that may help. You are looking for
$doc->getElementsByTagName
which can help target a tag directly.
<?php
$myhtml = <<<EOF
<html>
<body>
<div class="posts">
<div class="separator">
<img src="http://www.example.com/image.jpg" />
<p>Be, where I am today, and i will be one where you will search me tomorrow</p>
</div>
<div class="separator">
<img src="http://www.example.com/imagesda.jpg" />
<p>Be, where I am today, and i will be one where you will search me tomorrow</p>
</div>
.... few more images
</div>
</body>
</html>
EOF;
$doc = new DOMDocument();
$doc->loadHTML($myhtml);
$divs = $doc->getElementsByTagName('img');
foreach ($divs as $div) {
foreach ($div->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
echo "Attribute '$name' :: '$value'<br />";
}
}
?>
Demo here http://codepad.org/keZkC377
Also the answer here can provide further insights
Not finding elements using getElementsByTagName() using DomDocument
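For the original goal of limiting the search to div.posts, an XPath expression can do the filtering directly. The following is a minimal sketch rather than a definitive solution; the extra rss.png image outside the div is invented here to show that it gets excluded, and the exact match on @class assumes the div has no other classes.

```php
<?php
$html = <<<HTML
<html><body>
<img src="http://www.example.com/rss.png" />
<div class="posts">
  <div class="separator"><img src="http://www.example.com/image.jpg" /></div>
  <div class="separator"><img src="http://www.example.com/imagesda.jpg" /></div>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Only src attributes of img elements nested anywhere inside div.posts.
$sources = [];
foreach ($xpath->query('//div[@class="posts"]//img/@src') as $attr) {
    $sources[] = $attr->nodeValue;
}

print_r($sources);
```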

Inserting a Div through DOMDocument into content got from remote site

I am sorry if the title is confusing; here is the explanation:
Say we have a remote page like remotesite.com/page1.html and we use file_get_contents to get its source, then we use DOMDocument to edit this source before printing it to our page:
$url = "remotesite.com/page1.html";
$html = file_get_contents($url);
$doc = new DOMDocument(); // create DOMDocument
libxml_use_internal_errors(true);
$doc->loadHTML($html); // load HTML you can add $html
//here we do some edits to remove or add contents
I want to add the div below to the content before printing it:
<div style="float: right; padding-right: 2px;"><a class="open_event_tab" target="_blank" href="some-hard-coded-text-here_'+content+'_title_'+lshtitle+'_event_'+id+'.html" >open event</a></div>
Here is what I have tried, but the dynamic part ('+content+'_title_'+lshtitle+'_event_'+id+') of the href is not working.
I know that my code below may look stupid, but sorry, I don't have enough knowledge of PHP.
function createDivNode($doc) {
$divNode = $doc->createElement('div');
$divNode->setAttribute('style', 'float: right; padding-right: 2px;');
$aNode = $doc->createElement('a', 'openEvent');
$aNode->setAttribute('class', 'open_event_tab');
$aNode->setAttribute('target', '_blank');
$aNode->setAttribute('href', 'some-hard-coded-text-here_'+content+'_title_'+lshtitle+'_event_'+id+'.html');
$divNode->appendChild($aNode);
return $divNode;
}
I want to loop through the source fetched from the remote site, find every td that looks like the one below, and add the div just before its closing tag:
<td colspan="2">
<b>Video </b>
<span class="section">Sports</span><b>: </b>
<span id="category466" class="category">Motor Sports</span>
//here i want to add my div
</td>
After 6 hours of research I can't figure this out, as I am still learning, so I decided to ask here in this helpful community.
I'm still confused about your question, but I think you want to append a dynamic div inside every td. If that is the case, you may try this (at least you'll get the idea, and it's very clean):
var content = 'someContent', lshtitle = 'someTitle', id = 'anId';
var attr = {
'class' : 'open_event_tab',
'target' : '_blank',
'href' : 'some-hard-coded-text-here_' + content + '_title_' + lshtitle + '_event_' + id + '.html',
'text' : 'open event'
};
var link = $('<a/>', attr);
var div = $('<div/>', { 'style' : 'float:right;padding-right:2px;' }).append(link);
$('#myTable td').append(div);
DEMO.
Update: (the question was confusing, so an updated answer is given below)
Just download Simple HTML DOM Parser (documentation here):
include('simple_html_dom.php');
$html = file_get_html('remotesite.com/page1.html');
foreach($html->find('table td') as $td) {
$td->innertext = $td->innertext . '<div>New Div</div>';
}
To modify only the tds that have class=category:
foreach($html->find('table td.category') as $td) {
$td->innertext = $td->innertext . '<div>New Div</div>';
}
And you are done. Note that <div>New Div</div> is just an example; I hope you can adapt it to your needs.
Possible result of the example:
<table>
<tbody>
<tr>
<td colspan="2">
<b>Video </b> <span class="section">Sports</span><b>: </b> <span id="category466" class="category">Motor Sports</span>
<div>New Div</div>
</td>
</tr>
<tr>
<td colspan="2">
<b>Video </b> <span class="section">Sports</span><b>: </b> <span id="category466" class="category">Motor Sports</span>
<div>New Div</div>
</td>
</tr>
</tbody>
</table>
jQuery
If you want to insert the div at the end of each td, use append():
$('td').append('<div>Test</div>');
To insert it at the beginning of each td, use prepend():
$('td').prepend('<div>Test</div>');
For more information, see the jQuery website.
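The same can also be done server-side with DOMDocument alone, which is what the question started with. The sketch below makes several assumptions: $content, $lshtitle and $id are placeholder PHP variables (the question does not say where they come from), the sample markup is a shortened version of the td from the question, and the XPath selects tds that directly contain a span with class category. The key point is that PHP concatenates strings with "." rather than "+".

```php
<?php
// Placeholder values; where these really come from is not stated in the question.
$content = 'someContent';
$lshtitle = 'someTitle';
$id = 'anId';

// Builds the <div><a>...</a></div> node for a given link target.
function createDivNode(DOMDocument $doc, string $href): DOMElement
{
    $div = $doc->createElement('div');
    $div->setAttribute('style', 'float: right; padding-right: 2px;');
    $a = $doc->createElement('a', 'open event');
    $a->setAttribute('class', 'open_event_tab');
    $a->setAttribute('target', '_blank');
    $a->setAttribute('href', $href);
    $div->appendChild($a);
    return $div;
}

$html = '<table><tr><td colspan="2"><b>Video </b>'
      . '<span id="category466" class="category">Motor Sports</span></td></tr></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select only the cells that contain a span with class "category".
foreach ($xpath->query('//td[span[@class="category"]]') as $td) {
    // PHP joins strings with ".", not "+".
    $href = 'some-hard-coded-text-here_' . $content . '_title_' . $lshtitle . '_event_' . $id . '.html';
    // appendChild puts the div right before the closing </td>.
    $td->appendChild(createDivNode($doc, $href));
}

$result = $doc->saveHTML();
echo $result;
```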

How do I parse HTML using PHP DOMDocument?

I have an HTML block here:
<div class="title">
<a href="http://test.com/asus_rt-n53/p195257/">
Asus RT-N53
</a>
</div>
<table>
<tbody>
<tr>
<td class="price-status">
<div class="status">
<span class="available">Yes</span>
</div>
<div name="price" class="price">
<div class="uah">758<span> ua.</span></div>
<div class="usd">$ 62</div>
</div>
How do I parse the link (http://test.com/asus_rt-n53/p195257/), title (Asus RT-N53) and price (758)?
Curl code here:
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($content);
$xpath = new DOMXPath($dom);
$models = $xpath->query('//div[@class="title"]/a');
foreach ($models as $model) {
echo $model->nodeValue;
$prices = $xpath->query('//div[@class="uah"]');
foreach ($prices as $price) {
echo $price->nodeValue;
}
}
One ugly solution is to cast the price result to keep only numbers:
echo (int) $price->nodeValue;
Or, you can query to find the span inside the div, and remove it from the price (inside the prices foreach):
$span = $xpath->query('//div[@class="uah"]/span')->item(0);
$price->removeChild($span);
echo $price->nodeValue;
Edit:
To retrieve the link, simply use getAttribute() and get the href one:
$model->getAttribute('href')
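Putting the three pieces together, a minimal sketch might look like this. It assumes a single product block, as in the question's sample; for several products the price query would need to be made relative to each product's container, which the sample markup does not show. The price is read with evaluate() and string(), which returns the first text node of div.uah and so leaves out the span.

```php
<?php
// Shortened copy of the markup from the question.
$content = '<div class="title"><a href="http://test.com/asus_rt-n53/p195257/"> Asus RT-N53 </a></div>'
         . '<table><tbody><tr><td class="price-status"><div name="price" class="price">'
         . '<div class="uah">758<span> ua.</span></div><div class="usd">$ 62</div>'
         . '</div></td></tr></tbody></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($content);
$xpath = new DOMXPath($dom);

$model = $xpath->query('//div[@class="title"]/a')->item(0);
$link  = $model->getAttribute('href');
$title = trim($model->nodeValue);
// string() takes the first matching node, here the text "758" before the span.
$price = trim($xpath->evaluate('string(//div[@class="uah"]/text())'));

echo $link, PHP_EOL, $title, PHP_EOL, $price, PHP_EOL;
```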

Parse span class text with DOM PHP

I've been having an issue trying to parse text in a span class with DOM. Here is my code example.
$remote = "http://website.com/";
$doc = new DOMDocument();
@$doc->loadHTMLFile($remote);
$xpath = new DOMXpath($doc);
$node = $xpath->query('//span[@class="user"]');
echo $node;
and this returns the following error -> "Catchable fatal error: Object of class DOMNodeList could not be converted to string". I am so lost I NEED HELP!!!
What I am trying to do is parse the user name between this span tag.
<span class="user">bballgod093</span>
Here is the full source from the remote website.
<div id="randomwinner">
<div id="rndmLeftCont">
<h2 id="rndmTitle">Hourly Random <span>Winner</span></h2>
</div>
<div id="rndmRightCont">
<div id="rndmClaimImg">
<table cellspacing="0" cellpadding="0" width="200">
<tbody>
<tr>
<td align="right" valign="middle">
</td>
</tr>
</tbody>
</table>
</div>
<div id="rndmCaimTop">
<span class="user">bballgod093</span>You've won 1000 SB</div>
<div id="rndmCaimBottom">
<a id="rndmCaimBtn" class="btn1 btn2" href="/?cmd=cp-claim-random" rel="nofollow">Claim Bucks</a>
</div>
</div>
<div class="clear"></div>
</div>
This call
$node = $xpath->query('//span[@class="user"]');
does not return a string, but a DOMNodeList.
You can use this list somewhat like an array (using $node->length for the number of elements and $node->item(0) to get the first element) to get DOMNode objects. Each of these objects has a nodeValue property, which is a string.
So you would do something like
$node = $xpath->query('//span[@class="user"]');
if($node->length != 1) {
// error?
}
echo $node->item(0)->nodeValue;
Of course, changing the variable name for $node to something more appropriate would be nice.
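A DOMNodeList can also be iterated with foreach, which avoids indexing altogether. Here is a small self-contained sketch using an inline copy of the relevant markup instead of the remote page; the variable names $users and $names are my own.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<div id="rndmCaimTop"><span class="user">bballgod093</span>You\'ve won 1000 SB</div>');
$xpath = new DOMXPath($doc);

// query() returns a DOMNodeList, never a string.
$users = $xpath->query('//span[@class="user"]');

$names = [];
foreach ($users as $span) {
    // nodeValue is the text content of each matched span.
    $names[] = $span->nodeValue;
}

echo $users->length, PHP_EOL;        // number of matches
echo implode(', ', $names), PHP_EOL; // the user names found
```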
