Query a table with DOMXPath - php

I am trying to access the values of a table on a web page with the PHP method DOMXPath::query. When I open the page in my web browser I can see the table, but when I execute my query the table isn't visible and doesn't seem to be accessible.
The table has an id, but when I specify it in my query, a different table is returned: I want to read the table with the id 'totals', but I only get the one with the id 'per_game'. When I inspect the page's source, a lot of elements seem to be inside comments.
Here is my script:
<?php
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
$doc->loadHTMLFile('https://www.basketball-reference.com/players/j/jokicni01.html');
$xpath = new DOMXPath($doc);
$table = $xpath->query("//div[@id='totals']")->item(0);
$elem = $doc->saveXML($table);
echo $elem;
?>
How can I read the elements in the table with the id 'totals'?
The full path is /html/body/div[@id="wrap"]/div[@id="content"]/div[@id="all_totals"]/div[@class="table_outer_container"]/div[@id="div_totals"]/table[@id="totals"]

You can split your query into two parts: first, retrieve the comment in the correct div, then create a new document from its content and query that document for the element you want:
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
@$doc->loadHTMLFile('https://www.basketball-reference.com/players/j/jokicni01.html');
$xpath = new DOMXPath($doc);
// retrieve the comment section in 'all_totals' div
$all_totals_element = $xpath->query('/html/body/div[@id="wrap"]/div[@id="content"]/div[@id="all_totals"]/comment()')->item(0);
$all_totals_table = $doc->saveXML($all_totals_element);
// strip comment tags to keep the content inside
$all_totals_table = substr($all_totals_table, strpos($all_totals_table, '<!--') + strlen('<!--'));
$all_totals_table = substr($all_totals_table, 0, strpos($all_totals_table, '-->'));
// create a new Document with the content of the comment
$tableDoc = new DOMDocument ;
$tableDoc->loadHTML($all_totals_table);
$xpath = new DOMXPath($tableDoc);
// second part of the query; loadHTML() wraps the fragment in <html><body>, so search with // rather than from the root
$totals = $xpath->query('//div[@class="table_outer_container"]/div[@id="div_totals"]/table[@id="totals"]')->item(0);
echo $tableDoc->saveXML($totals) ;
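The two-step approach above can also be sketched offline. This is only a minimal sketch: the HTML string is a hypothetical, trimmed-down stand-in for the real page, and extractCommentedTable() is a helper name made up for illustration. It reads the comment's nodeValue (which already excludes the comment delimiters) instead of stripping them with substr().

```php
<?php
// Hypothetical stand-in for the real page: the table lives inside an HTML comment.
$html = <<<'HTML'
<html><body>
<div id="all_totals">
<!--
<div class="table_outer_container">
<table id="totals"><tr><td>82</td></tr></table>
</div>
-->
</div>
</body></html>
HTML;

// Hypothetical helper: find the first comment inside the wrapper div,
// parse its content as HTML, and return the table with the given id.
function extractCommentedTable(string $html, string $wrapperId, string $tableId): ?DOMElement
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Step 1: the comment node that hides the markup
    $comment = $xpath->query("//div[@id='$wrapperId']/comment()")->item(0);
    if ($comment === null) {
        return null;
    }

    // Step 2: nodeValue is the comment text without the surrounding delimiters
    $inner = new DOMDocument();
    $inner->loadHTML($comment->nodeValue);
    $innerXpath = new DOMXPath($inner);

    $table = $innerXpath->query("//table[@id='$tableId']")->item(0);
    return $table instanceof DOMElement ? $table : null;
}

$table = extractCommentedTable($html, 'all_totals', 'totals');
$cellText = $table !== null ? trim($table->textContent) : null;
echo $cellText, PHP_EOL;
```

For the real page you would pass the downloaded HTML instead of the inline string; whether the comment is still the first one inside the 'all_totals' div is an assumption you should verify.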

Related

Get image URL inside an element from external website - Laravel

Following up on the problem I mentioned here, I'm going to try an alternative way of getting an image URL. I want to get the product image URL from https://www.matchesfashion.com/products/Adidas-By-Stella-McCartney-Metallic-zebra-print-Primegreen-leggings-1424516 and, if you inspect the product image, it can be accessed inside a <figure></figure> element. I did some research and wrote this code to get content from an external web page, but it didn't return anything.
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
$doc->loadHTMLFile('https://www.matchesfashion.com/products/Adidas-By-Stella-McCartney-Metallic-zebra-print-Primegreen-leggings-1424516');
$xpath = new DOMXPath($doc);
$var = $xpath->evaluate('string(//figure[@class="iiz"])');
I just need to get the source URL of that image so I can continue my image encoding process. Thanks in advance.
You can use the code below to grab the image URLs:
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
ini_set('user_agent', 'My-Application/2.5');
libxml_use_internal_errors(true);
$doc->loadHTMLFile('https://www.matchesfashion.com/products/Adidas-By-Stella-McCartney-Metallic-zebra-print-Primegreen-leggings-1424516');
$xpath = new DOMXPath($doc);
$imgs = $xpath->query('//*[@class="iiz__img "]');
foreach($imgs as $img)
{
echo 'ImgSrc: https:' . $img->getAttribute('src') .'<br />' . PHP_EOL;
}
This gives the desired result:
ImgSrc: https://assetsprx.matchesfashion.com/img/product/920/1424516_1.jpg
ImgSrc: https://assetsprx.matchesfashion.com/img/product/920/1424516_1.jpg
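The query above matches the class attribute as an exact string, including the trailing space. If the markup changes slightly, a token-based class test is a common XPath 1.0 idiom that may be more robust. The sketch below runs against a hypothetical inline snippet (the example.com URLs are placeholders, not the real site's markup) and also prefixes protocol-relative src values with https:.

```php
<?php
// Hypothetical markup standing in for the product page.
$html = '<figure class="iiz">'
      . '<img class="iiz__img " src="//example.com/img/1.jpg">'
      . '<img class="other" src="//example.com/img/2.jpg">'
      . '</figure>';

libxml_use_internal_errors(true); // <figure> is HTML5 and may trigger libxml warnings

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Match "iiz__img" as a whole class token, regardless of extra spaces or other classes
$query = '//img[contains(concat(" ", normalize-space(@class), " "), " iiz__img ")]';

$sources = [];
foreach ($xpath->query($query) as $img) {
    $src = $img->getAttribute('src');
    // Protocol-relative URLs start with "//"; add a scheme so they can be fetched
    if (strpos($src, '//') === 0) {
        $src = 'https:' . $src;
    }
    $sources[] = $src;
}

print_r($sources);
```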

Curl get a description of a website but not from meta tags

I am trying to get a description of a site but without the use of the meta tags. Basically what I am trying to get is the first couple of sentences of a site.
So far I have this, but I do not know how to get the content out of the div:
$checkLinkOnPage = '{sitehere}';
$html = file_get_contents($checkLinkOnPage);
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// find the element whose href value you want by XPath
$nodes = $xpath->query('//*');
$approvedLinks = array();
foreach($nodes as $href) {
//Check all links see if they are valid.
$url = $href->getAttribute('html');
if($href->tagName == 'div'){
//Display first div content here
}
}

How to extrapolate the text content within an iframe utilizing DOM and XPath?

I would like to use DOMXPath to extrapolate the text of a message within an iframe object.
This is what I've tried so far:
$dom = new DOMDocument();
// We don't want to bother with white spaces
$dom->preserveWhiteSpace = false;
// Most HTML Developers are chimps and produce invalid markup...
$dom->strictErrorChecking = false;
$dom->recover = true;
$isSuccessful = @$dom->loadHTMLFile('https://t.me/Chelsea/874');
// Delay 1 second after the request to avoid getting BANNED
sleep(1);
// Check to see if URL is valid
if($isSuccessful === false)
{
//URL invalid!
echo "\"".$url."\" is invalid<br>";
return false;
}
$xpath = new DOMXPath($dom);
$hrefs=$xpath->query('//*[@id="widget"]/iframe');
However, the length of the array is 0.
How can I extrapolate the text content within the class tgme_widget_message_text of the iframe?

Generate two XML files using PHP

I currently use the following PHP code to generate an XML file from form inputs on a HTML page (I'm using $_POST):
$doc = new DOMDocument('1.0');
$doc->formatOutput = true;
$doc->preserveWhiteSpace = true;
$doc->loadXML($xml->asXML(), LIBXML_NOBLANKS);
$doc->save('../application/'.$filefname.$filesname.'_'.date("Y-m-d").'.xml');
However I would like to generate two XML files, each with different information. Do I need to do something like have two variables? eg. $xml1 and $xml2, $doc1 and $doc2 like so?
$doc1 = new DOMDocument('1.0');
$doc1->formatOutput = true;
$doc1->preserveWhiteSpace = true;
$doc1->loadXML($xml1->asXML(), LIBXML_NOBLANKS);
$doc1->save('../application/'.$filefname.$filesname.'_'.date("Y-m-d").'.xml');
$doc2 = new DOMDocument('1.0');
$doc2->formatOutput = true;
$doc2->preserveWhiteSpace = true;
$doc2->loadXML($xml2->asXML(), LIBXML_NOBLANKS);
$doc2->save('../application/'.$filefname.$filesname.'_'.date("Y-m-d").'.xml');
E.g. two email addresses and two names are entered into the online form, and I want each person's details in a separate file.
Customer 1's name and email in cust1.xml and Customer 2's name and email in cust2.xml
You can reuse the $doc variable. As soon as you assign it a new DOMDocument the $doc variable points to the new instance.
// $doc points to instance #1 of DOMDocument
$doc = new DOMDocument('1.0');
...
// $doc points to instance #2 of DOMDocument
$doc = new DOMDocument('1.0');
The same applies to all reference types like objects.
Instead of duplicating your code, you should create a function:
// $path is a parameter because $filefname and $filesname are not visible inside the function
function createDocument($xml, $path) {
$doc = new DOMDocument('1.0');
$doc->formatOutput = true;
$doc->preserveWhiteSpace = true;
$doc->loadXML($xml->asXML(), LIBXML_NOBLANKS);
$doc->save($path);
}
You should always avoid code duplication. See DRY principle.
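As a rough illustration of that advice, here is a minimal sketch that writes one file per customer. The customer data, the <customer> element layout, the saveCustomerXml() name, and the use of the system temp directory are all assumptions made for the example; in the original code the data would come from $_POST and the files would go under ../application/.

```php
<?php
// Hypothetical helper: pretty-print a SimpleXMLElement and save it to $path.
function saveCustomerXml(SimpleXMLElement $xml, string $path): bool
{
    $doc = new DOMDocument('1.0');
    $doc->preserveWhiteSpace = false;
    $doc->formatOutput = true;
    $doc->loadXML($xml->asXML());
    return $doc->save($path) !== false; // save() returns bytes written or false
}

// Placeholder data standing in for the form input.
$customers = [
    'cust1' => ['name' => 'Alice', 'email' => 'alice@example.com'],
    'cust2' => ['name' => 'Bob', 'email' => 'bob@example.com'],
];

$dir = sys_get_temp_dir();
$written = [];
foreach ($customers as $basename => $details) {
    // Build one small document per customer
    $xml = new SimpleXMLElement('<customer/>');
    $xml->addChild('name', $details['name']);
    $xml->addChild('email', $details['email']);

    // A distinct file name per customer, so the second file does not overwrite the first
    $path = $dir . DIRECTORY_SEPARATOR . $basename . '.xml';
    if (saveCustomerXml($xml, $path)) {
        $written[] = $path;
    }
}

print_r($written);
```

Passing the path in as an argument keeps the function free of outside variables and makes it obvious that each call writes a different file.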

Updating existing element in XML with PHP

Currently, I have this for appending data to my items file:
$xmldoc = new DOMDocument();
$xmldoc->load('ex.xml');
$item= $xmldoc->createElement('item');
$item->setAttribute('id', '100');
$item->setAttribute('category', 'Fitness');
$item->setAttribute('name', 'Basketball');
$item->setAttribute('url', 'http://google.com');
$item->setAttribute('description', 'This is a description');
$item->setAttribute('price', '899');
$xmldoc->getElementsByTagName('items')->item(0)->appendChild($item);
$xmldoc->save('ex.xml');
Before appending this, I would like to check for an existing "item" element that has the same id attribute value.
If one exists, it should be updated with the new data.
Currently the code just appends and doesn't check anything.
$xmldoc = new DOMDocument();
$xmldoc->load('ex.xml');
$xpath = new DOMXPath($xmldoc);
$query = $xpath->query('/mainXML/items/item[@id = "100"]');
$create_new_node = false;
if($query->length == 0)
{
$item = $xmldoc->createElement('item');
$create_new_node = true;
}
else
{
$item = $query->item(0);
}
$item->setAttribute('id', '100');
$item->setAttribute('category', 'Fitness');
$item->setAttribute('name', 'Basketball');
$item->setAttribute('url', 'http://google.com');
$item->setAttribute('description', 'This is a description');
$item->setAttribute('price', '899');
if($create_new_node)
{
$xmldoc->getElementsByTagName('items')->item(0)->appendChild($item);
}
$xmldoc->save('ex.xml');
I haven't used this functionality, but it looks like a good match for DOMDocument::getElementById.
If you get a matching element, edit it; if not, append a new one.
If you have a DTD for this xml file that specifies that the "id" attribute is an ID type (i.e. its value is unique in a document and uniquely identifies its element), then you can use DOMDocument::getElementById().
Most likely, however, you do not have a DTD. In this case, you should just use XPath:
$xmldoc = new DOMDocument();
$xmldoc->load('ex.xml');
$xpath = new DOMXPath($xmldoc);
$results = $xpath->query('//items/item[@id="100"]');
if (!$results->length) {
$item= $xmldoc->createElement('item');
$item->setAttribute('id', '100');
$item->setAttribute('category', 'Fitness');
$item->setAttribute('name', 'Basketball');
$item->setAttribute('url', 'http://google.com');
$item->setAttribute('description', 'This is a description');
$item->setAttribute('price', '899');
$xmldoc->getElementsByTagName('items')->item(0)->appendChild($item);
$xmldoc->save('ex.xml');
}
You should also consider using SimpleXML for this task. The way this xml is structured and manipulated would probably be better-suited to SimpleXML.
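A SimpleXML version of the update-or-append logic might look like the sketch below. It is only an illustration: the inline document, the mainXML root name (taken from the XPath in the earlier answer), and the upsertItem() helper are assumptions, and the id is assumed not to contain double quotes since it is interpolated into the XPath expression.

```php
<?php
// Hypothetical helper: update the <item> with a matching id, or append a new one.
function upsertItem(SimpleXMLElement $root, array $attributes): SimpleXMLElement
{
    // Assumption: the id contains no double quotes
    $matches = $root->xpath('items/item[@id="' . $attributes['id'] . '"]');

    // Reuse the first match if there is one, otherwise create a new <item>
    $item = $matches ? $matches[0] : $root->items->addChild('item');

    // Array-style assignment adds the attribute or overwrites an existing one
    foreach ($attributes as $name => $value) {
        $item[$name] = $value;
    }
    return $item;
}

// Inline stand-in for ex.xml
$root = new SimpleXMLElement(
    '<mainXML><items><item id="100" name="Football" price="500"/></items></mainXML>'
);

upsertItem($root, ['id' => '100', 'name' => 'Basketball', 'price' => '899']); // updates
upsertItem($root, ['id' => '101', 'name' => 'Yoga mat', 'price' => '199']);   // appends

$itemCount = count($root->items->item);
$updatedName = (string) $root->items->item[0]['name'];
echo $root->asXML();
```

With a real file you would load it with simplexml_load_file('ex.xml') and write it back with $root->asXML('ex.xml').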
