PHP DomCrawler fails - php

I'm grabbing some information using PHP, 'DomCrawler' and 'Xpath Helper'.
When I query some node information, no matching value is returned.
I don't know why it doesn't work.
Page
<?php
require '../vendor/autoload.php';
use GuzzleHttp\Client;
use Symfony\Component\DomCrawler\Crawler;
function showcourse($response){
$data = []; //Store
$crawler = new Crawler();
$crawler->addHtmlContent($response);
try {
$data['name'] = $crawler->filterXPath('/html/body/div[@id=\'brief\']
/table/tbody/tr[1]/td[1]/table[@class=\'items\'][1]/tbody/tr/td[@class=
\'cover\'][1]/a[@id=\'NEU01000219238\']/img/@src')->text();
} catch (\Exception $e) {
echo "No nodes ";
}
print_r($data);
//echo $response;
}
?>
Result
Nothing is returned.

First, the XPath expressions you show differ from each other. Without the actual document this is guesswork, but this kind of error is most likely an XPath expression issue. Reduce the complexity of the expression until an expected result comes back, or start with the simplest expression, /html. If that returns nothing either, try //* as the expression. If there is still no result, the problem is most probably not your expression.
Double-check that your response actually contains a valid document.
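The step-by-step reduction described above can be sketched as follows. This is only an illustration using PHP's built-in DOMDocument and DOMXPath rather than DomCrawler; the $html string is a made-up stand-in, since the real page is not shown in the question.

```php
<?php
// Hypothetical stand-in for the fetched page; the real markup is not shown in the question.
$html = '<html><body><div id="brief"><table><tr><td>'
      . '<a id="x1"><img src="cover.jpg"></a></td></tr></table></div></body></html>';

$document = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);

// Start with the simplest expression and add one step at a time.
$steps = [
    '/html',
    '/html/body',
    "/html/body/div[@id='brief']",
    "/html/body/div[@id='brief']/table",
    "/html/body/div[@id='brief']/table/tbody", // the sample source contains no <tbody> tag
];

$counts = [];
foreach ($steps as $expression) {
    $counts[$expression] = $xpath->query($expression)->length;
    printf("%-45s => %d node(s)\n", $expression, $counts[$expression]);
}

// Once the element path is known to match, read the attribute value as a string.
$src = $xpath->evaluate("string(//div[@id='brief']//img/@src)");
echo $src, PHP_EOL;
```

The first step whose count drops to zero is the one to look at. Browser tools often show elements such as tbody that are not present in the downloaded source, so an expression copied from the browser may need adjusting.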

Related

PHP return value after XML exploration

I have a PHP array with a lot of XML user-file URLs:
$tab_users[0]=john.xml
$tab_users[1]=chris.xml
$tab_users[n...]=phil.xml
For each user a <zoom> tag is filled or not, depending if user filled it up or not:
john.xml = <zoom>Some content here</zoom>
chris.xml = <zoom/>
phil.xml = <zoom/>
I'm trying to explore the user data and display the first filled <zoom> tag, but randomized: each time you reload the page, the <div id="zoom"> content is different.
$rand=rand(0,$n); // $n is the number of users
$datas_zoom=zoom($n,$rand);
My PHP function
function zoom($n,$rand) {
global $tab_users;
$datas_user=new SimpleXMLElement($tab_users[$rand],null,true);
$tag=$datas_user->xpath('/user');
//if zoom found
if($tag[0]->zoom !='') {
$txt_zoom=$tag[0]->zoom;
}
... some other taff here
// no "zoom" value found
if ($txt_zoom =='') {
echo 'RAND='.$rand.' XML='.$tab_users[$rand].'<br />';
$datas_zoom=zoom($r,$n,$rand); } // random zoom fct again and again till...
}
else {
echo 'ZOOM='.$txt_zoom.'<br />';
return $txt_zoom; // we got it!
}
}
echo '<br />Return='.$datas_zoom;
The problem is: when by chance the first XML explored contains "zoom" information, the function returns it, but if not, nothing is returned. An example of the results when the first one is by chance the good one:
// for RAND=0, XML=john.xml
ZOOM=Anything here
Return=Some content here // we're lucky
Unlucky:
RAND=1 XML=chris.xml
RAND=2 XML=phil.xml
// then for RAND=0 and XML=john.xml
ZOOM=Anything here
// content found but Return is empty
Return=
What's wrong?
I suggest importing the values into a database table, generating a single local file, or something like that, so that you don't have to open and parse all the XML files for each request.
Reading multiple files is a lot slower than reading a single file. And with a database, even the random logic can be moved to SQL.
You are currently using SimpleXML, but fetching a single value from an XML document is actually easier with DOM. SimpleXMLElement::xpath() only supports XPath expressions that return a node list, but DOMXpath::evaluate() can return the scalar value directly:
$document = new DOMDocument();
$document->load($xmlFile);
$xpath = new DOMXpath($document);
$zoomValue = $xpath->evaluate('string(//zoom[1])');
//zoom[1] selects each zoom element that is the first zoom child of its parent. Casting the resulting node list to a string returns the text content of its first node, or an empty string if the list is empty (no node found).
For the sake of this example assume that you generated an XML like this
<zooms>
<zoom user="u1">z1</zoom>
<zoom user="u2">z2</zoom>
</zooms>
In this case you can use Xpath to fetch all zoom nodes and get a random node from the list.
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$zooms = $xpath->evaluate('//zoom');
$zoom = $zooms->item(mt_rand(0, $zooms->length - 1));
var_dump(
[
'user' => $zoom->getAttribute('user'),
'zoom' => $zoom->textContent
]
);
Your main issue is that you are not returning any value when there is no zoom found.
$datas_zoom=zoom($r,$n,$rand); // no return keyword here!
When you're using recursion, you usually want to "chain" return values on and on, until you find the one you need. $datas_zoom is not a global variable and it will not "leak out" of your function. Please read PHP's variable scope documentation for more info.
Then again, you're calling the zoom function with three arguments ($r, $n, $rand) while it only accepts two ($n and $rand). In addition, $r is undefined, $n is not used at all, and you are most likely reusing the same $rand value again and again, which obviously cannot work.
Also note that there are too many closing braces in your code.
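The missing return in the recursive call can be illustrated with a stripped-down sketch. The $zooms array and the firstZoom() function are hypothetical and only stand in for the XML lookups in the question.

```php
<?php
// Hypothetical data: index => zoom text ('' means "not filled in").
$zooms = ['', '', 'Some content here'];

// Returns the first non-empty entry at or after $index, or null if none exists.
function firstZoom(array $zooms, int $index = 0): ?string
{
    if ($index >= count($zooms)) {
        return null; // base case: ran out of candidates
    }
    if ($zooms[$index] !== '') {
        return $zooms[$index]; // base case: found a value
    }
    // Without this "return", the value found deeper down would be discarded.
    return firstZoom($zooms, $index + 1);
}

$result = firstZoom($zooms);
echo 'Return=' . $result, PHP_EOL;
```

Every level of the call chain hands the value it receives back to its caller, which is what the original function fails to do.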
I think the best approach for your problem is to shuffle the array and then consume it one element at a time without recursion (which should be slightly faster):
function zoom($tab_users) {
// shuffle an array once
shuffle($tab_users);
// init variable
$txt_zoom = null;
// repeat until zoom is found or there
// are no more elements in array
do {
$rand = array_pop($tab_users);
$datas_user = new SimpleXMLElement($rand, null, true);
$tag=$datas_user->xpath('/user');
//if zoom found
if($tag[0]->zoom !='') {
$txt_zoom=$tag[0]->zoom;
}
} while(!$txt_zoom && !empty($tab_users));
return $txt_zoom;
}
$datas_zoom = zoom($tab_users); // your zoom is here!
Please read more about php scopes, php functions and recursion.
There's no reason for recursion. A simple loop would do.
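// highest valid index in the list of user files
$max = count($tab_users) - 1;
while(true) {
// pick a random user file and load it
$test_index = rand(0, $max);
$datas_user=new SimpleXMLElement($tab_users[$test_index],null,true);
$tag=$datas_user->xpath('/user');
if ($tag[0]->zoom != "") {
$txt_zoom = $tag[0]->zoom;
break;
}
}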
Of course, you might want to add a bit more logic to handle the case where NO zooms have text set, in which case the above would be an infinite loop.
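One way to rule out the infinite loop is to visit each candidate at most once, in random order. The sketch below assumes in-memory XML strings in place of the user files, and randomZoom() is a hypothetical name.

```php
<?php
// Hypothetical in-memory stand-ins for john.xml, chris.xml and phil.xml.
$users = [
    'john'  => '<user><zoom>Some content here</zoom></user>',
    'chris' => '<user><zoom/></user>',
    'phil'  => '<user><zoom/></user>',
];

// Returns the text of a randomly chosen non-empty <zoom>, or null if every one is empty.
function randomZoom(array $xmlStrings): ?string
{
    shuffle($xmlStrings); // random order, each document visited at most once
    foreach ($xmlStrings as $xmlString) {
        $user = new SimpleXMLElement($xmlString);
        $zoom = trim((string) $user->zoom);
        if ($zoom !== '') {
            return $zoom;
        }
    }
    return null; // terminates even when no zoom is filled in
}

var_dump(randomZoom($users));
```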

Unable to catch XML exception/error in PHP

I am not able to catch exception with below code. Can anyone help me with this thing?
try
{
$xml_emp_name = $xpath->evaluate("//EMPLOYEES[ID='" . $emp_id . "']/EMP-NAME/text()")->item(0)->nodeValue;
}
catch(Exception $e)
{
echo "Error: " . $e->getMessage();
}
DOMXPath::evaluate() does not throw exceptions.
If the expression is malformed or the contextnode is invalid, DOMXPath::evaluate() returns FALSE.
Try
$xml_emp_name = $xpath->evaluate("//EMPLOYEES[ID='" . $emp_id . "']/EMP-NAME/text()");
if(!$xml_emp_name){
echo 'Error';
}else{
$name = $xml_emp_name->item(0)->nodeValue;
}
If evaluate() fails and returns false, you end up calling a method on a non-object.
Your code is prone to xpath injection. Fix that first. The error then goes away automatically (because the xpath can not become syntactically invalid). Also you need to check/validate return values.
So you're missing the basic principles of input validation and return value validation. All you need to do is to take more care.
Input validation:
You directly inject the variable $emp_id into the xpath string for substitution:
"//EMPLOYEES[ID='" . $emp_id . "']/EMP-NAME/text()"
However, at that place you cannot have a single quote inside the string. Instead, check the input value (validation) or filter/streamline it (sanitization). For example, validate that it does not contain a single quote, or sanitize it to a numeric value. Here is the second option:
$expression = sprintf('//EMPLOYEES[ID="%d"]/EMP-NAME/text()', $emp_id);
$result = $xpath->evaluate($expression);
This little call to sprintf() ensures that only integer values are used. They never contain quotes, so the expression is always valid. Invalid values that are not numbers become 0. As it's a general principle never to assign the ID 0, this should normally not cause any issue in a well-designed system. If you want more granular filtering, please see Data Filtering in the PHP manual.
Return value validation:
In your code you take the return value with no checks at all. That is wrong. For each method or function you use, look it up in the PHP manual and check the documentation for all possible return values. Here the method is DOMXpath::evaluate(); locate the Return Values section of its manual page. You will find this section for each method and function in the PHP manual.
When you read the documentation, also figure out which kind of error handling a method uses. Does it throw exceptions (and if yes, which ones?), or does it signal an error condition with its return value (as in your case)? This information is needed to decide whether to use try/catch as you did (which is wrong here because the method does not throw exceptions) or whether to check the return value:
$expression = sprintf('//EMPLOYEES[ID="%d"]/EMP-NAME/text()', $emp_id);
$result = $xpath->evaluate($expression);
if (!$result) {
throw new Exception(
sprintf('No such employee (id: %s)', var_export($emp_id, true))
);
}
This example turns a falsy return value into an exception with an individual exception message. You also might want to consider a different exception, the SPL offers some pre-defined exceptions.
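One caveat: for a node-set expression, evaluate() returns a DOMNodeList object, and an object is truthy in PHP even when the list is empty. A slightly stricter sketch therefore also checks the list length. The sample document and the employeeName() helper below are hypothetical.

```php
<?php
// Hypothetical sample document; the element names follow the question.
$xml = '<ROOT><EMPLOYEES><ID>7</ID><EMP-NAME>Ann</EMP-NAME></EMPLOYEES></ROOT>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// Hypothetical helper: returns the employee name or throws if it cannot be found.
function employeeName(DOMXPath $xpath, $empId): string
{
    // %d forces an integer, so the expression cannot be broken by quotes in the input.
    $expression = sprintf('//EMPLOYEES[ID="%d"]/EMP-NAME/text()', $empId);
    $result = $xpath->evaluate($expression);

    // false signals an invalid expression; an empty list signals "no match".
    if ($result === false || $result->length === 0) {
        throw new RuntimeException(
            sprintf('No such employee (id: %s)', var_export($empId, true))
        );
    }
    return $result->item(0)->nodeValue;
}

echo employeeName($xpath, 7), PHP_EOL;
```

RuntimeException is one of the SPL exceptions; any other exception class would work the same way here.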
I hope this answer helps you to deal with this issue and forthcoming ones.
->evaluate will for some reason not throw any exceptions, so what I would advise is to check if the result is false, then throw an exception:
if (($xml_emp_name = $xpath->evaluate("//EMPLOYEES[ID='" . $emp_id . "']/EMP-NAME/text()") ) !== false) {
$xml_emp_name = $xml_emp_name->item(0)->nodeValue;
}
else {
// Throw Exception
}

simple_xml_load_string if nothing is returned

I am attempting to only run a loop if xml results actually exist. I am getting the xml results via:
$albums = simplexml_load_string(curl_get($api_url . '/videos.xml'));
What I want to be able to do is that on the next line say:
if($albums = hasAValue())
// Loop
Any ideas? Or a way to check before I load the XML data?
Side note: This is using the Vimeo API.
No, you need to go further down into the result using the namespace: reach the body element, give it the XPath, and work from there.
$albums->registerXPathNamespace('soap', 'http://schemas.xmlsoap.org/soap/envelope/');
To be specific, post the XML response you are getting and I will show you the output.
UPDATED
$albums = simplexml_load_string("#your response#");
echo count($albums->children());
The dirty way:
$albums = @simplexml_load_string(curl_get($api_url . '/videos.xml'));
if ($albums)
{
...
}
This is dirty because of the Error Control Operator @, which is used to "deal" with the error cases (e.g. a problem fetching the remote location).
The alternative is to differentiate more here:
$xml = curl_get($api_url . '/videos.xml');
$albums = NULL;
if ($xml)
{
$albums = simplexml_load_string($xml);
}
if ($albums)
{
...
}
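A variant that avoids the error control operator altogether is to let libxml collect parse errors internally and test the result against false. This is only a sketch: parseXmlOrNull() is a hypothetical helper, and the sample string stands in for whatever curl_get() returns.

```php
<?php
// Hypothetical helper: parse a string and return the element, or null on failure.
function parseXmlOrNull(?string $xml): ?SimpleXMLElement
{
    if ($xml === null || trim($xml) === '') {
        return null; // nothing was fetched
    }
    // Collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $element = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $element === false ? null : $element;
}

// Made-up response body for illustration.
$albums = parseXmlOrNull('<videos><video><title>Demo</title></video></videos>');
if ($albums !== null && count($albums->children()) > 0) {
    foreach ($albums->video as $video) {
        echo $video->title, PHP_EOL;
    }
}
```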

finding children in php simplexml xpath

I am running an XPath query on an XML stream and retrieving a set of data. In that data I need to find the tag name, but I'm not able to find a way to retrieve it. The XML stream is
<Condition>
<Normal dataItemId="Xovertemp-06" timestamp="2011-09-02T03:35:34.535703Z" name="Xovertemp" sequence="24544" type="TEMPERATURE"/>
<Normal dataItemId="Xservo-06" timestamp="2011-09-02T03:35:34.535765Z" name="Xservo" sequence="24545" type="LOAD"/>
<Normal dataItemId="Xtravel-06" timestamp="2011-09-02T03:35:34.535639Z" name="Xtravel" sequence="24543" type="POSITION"/>
</Condition>
I am trying to parse this as
Temperature = Normal
Load - Normal
So what i did is
foreach ($xml->xpath("//n:Condition")->children("n") as $child) {
echo $child["type"] . "=" . $child->getName();
}
I am getting the following error
Fatal error: Call to a member function children() on a non-object in C:\xampp\htdocs\DataDumper\datadumper\test.php on line 53
Now I know this has something to do with the way I query the XPath, and I tried various combinations such as adding a * or a slash to the query, but I get the same error every time.
Not sure why you used namespace notation in the first place (the sample XML is not namespaced).
In your XPath, you need to select all Condition/Normal tags, not the Condition tag as you were doing.
Also, xpath() returns a list, so iterate over it with foreach. You don't need to call children() unless you want to parse the children of $child. There it would make sense, and it would work as expected.
foreach ($xml->xpath("/Condition/Normal") as $child) {
echo $child["type"] . "=" . $child->getName()."<br/>";
}
outputs
TEMPERATURE=Normal
LOAD=Normal
POSITION=Normal
The problem is due to SimpleXMLElement::xpath() returning an array and not a SimpleXMLElement. I'm also not sure about the namespace support in the XPath query however I'm sure you can fiddle with that to work it out. In any case, I see no n namespace in your XML.
The answer really depends on how many elements you expect to match your XPath query. If only one, try
$conditions = $xml->xpath('//Condition');
if (count($conditions) == 0) {
throw new Exception('No conditions found');
}
$condition = $conditions[0];
foreach ($condition->children() as $child) {
printf('%s = %s', (string) $child['type'], $child->getName());
}
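If the real stream does declare a namespace (the sample shown does not), the prefix has to be registered before it can be used in a query. The following is a sketch under that assumption; the wrapper element and the namespace URI are made up for illustration.

```php
<?php
// Hypothetical namespaced variant of the sample; the URI is made up for illustration.
$source = '<Streams xmlns="urn:example:streams"><Condition>'
        . '<Normal name="Xovertemp" type="TEMPERATURE"/>'
        . '<Normal name="Xservo" type="LOAD"/>'
        . '</Condition></Streams>';

$xml = new SimpleXMLElement($source);
// Bind the prefix "n" to the document's default namespace for XPath queries.
$xml->registerXPathNamespace('n', 'urn:example:streams');

$lines = [];
// xpath() returns an array, so iterate over it directly.
foreach ($xml->xpath('//n:Condition/n:*') as $child) {
    $lines[] = $child['type'] . '=' . $child->getName();
}
echo implode(PHP_EOL, $lines), PHP_EOL;
```

The prefix used in the query does not have to match any prefix in the document; only the namespace URI has to match.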

Parsing XML with PHP (simplexml)

Firstly, may I point out that I am a newcomer to all things PHP so apologies if anything here is unclear and I'm afraid the more layman the response the better. I've been having real trouble parsing an xml file in to php to then populate an HTML table for my website. At the moment, I have been able to get the full xml feed in to a string which I can then echo and view and all seems well. I then thought I would be able to use simplexml to pick out specific elements and print their content but have been unable to do this.
The xml feed will be constantly changing (structure remaining the same) and is in compressed format. From various sources I've identified the following commands to get my feed in to the right format within a string although I am still unable to print specific elements. I've tried every combination without any luck and suspect I may be barking up the wrong tree. Could someone please point me in the right direction?!
$file = fopen("compress.zlib://$url", 'r');
$xmlstr = file_get_contents($url);
$xml = new SimpleXMLElement($url,null,true);
foreach($xml as $name) {
echo "{$name->awCat}\r\n";
}
Many, many thanks in advance,
Chris
PS The actual feed
Since no one followed my closevote, I think I can just as well put my own comments as an answer:
First of all, SimpleXml can load URIs directly and it can do so with stream wrappers, so your three calls in the beginning can be shortened to (note that you are not using $file at all)
$merchantProductFeed = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
To get the values you can either use the implicit SimpleXml API and drill down to the wanted elements (like shown multiple times elsewhere on the site):
foreach ($merchantProductFeed->merchant->prod as $prod) {
echo $prod->cat->awCat , PHP_EOL;
}
or you can use an XPath query to get at the wanted elements directly
$xml = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
foreach ($xml->xpath('/merchantProductFeed/merchant/prod/cat/awCat') as $awCat) {
echo $awCat, PHP_EOL;
}
Note that fetching all awCat elements from the source XML is rather pointless, though, because all of them have "Bodycare & Fitness" as their value. Of course you can also mix XPath and the implicit API: fetch just the prod elements and then drill down to their various children.
Using XPath should be somewhat faster than iterating over the SimpleXmlElement object graph, though the difference is negligible (read 0.000x vs 0.000y) for your feed. Still, if you plan to do more XML work, it pays off to familiarize yourself with XPath, because it's quite powerful. Think of it as SQL for XML.
For additional examples see
A simple program to CRUD node and node values of xml file and
PHP Manual - SimpleXml Basic Examples
Try this...
$url = "http://datafeed.api.productserve.com/datafeed/download/apikey/58bc4442611e03a13eca07d83607f851/cid/97,98,142,144,146,129,595,539,147,149,613,626,135,163,168,159,169,161,167,170,137,171,548,174,183,178,179,175,172,623,139,614,189,194,141,205,198,206,203,208,199,204,201,61,62,72,73,71,74,75,76,77,78,79,63,80,82,64,83,84,85,65,86,87,88,90,89,91,67,92,94,33,54,53,57,58,52,603,60,56,66,128,130,133,212,207,209,210,211,68,69,213,216,217,218,219,220,221,223,70,224,225,226,227,228,229,4,5,10,11,537,13,19,15,14,18,6,551,20,21,22,23,24,25,26,7,30,29,32,619,34,8,35,618,40,38,42,43,9,45,46,651,47,49,50,634,230,231,538,235,550,240,239,241,556,245,244,242,521,576,575,577,579,281,283,554,285,555,303,304,286,282,287,288,173,193,637,639,640,642,643,644,641,650,177,379,648,181,645,384,387,646,598,611,391,393,647,395,631,602,570,600,405,187,411,412,413,414,415,416,649,418,419,420,99,100,101,107,110,111,113,114,115,116,118,121,122,127,581,624,123,594,125,421,604,599,422,530,434,532,428,474,475,476,477,423,608,437,438,440,441,442,444,446,447,607,424,451,448,453,449,452,450,425,455,457,459,460,456,458,426,616,463,464,465,466,467,427,625,597,473,469,617,470,429,430,615,483,484,485,487,488,529,596,431,432,489,490,361,633,362,366,367,368,371,369,363,372,373,374,377,375,536,535,364,378,380,381,365,383,385,386,390,392,394,396,397,399,402,404,406,407,540,542,544,546,547,246,558,247,252,559,255,248,256,265,259,632,260,261,262,557,249,266,267,268,269,612,251,277,250,272,270,271,273,561,560,347,348,354,350,352,349,355,356,357,358,359,360,586,590,592,588,591,589,328,629,330,338,493,635,495,507,563,564,567,569,568/mid/2891/columns/merchant_id,merchant_name,aw_product_id,merchant_product_id,product_name,description,category_id,category_name,merchant_category,aw_deep_link,aw_image_url,search_price,delivery_cost,merchant_deep_link,merchant_image_url/format/xml/compression/gzip/";
$zd = gzopen($url, "r");
$data = gzread($zd, 1000000);
gzclose($zd);
if ($data !== false) {
$xml = simplexml_load_string($data);
foreach ($xml->merchant->prod as $pr) {
echo $pr->cat->awCat . "<br>";
}
}
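Note that a single gzread() call with a fixed length stops at that many bytes, so a larger feed would be cut off. A sketch that reads until the end of the stream instead is shown below; it builds a small gzip file locally as a stand-in for the remote feed, so the file and its content are assumptions.

```php
<?php
// Build a small gzip-compressed file as a hypothetical stand-in for the remote feed.
$path = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($path, gzencode(
    '<merchantProductFeed><merchant><prod><cat><awCat>Bodycare &amp; Fitness</awCat>'
    . '</cat></prod></merchant></merchantProductFeed>'
));

// Read the whole stream in chunks until end of file.
$zd = gzopen($path, 'r');
$data = '';
while (!gzeof($zd)) {
    $data .= gzread($zd, 8192);
}
gzclose($zd);
unlink($path);

// Parse the decompressed XML and collect the category names.
$categories = [];
$xml = simplexml_load_string($data);
if ($xml !== false) {
    foreach ($xml->merchant->prod as $prod) {
        $categories[] = (string) $prod->cat->awCat;
    }
}
print_r($categories);
```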
<?php
$xmlstr = file_get_contents("compress.zlib://$url");
$xml = simplexml_load_string($xmlstr);
// you can traverse the xml tree however you want
foreach ($xml->merchant->prod as $line) {
// $line->cat->awCat -> you can use this
}
Use print_r($xml) to see the structure of the parsed XML feed.
Then it becomes obvious how you would traverse it:
foreach ($xml->merchant->prod as $prod) {
print $prod->pId;
print $prod->text->name;
print $prod->cat->awCat; # <-- which is what you wanted
print $prod->price->buynow;
}
$url = 'your url here';
$f = gzopen ($url, 'r');
$xml = new SimpleXMLElement (fread ($f, 1000000));
foreach($xml->xpath ('//prod') as $name)
{
echo (string) $name->cat->awCatId, "\r\n";
}
