passing more than one path to xPath query - php

I have links with different paths and I am trying to retrieve data from those links. I don't want to handle each path separately, so I made a list of queries and used foreach on that list.
function passPath($list){
$list = [
"//li[@class='out']/a[1]",
"//ul[@class='ul right_ul clearfix']/li[2]/a",
"//ul[@class='ul right_ul clearfix']/li[2]/a"
];
foreach($list as $val){
return $val;
}
}
Then I used that function inside DOMXPath's query.
function getPath($urls){
foreach($urls as $k => $val){
$url = $urls;
$html = content($val);
$path = new \DOMXPath($html);
$xPath = passPath($val);
$route = $path->query($xPath);
foreach($route as $value){
if ($value->nodeValue != false) {
$urls [] = trim($value->getAttribute('href'));
unset($urls[$k]);
}
}
}
return array_unique($urls);
}
It runs without errors, but there is a problem with the foreach: it only retrieves the data for one element and does not continue with the others. What am I missing here?
$data = getPath($urls);
var_dump($data);
By the way, content() is a wrapper around file_get_contents() and loadHTML().

I changed your code so that it returns the list of href values.
# You want to parse every page in the URL list, so you created a function named `getPath($urls)`.
function getPath($urls) {
# Collect the results in a separate array, $ret, instead of modifying $urls while looping over it.
$ret = [];
# Loop over every URL.
foreach ($urls as $k => $url) { # I renamed $val to $url, since each value of $urls is a URL.
# content() is the file_get_contents()/loadHTML() wrapper.
$html = content($url);
# Create a new DOMXPath object from $html.
$path = new \DOMXPath($html);
# passPath() is not needed: its `return` inside the foreach ends the function on the first iteration, so it only ever returns the first expression.
# By the way, the second and third elements of $xPathList are identical, so the third one is redundant.
// $xPath = passPath($url);
$xPathList = [
"//li[@class='out']/a[1]",
"//ul[@class='ul right_ul clearfix']/li[2]/a",
"//ul[@class='ul right_ul clearfix']/li[2]/a"
];
foreach ($xPathList as $xPath) {
$nodes = $path->query($xPath);
foreach ($nodes as $node) {
if ($node->nodeValue != false) {
$ret[] = trim($node->getAttribute('href'));
}
}
}
}
return array_unique($ret);
}
$data = getPath($urls);
var_dump($data);
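To see why the question's passPath() only ever produced one expression, here is a minimal sketch of the difference between returning inside a foreach and returning after it. The function names firstOnly() and allPaths() and the two sample expressions are made up for illustration; they are not part of the original code.

```php
<?php
// Hypothetical helpers for illustration only.
function firstOnly(array $list): string
{
    foreach ($list as $val) {
        return $val; // leaves the function during the first iteration
    }
    return ''; // only reached when $list is empty
}

function allPaths(array $list): array
{
    $out = [];
    foreach ($list as $val) {
        $out[] = $val; // collect every entry
    }
    return $out; // return once the loop has finished
}

$list = ['//li/a[1]', '//ul/li[2]/a'];
var_dump(firstOnly($list));       // a single string: the first entry
var_dump(count(allPaths($list))); // both entries are kept
```

Looping over the list at the call site, as the rewritten getPath() does, avoids the extra helper altogether.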


Show only a specific part of html response in php

I am trying to get tracking information from Amazon using the following URL:
https://www.amazon.co.uk/progress-tracker/package/ref=pe_3187911_189395841_TE_typ?_encoding=UTF8&from=gp&itemId=&orderId=203-2171364-3066749&packageIndex=0&shipmentId=23796758607302
I fetch the response with PHP's file_get_contents() function.
I want my script to output only the part of the response that contains the tracking information and to drop everything else.
One way to do this is to use DOMDocument to pick the JSON data out of the source ($file) and then use a recursive function to extract the elements you need.
You specify the elements you need in an array, $filter. This example takes a sample of the available data:
$filter = [
'orderId', 'shortStatus', 'promiseMessage',
'lastTransitionPercentComplete', 'lastReachedMilestone', 'shipmentId',
];
The code
<?php
$filename = 'https://www.amazon.co.uk/progress-tracker/package/ref=pe_3187911_189395841_TE_typ?_encoding=UTF8&from=gp&itemId=&orderId=203-2171364-3066749&packageIndex=0&shipmentId=23796758607302';
$file = file_get_contents($filename);
$trackingData = []; // store for order tracking data
$html = new DOMDocument();
@$html->loadHTML($file);
foreach ($html->getElementsByTagName('script') as $a) {
$data = $a->textContent;
if (stripos($data, 'shortStatus') !== false) {
$trackingData = json_decode($data, true);
break;
}
}
// set the items we need
$filter = [
'orderId', 'shortStatus', 'promiseMessage',
'lastTransitionPercentComplete', 'lastReachedMilestone', 'shipmentId',
];
// invoke recursive function to pick up the data items specified in $filter
$result = getTrackingData($filter, $trackingData);
echo '<pre>';
print_r($result);
echo '</pre>';
function getTrackingData(array $filter, array $data, array &$result = []) {
foreach($data as $key => $value) {
if(is_array($value)) {
getTrackingData($filter, $value, $result);
} else {
foreach($filter as $item) {
if($item === $key) {
$result[$key] = $value;
}
}
}
}
return $result;
}
Output:
Array
(
[orderId] => 203-2171364-3066749
[shortStatus] => IN_TRANSIT
[promiseMessage] => Arriving tomorrow by 9 PM
[lastTransitionPercentComplete] => 92
[lastReachedMilestone] => SHIPPED
[shipmentId] => 23796758607302
)
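One caveat with the code above: json_decode() returns null when the script content is not valid JSON, and passing null to a parameter typed as array raises a TypeError. The sketch below shows one way to guard against that, using a hard-coded page so it runs without a network request. The sample markup and its keys are assumptions for illustration, not the real Amazon response.

```php
<?php
// Hypothetical inline page standing in for the real response.
$file = '<html><body><script type="application/json">'
      . '{"order":{"orderId":"X-1","shortStatus":"IN_TRANSIT"}}'
      . '</script></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($file);
libxml_clear_errors();

$trackingData = [];
foreach ($doc->getElementsByTagName('script') as $script) {
    $decoded = json_decode($script->textContent, true);
    if (is_array($decoded)) { // skip scripts that are not valid JSON
        $trackingData = $decoded;
        break;
    }
}

var_dump($trackingData['order']['shortStatus'] ?? null);
```

Because $trackingData stays an empty array when nothing decodes, it can be handed to a function with an array parameter either way.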
Try this
<?php
$filename = 'https://www.amazon.co.uk/progress-tracker/package/ref=pe_3187911_189395841_TE_typ?_encoding=UTF8&from=gp&itemId=&orderId=203-2171364-3066749&packageIndex=0&shipmentId=23796758607302';
$file = file_get_contents($filename);
$html = new DOMDocument();
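@$html->loadHTML($file);
foreach ($html->getElementsByTagName('span') as $a) {
$property = $a->getAttribute('id');
// strpos() returns 0 for a match at the start of the string, so compare against false explicitly
if (strpos($property, 'primaryStatus') !== false) {
// print the text of the element rather than its id
echo trim($a->textContent);
}
}
?>
It should print the status text, e.g. "Arriving tomorrow by 9 PM".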

Adding additional string to variable - PHP

Here's my problem. I want to create a function that takes an outside variable that contains an xpath and once the function runs I want to add to that same variable to create a counter.
So I have the outside variable:
$node = $xmlDoc->xpath('//a:Order');
Then the function with a single argument that will take the outside variable ($node). Like so:
function loopXML($node) {
$i = 1; // counter variable
}
Now I want to add a counter to $node so that it goes through all of the children of "Order". Outside of the function, I would use:
$child = $xmlDoc->xpath('//a:Order['.$i.']/*');
But inside of the function, I have no idea how to concat it. Does anyone have any idea how I could do this?
EDIT:
Also, it should be noted that I created an arbitrary namespace already:
foreach($xmlDoc->getDocNamespaces() as $strPrefix => $strNamespace) {
if(strlen($strPrefix)==0) {
$strPrefix="a"; //Assign an arbitrary namespace prefix.
}
$xmlDoc->registerXPathNamespace($strPrefix,$strNamespace);
}
SimpleXMLElement::xpath() uses the node associated with the SimpleXML element as the context so you can do something like:
foreach ($xmlDoc->xpath('//a:Order') as $order) {
foreach ($order->xpath('*') as $field) {
...
}
}
Alternatively, SimpleXMLElement::children() returns the element child nodes, so without arguments it gives the same result as the XPath expression * or, to be more exact, '*[namespace-uri() = ""]'. Its first argument is the namespace of the children you would like to fetch.
foreach ($xmlDoc->xpath('//a:Order') as $order) {
foreach ($order->children() as $field) {
...
}
}
This can be easily refactored into a function.
function getRecord(SimpleXMLElement $order, $namespace) {
$result = [];
foreach ($order->children($namespace) as $field) {
$result[$field->getName()] = (string)$field;
}
return $result;
}
You should always depend on the actual namespace, never on the prefix. Prefixes can change and are optional.
Put all together:
$xml = <<<'XML'
<a:orders xmlns:a="urn:a">
<a:order>
<a:foo>bar</a:foo>
<a:answer>42</a:answer>
</a:order>
</a:orders>
XML;
$namespace = 'urn:a';
$orders = new SimpleXMLElement($xml);
$orders->registerXPathNamespace('a', $namespace);
function getRecord(SimpleXMLElement $order, $namespace = NULL) {
$result = [];
foreach ($order->children($namespace) as $field) {
$result[$field->getName()] = (string)$field;
}
return $result;
}
foreach ($orders->xpath('//a:order') as $order) {
var_dump(getRecord($order, $namespace));
}
Output:
array(2) {
["foo"]=>
string(3) "bar"
["answer"]=>
string(2) "42"
}
So I figured it out with a lot of Googling and the help of ThW. So to all that helped, thank you. Here's how I got it to work:
$orderPNode = '//a:Order';
$amazonRawXML = 'AmazonRaw.xml';
$amazonRawCSV = 'AmazonRaw.csv';
function loopXML($xmlDoc, $node, $writeCsv) {
$i = 1;
$xmlDocs = simplexml_load_file($xmlDoc);
$result = [];
foreach($xmlDocs->getDocNamespaces() as $strPrefix => $strNamespace) {
if(strlen($strPrefix)==0) {
$strPrefix="a"; //Assign an arbitrary namespace prefix.
}
$xmlDocs->registerXPathNamespace($strPrefix,$strNamespace);
}
file_put_contents($writeCsv, ""); // Clear contents of csv file after each go
$nodeP = $xmlDocs->xpath($node);
foreach ($nodeP as $n) {
$nodeC = $xmlDocs->xpath($node.'['.$i.']/*');
if($nodeC) {
foreach ($nodeC as $value) {
$values[] = $value;
}
$write = fopen($writeCsv, 'a');
fputcsv($write, $values);
fclose($write);
$values = [];
$i++;
} else {
$result[] = $n;
$i++;
}
}
return $result;
}
loopXML($amazonRawXML, $orderPNode, $amazonRawCSV);

PHP renaming string if string already exists

I am storing some data in an array and I want to add the key to it if the title already exists in the array. But for some reason it's not adding the key to the title.
Here's my loop:
$data = [];
foreach ($urls as $key => $url) {
$local = [];
$html = file_get_contents($url);
$crawler = new Crawler($html);
$headers = $crawler->filter('h1.title');
$title = $headers->text();
$lowertitle = strtolower($title);
if (in_array($lowertitle, $local)) {
$lowertitle = $lowertitle.$key;
}
$local = [
'title' => $lowertitle,
];
$data[] = $local;
}
echo "<pre>";
var_dump($data);
echo "</pre>";
You will not find anything here:
foreach ($urls as $key => $url) {
$local = [];
// $local does not change here...
// So here $local is an empty array
if (in_array($lowertitle, $local)) {
$lowertitle = $lowertitle.$key;
}
...
If you want to check if the title already exists in the $data array, you have a few options:
1. You loop over the whole array or use an array filter function to see if the title exists in $data;
2. You use the lowercase title as the key for your $data array. That way you can easily check for duplicate values.
I would use the second option or something similar to it.
A simple example:
if (array_key_exists($lowertitle, $data)) {
$lowertitle = $lowertitle.$key;
}
...
$data[$lowertitle] = $local;
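Put together, the keyed approach can be sketched as follows. The $titles array is a made-up stand-in for the crawled h1 texts, so the example runs without the Crawler or any network access.

```php
<?php
// Hypothetical titles standing in for the crawled <h1> texts.
$titles = ['Home', 'About', 'home'];

$data = [];
foreach ($titles as $key => $title) {
    $lowertitle = strtolower($title);
    if (array_key_exists($lowertitle, $data)) {
        $lowertitle .= $key; // duplicate title: append the loop index
    }
    $data[$lowertitle] = ['title' => $lowertitle];
}

print_r(array_keys($data));
```

If the rest of the code expects a plain list rather than a keyed array, array_values($data) converts it back after the loop.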

Show XML tag full path with php

Let's assume we want to process this Feed: http://tools.forestview.eu/xmlp/xml_feed.php?aid=1094&cid=1000
I'm trying to show the nodes of an XML file this way:
deals->deal->dealsite
deals->deal->deal_id
deals->deal->deal_title
This is in order to be able to process feeds that we don't know what their XML tags are. So we will let the user choose that deals->deal->deal_title is the Deal Title and will recognize it that way.
I have been trying for ages to do this with this code:
class HandleXML {
var $root_tag = false;
var $xml_tags = array();
var $keys = array();
function parse_recursive(SimpleXMLElement $element)
{
$get_name = $element->getName();
$children = $element->children(); // get all children
if (empty($this->root_tag)) {
$this->root_tag = $this->root_tag.$get_name;
}
$this->xml_tags[] = $get_name;
// only show children if there are any
if(count($children))
{
foreach($children as $child)
{
$this->parse_recursive($child); // recursion :)
}
}
else {
$key = implode('->', $this->xml_tags);
$this->xml_tags = array();
if (!in_array($key, $this->keys)) {
if (!strstr('>', $key) && count($this->keys) > 0) { $key = $this->root_tag.'->'.$key; }
if (!in_array($key, $this->keys)) {
$this->keys[] = $key;
}
}
}
}
}
$xml = new SimpleXMLElement($feed_url, null, true);
$handle_xml = new HandleXML;
$handle_xml->parse_recursive($xml);
foreach($handle_xml->keys as $key) {
echo $key.'<br />';
}
exit;
but here's what I get instead:
deals->deal->dealsite
deals->deal_id
deals->deal_title
Note that the deal-> part is missing on the 2nd and 3rd lines.
I have also tried this code: http://pastebin.com/FkPWXF64 but it's definitely not the best way to go and it doesn't always work.
No matter how many times I tried, I couldn't get it to work.
On one of my sites I use a slightly different approach to handle an XML feed. In your case it would look like this:
$xml = simplexml_load_file("http://tools.forestview.eu/xmlp/xml_feed.php?aid=1094&cid=1000");
foreach($xml->{'deal'} as $deal)
{
$dealsite = $deal->{'dealsite'};
$deal_id = $deal->{'deal_id'};
$deal_title = $deal->{'deal_title'};
$deal_url = $deal->{'deal_url'};
$deal_city = $deal->{'deal_city'};
$deal_category = $deal->{'deal_category'};
// and so on for the rest
// do some stuff with the variables like insert into MySQL
}
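If the tag names are not known in advance, as in the question, one possible approach is to pass the path built so far down through the recursion instead of keeping it in shared state. This is only a sketch: collectPaths() is an illustrative name, the inline XML is a made-up fragment rather than the real feed, and it assumes the elements are not namespaced (children() without arguments only returns non-namespaced children).

```php
<?php
// Hypothetical feed fragment; the real feed's tags may differ.
$xml = new SimpleXMLElement(
    '<deals><deal><dealsite>a</dealsite><deal_id>1</deal_id></deal>'
    . '<deal><dealsite>b</dealsite><deal_id>2</deal_id><deal_title>t</deal_title></deal></deals>'
);

// collectPaths() is an illustrative helper, not an existing API.
function collectPaths(SimpleXMLElement $element, string $prefix = '', array &$paths = []): array
{
    // Extend the path inherited from the parent with this element's name.
    $path = $prefix === '' ? $element->getName() : $prefix . '->' . $element->getName();
    $children = $element->children();
    if (count($children) === 0) {
        $paths[$path] = true; // leaf: using the path as a key removes duplicates
    }
    foreach ($children as $child) {
        collectPaths($child, $path, $paths);
    }
    return array_keys($paths);
}

foreach (collectPaths($xml) as $key) {
    echo $key, "\n";
}
```

Since each call receives its own prefix, sibling elements cannot lose the deal-> part the way they do when a shared tag list is reset after the first leaf.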

How to parse web form fields with xPath?

In order to retrieve name/value pairs.
This should do it:
$dom = new DOMDocument;
$dom->loadHTMLFile('somefile.html'); // load() expects well-formed XML; loadHTMLFile() uses the HTML parser
$xpath = new DOMXPath($dom);
$data = array();
$inputs = $xpath->query('//input');
foreach ($inputs as $input) {
if ($name = $input->getAttribute('name')) {
$data[$name] = $input->getAttribute('value');
}
}
$textareas = $xpath->query('//textarea');
foreach ($textareas as $textarea) {
if ($name = $textarea->getAttribute('name')) {
$data[$name] = $textarea->nodeValue;
}
}
$options = $xpath->query('//select/option[@selected="selected"]');
foreach ($options as $option) {
if ($name = $option->parentNode->getAttribute('name')) {
$data[$name] = $option->getAttribute('value');
}
}
Depending on whether or not you have multiple forms in your HTML, you may want to pass the second argument to query() to differentiate them, and add an extra loop.
You will have to tweak it a bit if you use array keys (e.g.: yourfield[]).
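The multiple-forms case could be sketched like this, passing each form as the context node (the second argument to query()) and using relative expressions that start with a dot. The markup and field names are made up for illustration.

```php
<?php
// Hypothetical markup with two forms.
$html = '<form id="a"><input name="q" value="1"></form>'
      . '<form id="b"><input name="q" value="2"><textarea name="msg">hi</textarea></form>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$data = [];
foreach ($xpath->query('//form') as $i => $form) {
    // The leading "." keeps each search inside the current form.
    foreach ($xpath->query('.//input[@name]', $form) as $input) {
        $data[$i][$input->getAttribute('name')] = $input->getAttribute('value');
    }
    foreach ($xpath->query('.//textarea[@name]', $form) as $textarea) {
        $data[$i][$textarea->getAttribute('name')] = $textarea->nodeValue;
    }
}

print_r($data);
```

Without the leading dot, an expression such as //input would still search the whole document even when a context node is given.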
