Parsing XML with PHP

I have an XML product feed that I am parsing with PHP to load products into a database.
I need to get each element into an array of $products = array() such as:
$products[AttributeID] = value
I am using SimpleXML and have most of it working. This is what I have so far:
$xml = simplexml_load_file($CatalogFileName) or die("can't open file " . $CatalogFileName);
foreach ($xml->children() as $products) {
    foreach ($products->children() as $product) {
        $new_product = array();
        $new_product['sku'] = "AS-" . $product->Name;
        foreach ($product->Values->Value as $node) {
            $node_name = preg_replace('/\s+/', '', $node['AttributeID']);
            // <-- THIS IS NOT WORKING: $node[0] returns an array; I only want the data in each attribute.
            $new_product[$node_name] = $node[0];
        }
        foreach ($product->AssetCrossReference as $node) {
            $new_product['image'] = "http://www.xxxxxxxx.com/images/items/fullsize/" . $node['AssetID'] . ".jpg";
        }
        print_r($new_product);
    }
}
Here is an image of one product node: XML
Can someone provide me with a little help here? I do a lot of PHP programming, but this is the first time I am dealing with XML.

A possible solution is to use the xpath method to find the <Value> elements.
The xpath method will return an array of SimpleXMLElement objects.
These SimpleXMLElement objects have an attributes method which you can use to access the 'AttributeID' attribute.
I hope this example can help you:
$xml = simplexml_load_file($CatalogFileName) or die("can't open file " . $CatalogFileName);
$productsArray = array();
foreach ($xml->children() as $products) {
    foreach ($products->children() as $product) {
        $new_product = array();
        $productId = (string)$product->attributes()->ID;
        $new_product['NAME'] = "AS-" . (string)$product->Name;
        // xpath will return an array of SimpleXMLElement objects.
        // Use a relative path ('Value', not '//Value'): '//Value' searches the
        // whole document and would return the <Value> elements of every product.
        $values = $product->Values->xpath('Value');
        // use the attributes method to access the AttributeID
        foreach ($values as $node) {
            $attributeID = (string)$node->attributes()->AttributeID;
            $new_product[$attributeID] = trim((string)$node);
        }
        $new_product['AssetID'] = sprintf(
            "http://www.xxxxxxxx.com/images/items/fullsize/%s.jpg",
            (string)$product->AssetCrossReference->attributes()->AssetID
        );
        // Add $new_product to $productsArray using the $productId as the array key
        $productsArray[$productId] = $new_product;
    }
}
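Since the image of the product node is not available, here is a minimal sketch against a hypothetical node; the Products/Product/Values/Value layout and the sample values are assumptions. It isolates the core fix: $node[0] on a SimpleXMLElement gives back an element object rather than its text, so cast the element with (string) to get the text, and read the attribute with a quoted key such as $node['AttributeID'].

```php
<?php
// Hypothetical product node: the real feed layout is assumed, not known.
$xmlString = <<<XML
<Products>
  <Product ID="1001">
    <Name>Widget</Name>
    <Values>
      <Value AttributeID="Color">Red</Value>
      <Value AttributeID="Item Weight"> 2.5 </Value>
    </Values>
  </Product>
</Products>
XML;

/**
 * Collect the <Value> children of one product into an array keyed by
 * AttributeID, with whitespace removed from the key as in the question.
 */
function productValues(SimpleXMLElement $product): array
{
    $result = [];
    foreach ($product->Values->Value as $node) {
        // Quoted attribute name, cast to string to get plain text.
        $key = preg_replace('/\s+/', '', (string) $node['AttributeID']);
        // (string) $node yields the element's text content, not an object.
        $result[$key] = trim((string) $node);
    }
    return $result;
}

$xml = simplexml_load_string($xmlString);
foreach ($xml->Product as $product) {
    print_r(productValues($product));
}
```

The same (string) cast is what makes trim((string)$node) in the answer above return plain text.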

Related

I am trying to scrape a website but only get one array entry in the XML file

I am trying to scrape this webpage. I need to get each job title and its location, which my code already does. The problem is that when I write the results to XML, only one entry from the array ends up in the file.
I am using the Goutte library with CSS selectors. Please also tell me how to scrape paginated results with Goutte.
Here is my code:
$httpClient = new \Goutte\Client();
$response = $httpClient->request('GET', 'https://www.simplyhired.com/search?q=pharmacy+technician&l=American+Canyon%2C+CA&job=X5clbvspTaqzIHlgOPNXJARu8o4ejpaOtgTprLm2CpPuoeOFjioGdQ');
$job_posting_location = [];
$response->filter('.LeftPane article .SerpJob-jobCard.card .jobposting-subtitle span.JobPosting-labelWithIcon.jobposting-location span.jobposting-location')
    ->each(function ($node) use (&$job_posting_location) {
        $job_posting_location[] = $node->text() . PHP_EOL;
    });
$joblocation = 0;
$response->filter('.LeftPane article .SerpJob-jobCard.card .jobposting-title-container h3 a')
    ->each(function ($node) use ($job_posting_location, &$joblocation, $httpClient) {
        $job_title = $node->text() . PHP_EOL; //job title
        $job_posting_location = $job_posting_location[$joblocation]; //job posting location
        // display the result
        $items = "{$job_title} # {$job_posting_location}\n\n";
        global $results;
        $result = explode('#', $items);
        $results['job_title'] = $result[0];
        $results['job_posting_location'] = $result[1];
        $joblocation++;
    });
function convertToXML($results, &$xml_user_info) {
    foreach ($results as $key => $value) {
        if (is_array($value)) {
            $subnode = $xml_user_info->addChild($key);
            foreach ($value as $k => $v) {
                $xml_user_info->addChild("$k", htmlspecialchars("$v"));
            }
        } else {
            $xml_user_info->addChild("$key", htmlspecialchars("$value"));
        }
    }
    return $xml_user_info->asXML();
}
$xml_user_info = new SimpleXMLElement('<root/>');
$xml_content = convertToXML($results, $xml_user_info);
$xmlFile = 'details.xml';
$handle = fopen($xmlFile, 'w') or die('Unable to open the file: ' . $xmlFile);
if (fwrite($handle, $xml_content)) {
    echo 'Successfully written to an XML file.';
}
else {
    echo 'Error in file generating';
}
What I got in the XML file:
<?xml version="1.0"?>
<root><job_title>Pharmacy Technician
</job_title><job_posting_location> Vallejo, CA
</job_posting_location></root>
What I want in the XML file:
<?xml version="1.0"?>
<root>
<job_title>Pharmacy Technician</job_title>
<job_posting_location> Vallejo, CA</job_posting_location>
<job_title>Pharmacy Technician 1</job_title>
<job_posting_location> Vallejo, CA</job_posting_location>
<job_title>Pharmacy Technician New</job_title>
<job_posting_location> Vallejo, CA</job_posting_location>
and so on...
</root>
You overwrite the values in the $results variable on every iteration. You would need to do something like this to append:
$results[] = [
    'job_title' => $result[0],
    'job_posting_location' => $result[1]
];
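Once rows are appended like this, the question's convertToXML() no longer fits: it adds the nested values to the root instead of to $subnode, and a numeric key such as 0 is not a valid XML element name. A minimal sketch of a replacement, assuming each row has job_title and job_posting_location keys and that wrapping each row in a <job> element (a name chosen here for illustration) is acceptable; the sample rows are placeholders.

```php
<?php
// Placeholder rows in the shape produced by $results[] = [...].
$results = [
    ['job_title' => 'Pharmacy Technician', 'job_posting_location' => 'Vallejo, CA'],
    ['job_title' => 'Pharmacy Technician 1', 'job_posting_location' => 'Vallejo, CA'],
];

/**
 * Convert a list of rows into XML, wrapping each row in a <job> element.
 */
function rowsToXml(array $rows): string
{
    $root = new SimpleXMLElement('<root/>');
    foreach ($rows as $row) {
        $job = $root->addChild('job');
        foreach ($row as $key => $value) {
            // addChild() treats '&' as the start of an entity reference,
            // so escape the text before adding it.
            $job->addChild($key, htmlspecialchars(trim((string) $value)));
        }
    }
    return $root->asXML();
}

echo rowsToXml($results);
```

To write the file as in the question, pass the returned string to file_put_contents('details.xml', ...).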
However, there is no need to put the data into an array at all; just create the
XML directly with DOM.
Both of your selectors share the same prefix. Iterate over the job cards and then
fetch the related data from each card.
$httpClient = new \Goutte\Client();
// $url is the search page URL from the question
$response = $httpClient->request('GET', $url);
$document = new DOMDocument();
// append document element node
$postings = $document->appendChild($document->createElement('jobs'));
// iterate job posting cards
$response->filter('.LeftPane article .SerpJob-jobCard.card')->each(
    function ($jobCard) use ($document, $postings) {
        // fetch data
        $location = $jobCard
            ->filter(
                '.jobposting-subtitle span.JobPosting-labelWithIcon.jobposting-location span.jobposting-location'
            )
            ->text();
        $title = $jobCard->filter('.jobposting-title-container h3 a')->text();
        // append 'job' node to group data in result
        $job = $postings->appendChild($document->createElement('job'));
        // append data nodes
        $job->appendChild($document->createElement('job_title'))->textContent = $title;
        $job->appendChild($document->createElement('job_posting_location'))->textContent = $location;
    }
);
echo $document->saveXML();
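Because a live page is hard to reproduce, the sketch below replaces the Goutte crawler with a hypothetical $cards array and shows only the DOM side. Setting formatOutput produces the indented layout the question asked for, and createTextNode() is used here (a variation on the textContent assignment above) so that characters such as & are escaped on output. $document->save('details.xml') would write the result to a file instead.

```php
<?php
// Hypothetical scraped data standing in for the crawler results.
$cards = [
    ['title' => 'Pharmacy Technician', 'location' => 'Vallejo, CA'],
    ['title' => 'Pharmacy Tech & Cashier', 'location' => 'Napa, CA'],
];

function buildJobsXml(array $cards): string
{
    $document = new DOMDocument('1.0', 'UTF-8');
    // Indent nested elements when serializing.
    $document->formatOutput = true;
    $postings = $document->appendChild($document->createElement('jobs'));

    foreach ($cards as $card) {
        $job = $postings->appendChild($document->createElement('job'));
        // Text nodes are escaped on output, so '&' becomes '&amp;'.
        $job->appendChild($document->createElement('job_title'))
            ->appendChild($document->createTextNode($card['title']));
        $job->appendChild($document->createElement('job_posting_location'))
            ->appendChild($document->createTextNode($card['location']));
    }
    return $document->saveXML();
}

echo buildJobsXml($cards);
```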

How to loop through two XML files and print result

I've been trying unsuccessfully with PHP to loop through two XML files and print the result to the screen. The aim is to take a country's name and output its regions/states/provinces as the case may be.
The first block of code successfully prints all the countries but the loop through both files gives me a blank screen.
The countries file is in the format:
<row>
<id>6</id>
<name>Andorra</name>
<iso2>AD</iso2>
<phone_code>376</phone_code>
</row>
And the states.xml:
<row>
<id>488</id>
<name>Andorra la Vella</name>
<country_id>6</country_id>
<country_code>AD</country_code>
<state_code>07</state_code>
</row>
where a state's country_id matches the country's id.
This gives a perfect list of countries:
$xml = simplexml_load_file("countries.xml");
$xml1 = simplexml_load_file("states.xml");
foreach ($xml->children() as $key => $children) {
    print((string)$children->name);
    echo "<br>";
}
This gives me a blank screen, apart from the rest of the HTML on the page:
$xml = simplexml_load_file("countries.xml");
$xml1 = simplexml_load_file("states.xml");
$s = "Jamaica";
foreach ($xml->children() as $child) {
    foreach ($xml1->children() as $child2) {
        if ($child->id == $child2->country_id && $child->name == $s) {
            print((string)$child2->name);
            echo "<br>";
        }
    }
}
Where have I gone wrong?
Thanks.
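I suspect your problem is not casting the values to strings before comparing them. Comparing two SimpleXMLElement objects with ==, as in $child->id == $child2->country_id, compares the underlying nodes rather than their text, so the condition is not true here. Also, why are you starting the second loop before checking whether it's needed? You're looping through every single item in states.xml needlessly.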
$countries = simplexml_load_file("countries.xml");
$states = simplexml_load_file("states.xml");
$search = "Jamaica";
foreach ($countries->children() as $country) {
    if ((string)$country->name !== $search) {
        continue;
    }
    foreach ($states->children() as $state) {
        if ((string)$country->id === (string)$state->country_id) {
            echo (string)$state->name . "<br/>";
        }
    }
}
Also, note that naming your variables in a descriptive manner makes it much easier to figure out what's going on with code.
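If you need to look up more than one country, a variation (not from the answer above) is to read states.xml once and index the state names by country_id, so each lookup is a single array access instead of a full scan. A minimal sketch using inline XML in the row format from the question; the <rows> root element and the second state are illustrative assumptions, and in practice you would load the two files with simplexml_load_file().

```php
<?php
// Illustrative inline data in the question's <row> format.
$countries = simplexml_load_string(
    '<rows><row><id>6</id><name>Andorra</name><iso2>AD</iso2></row></rows>'
);
$states = simplexml_load_string(
    '<rows>'
    . '<row><id>488</id><name>Andorra la Vella</name><country_id>6</country_id></row>'
    . '<row><id>489</id><name>Canillo</name><country_id>6</country_id></row>'
    . '</rows>'
);

// Build country_id => [state names] in a single pass over states.xml.
function indexStatesByCountry(SimpleXMLElement $states): array
{
    $index = [];
    foreach ($states->row as $state) {
        $index[(string) $state->country_id][] = (string) $state->name;
    }
    return $index;
}

// Return the state names for a country name, or [] if the country is unknown.
function statesForCountry(SimpleXMLElement $countries, array $index, string $search): array
{
    foreach ($countries->row as $country) {
        if ((string) $country->name === $search) {
            return $index[(string) $country->id] ?? [];
        }
    }
    return [];
}

$index = indexStatesByCountry($states);
foreach (statesForCountry($countries, $index, 'Andorra') as $name) {
    echo $name . "<br/>";
}
```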
You could probably get rid of the loops altogether using an XPath query to match the sibling value. I don't use SimpleXML, but here's what it would look like with DomDocument:
$search = "Jamaica";
$countries = new DomDocument();
$countries->load("countries.xml");
$xpath = new DomXPath($countries);
// $search is inserted into the expression as-is, so a name
// containing an apostrophe would break the query
$country = $xpath->query("//row[name/text() = '$search']/id/text()");
if ($country->length > 0) {
    $country_id = $country[0]->nodeValue;
    $states = new DomDocument();
    $states->load("states.xml");
    $xpath = new DomXPath($states);
    $stateNames = $xpath->query("//row[country_id/text() = '$country_id']/name/text()");
    foreach ($stateNames as $state) {
        echo $state->nodeValue . "<br/>";
    }
}
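Since the answer above uses DOMDocument, here is a rough SimpleXML equivalent using SimpleXMLElement::xpath(), which returns an array of matching elements. It assumes the same <row> layout as the question (the <rows> root and the inline data are assumptions), and, like the DOM version, it splices the search string straight into the expression, so a name containing a double quote would break it.

```php
<?php
// Return the state names for $search using XPath on SimpleXML documents.
function statesByXPath(SimpleXMLElement $countries, SimpleXMLElement $states, string $search): array
{
    $ids = $countries->xpath('//row[name = "' . $search . '"]/id');
    if (!$ids) {
        return []; // country not found, or the query failed
    }
    $countryId = (string) $ids[0];
    $names = $states->xpath('//row[country_id = "' . $countryId . '"]/name');
    // Convert each matched element to its text content.
    return array_map('strval', $names ?: []);
}

$countries = simplexml_load_string('<rows><row><id>6</id><name>Andorra</name></row></rows>');
$states = simplexml_load_string(
    '<rows><row><id>488</id><name>Andorra la Vella</name><country_id>6</country_id></row></rows>'
);
foreach (statesByXPath($countries, $states, 'Andorra') as $name) {
    echo $name . "<br/>";
}
```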

Retrieving all Google (g:) tags from XML

I'm parsing an XML file but I'm having some issues with the g: namespaced tags: I can't access their content. The problem is when I try to get the categories, because there is more than one category.
XML:
<item>
<g:id>4011700742288</g:id>
<title><![CDATA[4711 Acqua Colonia Blood Orange & Basil Eau de Cologne 170ml]]></title>
<link><![CDATA[https://url/asdasd.html]]></link>
<g:image_link><![CDATA[https://url/media/catalog/product/4/7/4711-acqua-colonia-blood-_2.jpg]]></g:image_link>
<g:price>34.86 EUR</g:price>
<g:product_type><![CDATA[Mulher]]></g:product_type>
<g:product_type><![CDATA[Homem]]></g:product_type>
<g:product_type><![CDATA[Unisexo]]></g:product_type>
</item>
I try to get the categories using, for example:
$categories = $item->children('g', TRUE)->product_type;
But it only returns the first category; it does not get the rest of them.
Here is an example of how I get the data:
foreach ($rss->channel->item as $item) {
    $categories = $item->children('g', TRUE)->product_type;
    // convert the <content:encoded> element from a SimpleXMLElement object into an array
    $content = xmlObjToArr($item->children('content', true)->encoded);
    echo $categories . PHP_EOL;
    return;
}
function xmlObjToArr($obj) {
    $namespace = $obj->getDocNamespaces(true);
    $namespace[NULL] = NULL;
    $children = array();
    $attributes = array();
    $name = strtolower((string)$obj->getName());
    $text = trim((string)$obj);
    if (strlen($text) <= 0) {
        $text = NULL;
    }
    // get info for all namespaces
    if (is_object($obj)) {
        foreach ($namespace as $ns => $nsUrl) {
            // attributes
            $objAttributes = $obj->attributes($ns, true);
            foreach ($objAttributes as $attributeName => $attributeValue) {
                $attribName = strtolower(trim((string)$attributeName));
                $attribVal = trim((string)$attributeValue);
                if (!empty($ns)) {
                    $attribName = $ns . ':' . $attribName;
                }
                $attributes[$attribName] = $attribVal;
            }
            // children
            $objChildren = $obj->children($ns, true);
            foreach ($objChildren as $childName => $child) {
                $childName = strtolower((string)$childName);
                if (!empty($ns)) {
                    $childName = $ns . ':' . $childName;
                }
                $children[$childName][] = xmlObjToArr($child);
            }
        }
    }
    return array(
        'name' => $name,
        'text' => $text,
        'attributes' => $attributes,
        'children' => $children
    );
}
Your code is correct.
$categories = $item->children('g', TRUE)->product_type;
This will set $categories to an object which gives you access to all the <g:product_type> elements.
Your problem is when you write:
echo $categories . PHP_EOL;
This displays the text content of a single XML element. Since $categories is a collection of multiple elements, SimpleXML guesses that you want the first one. In other words, it's equivalent to:
echo (string)$categories[0] . PHP_EOL;
Where (string) extracts the text content and is implied by echo, and [0] gets the first item in the collection.
To loop over the collection of elements, use foreach, exactly as you would with a list:
foreach ($categories as $cat) {
    echo $cat . PHP_EOL;
}
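A self-contained sketch of the same idea that collects every category into a plain array. The feed fragment is trimmed from the question, and the g: namespace URI declared on <rss> is an assumption (the usual Google Merchant one), added only so the snippet parses on its own.

```php
<?php
// Trimmed feed; the xmlns:g URI is assumed so the fragment is well-formed.
$feed = <<<XML
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <item>
      <g:id>4011700742288</g:id>
      <g:product_type><![CDATA[Mulher]]></g:product_type>
      <g:product_type><![CDATA[Homem]]></g:product_type>
      <g:product_type><![CDATA[Unisexo]]></g:product_type>
    </item>
  </channel>
</rss>
XML;

// Collect every <g:product_type> of an item as a plain string.
function productTypes(SimpleXMLElement $item): array
{
    $types = [];
    foreach ($item->children('g', true)->product_type as $type) {
        $types[] = (string) $type;
    }
    return $types;
}

$rss = simplexml_load_string($feed);
foreach ($rss->channel->item as $item) {
    echo implode(', ', productTypes($item)) . PHP_EOL;
}
```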

Finding a value in XML (from a parameter) PHP

I am looking for a way to get the value of the 'duration' parameter from an XML feed.
Source:
http://gdata.youtube.com/feeds/api/videos/bSCs7NzghSg
I tried something like:
preg_match('#<yt:duration seconds=\'(.*?)\'/>#is',$xml,$resultduration);
$duration = $resultTitre[count($resultduration)-1];
but the returned value is 0.
This can be done with SimpleXML. Use the children() method to find the children and the attributes() method to access the attributes.
Here's a function for that purpose:
function getDuration($url) {
    $xml = simplexml_load_file($url);
    $duration = $xml->children('media', true)
        ->group->children('yt', true)
        ->duration->attributes('', true)->seconds;
    // cast the SimpleXMLElement attribute to an integer number of seconds
    return (int)$duration;
}
Usage:
$url = 'http://gdata.youtube.com/feeds/api/videos/bSCs7NzghSg';
echo getDuration($url); // => 2461
Alternatively, you can select the element with XPath:
$file = 'http://gdata.youtube.com/feeds/api/videos/bSCs7NzghSg';
$xml = simplexml_load_file($file);
$result = $xml->xpath('//yt:duration');
foreach ($result as $node) {
    echo (double) $node['seconds'];
    echo ' ';
}
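Since the feed URL above may no longer respond, here is a minimal offline sketch of the children()/attributes() approach against an inline entry shaped like the old GData video feed. The namespace URIs and the seconds value are assumptions for illustration. The seconds attribute itself has no namespace prefix, so attributes() is called without a namespace argument.

```php
<?php
// Trimmed-down entry in the shape of the GData feed; namespace URIs assumed.
$entry = <<<XML
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:media="http://search.yahoo.com/mrss/"
       xmlns:yt="http://gdata.youtube.com/schemas/2007">
  <media:group>
    <yt:duration seconds="2461"/>
  </media:group>
</entry>
XML;

// Read the unprefixed "seconds" attribute of <yt:duration> inside <media:group>.
function durationSeconds(SimpleXMLElement $xml): int
{
    $duration = $xml->children('media', true)->group
        ->children('yt', true)->duration;
    // The attribute has no prefix, so no namespace is passed to attributes().
    return (int) $duration->attributes()->seconds;
}

echo durationSeconds(simplexml_load_string($entry)); // 2461
```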

PHP: how to count XML elements in the object returned by simplexml_load_file()

I have inherited some PHP code (but I've little PHP experience) and can't find how to count some elements in the object returned by simplexml_load_file()
The code is something like this:
$xml = simplexml_load_file($feed);
for ($x = 0; $x < 6; $x++) {
    $title = $xml->channel[0]->item[$x]->title[0];
    echo "<li>" . $title . "</li>\n";
}
It assumes there will be at least 6 <item> elements but sometimes there are fewer so I get warning messages in the output on my development system (though not on live).
How do I extract a count of <item> elements in $xml->channel[0]?
Here are several options, from my most to least favourite (of the ones provided).
One option is to make use of the SimpleXMLIterator in conjunction with LimitIterator.
$xml = simplexml_load_file($feed, 'SimpleXMLIterator');
$items = new LimitIterator($xml->channel->item, 0, 6);
foreach ($items as $item) {
    echo "<li>{$item->title}</li>\n";
}
If that looks too scary, or not scary enough, then another is to throw XPath into the mix.
$xml = simplexml_load_file($feed);
$items = $xml->xpath('/rss/channel/item[position() <= 6]');
foreach ($items as $item) {
    echo "<li>{$item->title}</li>\n";
}
Finally, with only a small change to your existing code:
$xml = simplexml_load_file($feed);
for ($x = 0; $x < 6; $x++) {
    // Break out of loop if no more items
    if (!isset($xml->channel[0]->item[$x])) {
        break;
    }
    $title = $xml->channel[0]->item[$x]->title[0];
    echo "<li>" . $title . "</li>\n";
}
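The easiest way is to use SimpleXMLElement::count() on the list of <item> elements (calling it on $xml->channel[0] itself would also count <title>, <link> and the other children of <channel>):
$xml = simplexml_load_file($feed);
// count only the <item> children, capped at the six items the page shows
$num = min(6, $xml->channel[0]->item->count());
for ($x = 0; $x < $num; $x++) {
    $title = $xml->channel[0]->item[$x]->title[0];
    echo "<li>" . $title . "</li>\n";
}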
Also note that $xml->channel[0]->item returns a SimpleXMLElement object. This class implements the Traversable interface, so we can use it directly in a foreach loop:
$xml = simplexml_load_file($feed);
foreach ($xml->channel[0]->item as $item) {
    $title = $item->title[0];
    echo "<li>" . $title . "</li>\n";
}
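The difference between counting all children of <channel> and counting only the <item> elements is easy to check on a small inline feed (made up for illustration):

```php
<?php
$rss = simplexml_load_string(
    '<rss><channel><title>Feed</title><link>http://example.com/</link>'
    . '<item><title>One</title></item><item><title>Two</title></item>'
    . '</channel></rss>'
);

// count() on channel[0] counts every child element: title, link and both items.
$allChildren = $rss->channel[0]->count();
// count() on the item list counts only the <item> elements.
$itemCount = $rss->channel->item->count();

// Print at most six titles without reading past the last item.
for ($x = 0; $x < min(6, $itemCount); $x++) {
    echo "<li>" . $rss->channel->item[$x]->title . "</li>\n";
}
```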
You can get the number of <item> elements with count($xml->channel->item).
I always do it like this:
$xml = simplexml_load_file($feed);
foreach ($xml as $key => $one_row) {
    echo $one_row->some_xml_child;
}
