I have a PHP file that generates an XML feed listing all the products of my e-commerce site.
For each product, the PHP file assigns a series of tags (price, title, description, etc.). One of these tags is the link to the product on my site.
Example:
<title> smartphone </title>
<category> electronics </category>
<price> 100 </price>
<link> www.mysite.com/smartphone </link>
This is the part of the code (working) used to extract the link of the product (my e-commerce is made with prestashop):
$link = $item->addChild('LINK', $productobj->getLink());
The thing that I need to do, is to dynamically add some string at the end of the link generated for each product.
For example, if the link is:
www.mysite.com/product-name
I need the link to be like this:
www.mysite.com/product-name?parameter1=abc&parameter2=def&parameter3=ghi
And the parameters need to be adapted based on the value of the tag <category>.
Example:
if the product A has the tag <category>electronics</category>
the string to add at the end of its url would be:
?parameter1=electronics&parameter2=def&parameter3=ghi
if the product B has the tag <category>clothes</category>
the string to add at the end of its url would be:
?parameter1=clothes&parameter2=def&parameter3=ghi
Thanks in advance for your precious help.
Good evening.
EDIT:
Below is more of the code used to generate the XML file.
<?php
class SimpleXMLElementExtended extends SimpleXMLElement {
public function addChildWithCDATA($name, $value = NULL) {
$new_child = $this->addChild($name);
if ($new_child !== NULL) {
$node = dom_import_simplexml($new_child);
$no = $node->ownerDocument;
$node->appendChild($no->createCDATASection($value));
}
return $new_child;
}
}
require_once(dirname(__FILE__) . '/config/config.inc.php');
require_once(dirname(__FILE__) . '/init.php');
$config = Configuration::getMultiple(array('PS_LANG_DEFAULT', 'PS_COUNTRY_DEFAULT'));
$defaultLangId = $config['PS_LANG_DEFAULT'];
$defaultCountryId = $config['PS_COUNTRY_DEFAULT'];
Context::getContext()->language = new Language($defaultLangId);
Context::getContext()->link = new Link();
$products = Db::getInstance(_PS_USE_SQL_SLAVE_)->executeS($query);
$counter = 0;
if ($products) {
$itemlist = new SimpleXMLElementExtended('<?xml version="1.0" encoding="UTF-8"?><ITEMLIST></ITEMLIST>');
foreach ($products as $product) {
$item = $itemlist->addChild('ITEM');
$productobj = new Product($product['id_product'], false, $defaultLangId);
$counter++;
$item->addAttribute('ID', $productobj->id);
$item->addAttribute('CITYID', $defcityID);
$item->addAttribute('CATEGORYID', $product['CATEGORYID']);
$title = $item->addChildWithCDATA('TITLE',trim($productobj->name));
$title = $item->addChildWithCDATA('TEXT',strip_tags($productobj->description));
$email = $item->addChild('EMAIL', $defemail);
$link = $item->addChild('LINK', $productobj->getLink());
}
unset($products);
$success &= file_put_contents(_PS_ROOT_DIR_.'/feed.xml', $itemlist->asXML());
echo "DONE";
} else {
echo "NOTHING FOUND";
}
?>
If you are trying to add the category as a URL parameter, you could use:
$link = $item->addChild('LINK', $productobj->getLink() . "?parameter1=" . $product['CATEGORYID']);
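If several parameters are needed, a minimal sketch along these lines may help. The helper name appendQueryParams and the sample URL are made up for illustration and are not part of PrestaShop. It builds the query string with http_build_query and picks '?' or '&' depending on whether the link already has a query string. Note that SimpleXMLElement::addChild does not escape '&' in the value, so the sketch assigns the link as a property instead, which does escape it (wrapping it in CDATA, as addChildWithCDATA in the question does, should work as well).

```php
<?php
// Hypothetical helper (not part of PrestaShop): appends query parameters to a URL.
function appendQueryParams(string $url, array $params): string
{
    // Use '&' if the URL already carries a query string, '?' otherwise.
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    // Pass the separator explicitly so the result does not depend on php.ini.
    return $url . $separator . http_build_query($params, '', '&');
}

// Sample values; in the feed the URL would come from $productobj->getLink()
// and parameter1 from the product's category.
$link = appendQueryParams('http://www.mysite.com/product-name', [
    'parameter1' => 'electronics',
    'parameter2' => 'def',
    'parameter3' => 'ghi',
]);

// Property assignment escapes '&' as '&amp;' in the generated XML.
$item = new SimpleXMLElement('<ITEM/>');
$item->LINK = $link;

echo $link, PHP_EOL;
echo $item->asXML();
```

In the loop from the question, the array passed to the helper would be built per product, for example with $product['CATEGORYID'] as the value of parameter1.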
I am trying to scrape this webpage. I need the job title and location of each posting, and my code retrieves both. The problem is that when I write the results to XML, only one entry from the list ends up in the file.
I am using the Goutte CSS selector library. Please also tell me how to scrape paginated results with Goutte.
Here is my code:
$httpClient = new \Goutte\Client();
$response = $httpClient->request('GET', 'https://www.simplyhired.com/search?q=pharmacy+technician&l=American+Canyon%2C+CA&job=X5clbvspTaqzIHlgOPNXJARu8o4ejpaOtgTprLm2CpPuoeOFjioGdQ');
$job_posting_location = [];
$response->filter('.LeftPane article .SerpJob-jobCard.card .jobposting-subtitle span.JobPosting-labelWithIcon.jobposting-location span.jobposting-location')
->each(function ($node) use (&$job_posting_location) {
$job_posting_location[] = $node->text() . PHP_EOL;
});
$joblocation = 0;
$response->filter('.LeftPane article .SerpJob-jobCard.card .jobposting-title-container h3 a')
->each( function ($node) use ($job_posting_location, &$joblocation, $httpClient) {
$job_title = $node->text() . PHP_EOL; //job title
$job_posting_location = $job_posting_location[$joblocation]; //job posting location
// display the result
$items = "{$job_title} # {$job_posting_location}\n\n";
global $results;
$result = explode('#', $items);
$results['job_title'] = $result[0];
$results['job_posting_location'] = $result[1];
$joblocation++;
});
function convertToXML($results, &$xml_user_info){
foreach($results as $key => $value){
if(is_array($value)){
$subnode = $xml_user_info->addChild($key);
foreach ($value as $k=>$v) {
$xml_user_info->addChild("$k",htmlspecialchars("$v"));
}
}else{
$xml_user_info->addChild("$key",htmlspecialchars("$value"));
}
}
return $xml_user_info->asXML();
}
$xml_user_info = new SimpleXMLElement('<root/>');
$xml_content = convertToXML($results,$xml_user_info);
$xmlFile = 'details.xml';
$handle = fopen($xmlFile, 'w') or die('Unable to open the file: '.$xmlFile);
if(fwrite($handle, $xml_content)) {
echo 'Successfully written to an XML file.';
}
else{
echo 'Error in file generating';
}
What I got in the XML file:
<?xml version="1.0"?>
<root><job_title>Pharmacy Technician
</job_title><job_posting_location> Vallejo, CA
</job_posting_location></root>
What I want in the XML file:
<?xml version="1.0"?>
<root>
<job_title>Pharmacy Technician</job_title>
<job_posting_location> Vallejo, CA</job_posting_location>
<job_title>Pharmacy Technician 1</job_title>
<job_posting_location> Vallejo, CA</job_posting_location>
<job_title>Pharmacy Technician New</job_title>
<job_posting_location> Vallejo, CA</job_posting_location>
and so on...
</root>
You overwrite the values in the $results variable on every iteration. You would need to do something like this to append instead:
$results[] = [
'job_title' => $result[0],
'job_posting_location' => $result[1]
];
However, there is no need to put the data into an array at all; just create the
XML directly with DOM.
Both your selectors share the same start. Iterate over the cards and then fetch
the related data from each one.
$httpClient = new \Goutte\Client();
$response = $httpClient->request('GET', $url);
$document = new DOMDocument();
// append document element node
$postings = $document->appendChild($document->createElement('jobs'));
// iterate job posting cards
$response->filter('.LeftPane article .SerpJob-jobCard.card')->each(
function($jobCard) use ($document, $postings) {
// fetch data
$location = $jobCard
->filter(
'.jobposting-subtitle span.JobPosting-labelWithIcon.jobposting-location span.jobposting-location'
)
->text();
$title = $jobCard->filter('.jobposting-title-container h3 a')->text();
// append 'job' node to group data in result
$job = $postings->appendChild($document->createElement('job'));
// append data nodes
$job->appendChild($document->createElement('job_title'))->textContent = $title;
$job->appendChild($document->createElement('job_posting_location'))->textContent = $location;
}
);
echo $document->saveXML();
I'm trying to scrape pricing data from a few home depot URLs. I'm using simple_html_dom.php which can be found here: https://simplehtmldom.sourceforge.io/
I'm having issues figuring out how to get the individual span class that I need in order to get the data I want.
Here's an image of the inspect element with the various fields I'm trying to access:
https://prnt.sc/qx5ujt
Here's the code I have so far which returns an empty array:
<?php
require 'simple_html_dom.php';
$dom = file_get_html('https://www.homedepot.com/p/Zinsco-20-Amp-1-1-2-in-2-Pole-Replacement-Thick-Circuit-Breaker-UBIZ220/100183119', false);
$answer = array();
if(empty($dom))
{
echo "EMPTY";
exit;
}
$divClass = "";
$dollars = "";
$cents = "";
$i = 0;
foreach($dom->find('price__wrapper') as $divClass)
{
foreach($divClass->find('span[class=price__dollars]') as $dollars) //dollars
{
$answer[$i]['dollars'] = $dollars->plaintext;
}
foreach($divClass->find('span[class=price__cents]') as $cents) //cents
{
$answer[$i]['cents'] = $cents->plaintext;
}
$i++;
}
print_r($answer);
exit;
?>
If the HTML always looks like it does in the screenshot, then you can read the content attribute of the parent span element.
You can achieve this as shown below (untested example).
// Get the parent span element with ID ajaxPrice
$element = $dom->find('span#ajaxPrice',0);
// Check if attribute "content" exists
if(isset($element->content)) {
$price = $element->content; // This is the price as string value
$priceFloat = floatval($price); // This is the price as float value
// Splitting price in dollar and cent
$price = explode('.', $price);
$dollar = $price[0];
$cent = $price[1];
}
I'm building a WordPress plugin that changes the href attribute of some links in post content. I have a function like this:
public static function encrypt_link_content( $content ){
if ( strpos( $content, 'safelinkThis' ) !== false ) {
// get dom from string of content
$content = str_get_dom($content);
// find link with attribute data-role=safelinkThis
$all_links = $content('a[data-role=safelinkThis]');
$count_links = count($all_links);
$i = 0;
$_links = self::get_links_post($count_links);
foreach ($all_links as $a) {
$href = $a->href;
$encrypt = self::zi_encrypt($href);
$a->href = $_links[$i]."?link=".$encrypt;
$i += 1;
}
$content = (string) $content;
}
return $content;
}
That function works: the href attribute of each matching link is changed.
Example of content:
Find us on :
Go to facebook
Go to google
Go to Twitter
It will be changed to this:
Find us on :
Go to facebook
Go to google
Go to Twitter
I use the http://code.google.com/p/ganon/ library to query the DOM from a string.
Now
I have a custom table that stores domain exceptions, named wp_sf_domain. I don't want to change a link if the host of its href is listed in wp_sf_domain.
I know I can use this:
global $wpdb;
$sf_table_name = $wpdb->prefix . 'sf_domain';
foreach ($all_links as $a) {
$href = $a->href;
$aDomain = parse_url($href)['host'];
$record = $wpdb->get_results( "SELECT * FROM $sf_table_name WHERE domain='$aDomain'" );
if(count($record) == 0){
// change this link
}
}
But I think this is bad practice if the content has many links, because each link triggers its own database query.
How can I compare the host of each link in the content against the list of domains in the database, and change the href only if the host is not in that list?
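One possible approach, sketched here under assumptions, is to load the exception domains once and then check each link against an in-memory lookup. The helper name linksToRewrite and the sample data are hypothetical. In the plugin the list would presumably be fetched a single time with something like $wpdb->get_col("SELECT domain FROM $sf_table_name"), which is hard-coded below so the example runs on its own. Links without a host (relative links, for instance) are skipped, which is a design choice of this sketch rather than a requirement.

```php
<?php
// Hypothetical helper: returns the hrefs whose host is NOT in the exception list.
function linksToRewrite(array $hrefs, array $exceptionDomains): array
{
    // Flip once so that every host check is a simple array-key lookup.
    $blocked = array_flip(array_map('strtolower', $exceptionDomains));

    $result = [];
    foreach ($hrefs as $href) {
        $host = parse_url($href, PHP_URL_HOST);
        // parse_url yields null or false when no host can be extracted.
        if (!is_string($host)) {
            continue;
        }
        if (!isset($blocked[strtolower($host)])) {
            $result[] = $href;
        }
    }
    return $result;
}

// Stand-in for the single database query (assumption for illustration).
$exceptions = ['facebook.com', 'twitter.com'];
$hrefs = [
    'http://facebook.com/page',
    'http://google.com/',
    'http://twitter.com/user',
];

print_r(linksToRewrite($hrefs, $exceptions));
```

With this shape, the database is queried once per call to encrypt_link_content instead of once per link.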
I'm trying to create an XML feed, essentially of a bunch of job listings. I have 39 job listings in the database right now, and I'm creating the XML using SimpleXML and it's working just fine, except it's only outputting the very last record from the database in the xml. I'm sure there's an easy solution.
Looking at the code, I want each job to be inside the <job> element, and I want a new <job> element to be created for each job. All of these are enclosed inside one <source> element. Here is my PHP code, and below that is the result I'm getting - you'll see there's only one row returning instead of all 39.
<?php
Header('Content-type: text/xml');
class SimpleXMLExtended extends SimpleXMLElement {
public function addCData($cdata_text) {
$node = dom_import_simplexml($this);
$no = $node->ownerDocument;
$node->appendChild($no->createCDATASection($cdata_text));
}
}
$jobs = $dbjobs->find(array('job_title' => array('$exists' => true), 'job_title' => array('$nin'=> array('',' ', null))));
$jobs = iterator_to_array($jobs);
$xml = new SimpleXMLExtended('<source/>');
$i = 0;
foreach ($jobs as $job) {
$i++;
$xml->job = NULL;
$j = $xml->job;
$j->referencenumber = NULL;
$j->referencenumber->addCData($job['id']);
$j->title = NULL;
$j->title->addCData($job['job_title']);
$j->url = NULL;
$j->url->addCData('http://www.site.com/joblisting.php?jl=' . $job['id']);
$j->description = NULL;
$j->description->addCData($job['job_description']);
$j->company = NULL;
$j->company->addCData($job['company']);
$j->city = NULL;
$j->city->addCData($job['city']);
$j->state = NULL;
$j->state->addCData($job['state']);
$j->postalcode = NULL;
$j->postalcode->addCData('');
$j->country = NULL;
$j->country->addCData('US');
$j->date = NULL;
$j->date->addCData(date("Y-m-d", $job['added']->sec));
$j->site = NULL;
$j->site->addCData('site.com');
$j->count = NULL;
$j->count->addCData($i);
}
print($xml->asXML());
?>
And here is an example response I get:
<source>
<job>
<referencenumber>230257</referencenumber>
<title>Home Phone Representative</title>
<url>http://www.site.com/joblisting.php?jl=230257</url>
<description></description>
<company>Media LLC</company>
<city>San Jose</city>
<state>CA</state>
<postalcode></postalcode>
<country>US</country>
<date>2013-09-16</date>
<site>site.com</site>
<count>39</count>
</job>
</source>
As you can see it populates just fine but I need all the listings instead of just the last one in the loop. Thanks for your help in advance.
$xml->job = NULL;
is the wrong line: it reuses the same <job> element on every iteration, so only the last record survives.
The $j->xxxxxx = NULL; assignments are only there to create each child element; addChild does that and returns the new node.
So code such as
foreach ($jobs as $job) {
$i++;
$j = $xml->addChild('job');
$j->addChild('referencenumber')->addCData($job['id']);
$j->addChild('title')->addCData($job['job_title']);
(...)
is better.
To learn more, check the SimpleXMLElement::addChild documentation.
You should add the children to the root:
foreach ($jobs as $job) {
$i++;
$j = $xml->addChild('job');
...
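A self-contained sketch of the addChild approach, assuming the SimpleXMLExtended class from the question and two made-up rows in place of the database result:

```php
<?php
class SimpleXMLExtended extends SimpleXMLElement
{
    // Appends a CDATA section to the current element.
    public function addCData(string $text): void
    {
        $node = dom_import_simplexml($this);
        $node->appendChild($node->ownerDocument->createCDATASection($text));
    }
}

// Sample rows standing in for the database result (assumption).
$jobs = [
    ['id' => '1', 'job_title' => 'Home Phone Representative'],
    ['id' => '2', 'job_title' => 'Sales & Support Agent'],
];

$xml = new SimpleXMLExtended('<source/>');
foreach ($jobs as $job) {
    // addChild creates a NEW <job> element on every iteration
    // and returns it as an instance of the same class.
    $j = $xml->addChild('job');
    $j->addChild('referencenumber')->addCData($job['id']);
    $j->addChild('title')->addCData($job['job_title']);
}

echo $xml->asXML();
```

Each iteration now produces its own <job> element under <source>, so every row should appear in the output.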
I have a problem when looping through data and creating XML files using DOMDocument. It all worked fine until I decided to run this script in batches. Now I have multiple '<?xml version="1.0"?>' declarations in my XML files, apparently one for each batch. There are also more product nodes being generated than there are products. Can anyone help?
//get products
$productsObj = new Products($db,$shopKeeperID);
//find out how many products
$countProds = $productsObj->countProducts();
$productBatchLimit = 3; //keep low for testing
//create new file
$file = 'products/'. $products . '.xml';
$fh = fopen($file, 'a');
//create XML document object model (DOM)
$doc = new DOMDocument();
$doc->formatOutput = true;
$counter = 1;
$r = $doc->createElement( "products" );
$doc->appendChild( $r );
for ($i = 0; $i < $countProds; $i += $productBatchLimit) {
$limit = $productBatchLimit*$counter;
$products = $productsObj->getShopKeeperProducts($i, $limit);
$prod = '';
//loop through each product to create well formed XML
foreach( $products as $product ){
$prod = $doc->createElement( "offer" );
$offerID = $doc->createElement( "offerID" );
$offerID->appendChild($doc->createTextNode( $product['prod_id'] ));
$prod->appendChild( $offerID );
$productName = $doc->createElement( "name" );
$productName->appendChild($doc->createTextNode( $product['productName'] ));
$prod->appendChild( $productName );
$r->appendChild( $prod );
$strxml = $doc->saveXML();
}
fwrite($fh, $strxml);
$counter++;
}
fclose($fh);
I am using this command that erases all strings like <?something?>.
$text = preg_replace( '/<\?[^\?>]*\?>/', ' ', $text);
First erase them all and then put one at the beginning.
I did just this a while ago. I can't see exactly what's going wrong, but I can provide the function I built for a site for you to look at.
This function worked 100% fine and as expected. It created a perfect XML document and formatted it perfectly. I hope this helps you find your problem.
function create_xml_file()
{
/* create a dom document with encoding utf8 */
$domtree = new DOMDocument('1.0', 'utf-8');
/* create the root element of the xml tree */
/* Data Node */
$xmlRoot = $domtree->createElement("data");
/* append it to the document created */
$xmlRoot = $domtree->appendChild($xmlRoot);
/* Set our Prices in our <data> <config> node */
$config_node = $domtree->createElement("config");
$config_node = $xmlRoot->appendChild($config_node);
// Add - node to config
$config_node->appendChild($domtree->createElement('config_node', '123456'));
$config_node->appendChild($domtree->createElement('some_other_data', '123456'));
/* Create prices Node */
$price_node = $domtree->createElement('price');
$price_node = $config_node->appendChild($price_node);
/* Black Price Node */
$black_node = $price_node->appendChild($domtree->createElement('black'));
foreach ($p->List_all() as $item):
if ($item['color'] == 'black'):
$black_node->appendChild($domtree->createElement($item['type'], $item['price']));
endif;
endforeach;
/* Create galleries Node */
$galleries_node = $domtree->createElement("galleries");
$galleries_node = $xmlRoot->appendChild($galleries_node);
foreach ($i->List_all() as $image):
/* Our Individual Gallery Node */
$gallery_node = $domtree->createElement("gallery");
$gallery_node = $galleries_node->appendChild($gallery_node);
$gallery_node->appendChild($domtree->createElement('name', $image['name']));
$gallery_node->appendChild($domtree->createElement('filepath', $image['filepath']));
$gallery_node->appendChild($domtree->createElement('thumb', $image['thumb']));
endforeach;
/* Format it so it is human readable */
$domtree->preserveWhiteSpace = false;
$domtree->formatOutput = true;
/* get the xml printed */
//echo $domtree->saveXML();
$file = 'xml/config.xml';
$domtree->save($file);
}
I hope this helps you find your answer. I commented it well for easy understanding.
I simply changed
$strxml = $doc->saveXML();
and
fwrite($fh, $strxml);
To
$doc->save($file);
and it works perfectly.
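The reason this helps is that saveXML() serializes the whole document, declaration included, every time it is called, and the file was opened in append mode, so each write added another full copy. A minimal sketch of the batch loop with a single serialization at the end follows. The sample products and the batch size are assumptions, and array_slice stands in for the database query that fetches one batch.

```php
<?php
// Sample data and batch size are assumptions for illustration.
$allProducts = [
    ['prod_id' => 1, 'productName' => 'Kettle'],
    ['prod_id' => 2, 'productName' => 'Toaster'],
    ['prod_id' => 3, 'productName' => 'Blender'],
];
$batchSize = 2;

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;
$root = $doc->appendChild($doc->createElement('products'));

for ($offset = 0; $offset < count($allProducts); $offset += $batchSize) {
    // Stand-in for a query that returns $batchSize rows starting at $offset.
    $batch = array_slice($allProducts, $offset, $batchSize);

    foreach ($batch as $product) {
        $offer = $root->appendChild($doc->createElement('offer'));
        $offer->appendChild($doc->createElement('offerID'))
              ->appendChild($doc->createTextNode((string) $product['prod_id']));
        $offer->appendChild($doc->createElement('name'))
              ->appendChild($doc->createTextNode($product['productName']));
    }
}

// Serialize once, after every batch has been added to the tree.
$xmlString = $doc->saveXML();
echo $xmlString;
```

Writing to disk would then be a single $doc->save($file) call after the loop, as in the fix above. Keeping the batch size constant, rather than growing the limit with each pass, should also avoid fetching the same products more than once.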