Listing items from an e-commerce site - php

I'm working on a PHP script that collects all the product URLs from an e-commerce site. For now I fetch each page with file_get_contents() and then use preg_match_all() to search for the keyword that precedes each item URL. My question is: is there a more direct and efficient way to collect all these links from a website and store them in my database?

I recently built a crawler system for a client's project. These are the steps I followed:
1. The project is based on PHP and supports multiple document types, such as XML, JSON and HTML.
2. I created a base product object with the properties I needed (title, image, price, link, category, source site).
3. For each website I use a parser, which generally relies on PHP's DOM extension with DOMXPath.
4. I find the product listing tag, loop through the records, and build a product list made of product objects (step 2).
5. When parsing a website finishes, I send the whole list to my base action, which checks whether a product with that URL already exists and, if not, adds it to the database.
6. On my server I run a cron job that checks all the product links; if the response is 404 or 500, it sets a flag on the product to 1. A second cron job re-checks the links flagged with 1. If a link still responds with an error code, the product is removed from my database.
This is a sample parser that I use. I hope it helps you through the process:
$content = file_get_contents($url);
libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
$oDom = new DOMDocument;
$oDom->validateOnParse = false;
$res = $oDom->loadHTML($content);
libxml_clear_errors();
$oDom->preserveWhiteSpace = false;
$oXpath = new DOMXPath($oDom);
// XPath attribute tests use "@", e.g. @class
$productNode = $oXpath->query('//div[@class="ulist span4"]');
if($productNode){
    $productsList = array();
    foreach($productNode as $p){
        // all queries below are relative to the current product node $p
        $this->oProduct = new Products();
        $productURL = $oXpath->query('div[@class="ures"]/div[@class="ures-cell"]/a', $p)->item(0);
        $this->oProduct->url = $this->base.'/'.$productURL->getAttribute('href');
        $this->oProduct->category = $categoryID;
        $this->oProduct->productPeek = $peek;
        $titleNode = $oXpath->query('div[@class="ubilgialan span12"]/div[@class="span12 uadi"]/a/span', $p)->item(0);
        $this->oProduct->title = trim($titleNode->nodeValue);
        $priceNode = $oXpath->query('div[@class="ubilgialan span12"]/div[@class="span8 ufytalan"]/div[@class="ufyt"]/span[@class="divdiscountprice"]/span', $p)->item(0);
        $this->oProduct->price = trim($priceNode->nodeValue);
        $imageNode = $oXpath->query('div[@class="ures"]/div[@class="ures-cell"]/a/img', $p)->item(0);
        $this->oProduct->image = $this->base."/".$imageNode->getAttribute('src');
        $productsList[] = $this->oProduct;
    }
    if(count($productsList) > 0){
        return $productsList;
    }
}
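The two cron checks described above could be sketched roughly as follows. This is only an illustration, not the code from the project: the function names nextLinkAction() and fetchStatus() are hypothetical, and the 'unflag' outcome for a link that recovers is an assumption, since the answer does not say what happens in that case.

```php
<?php
// Hypothetical helper: decide what the cron job should do with a product,
// given the HTTP status of its link and its current flag
// (0 = healthy, 1 = failed on the previous check).
function nextLinkAction(int $httpStatus, int $flag): string
{
    $failed = in_array($httpStatus, [404, 500], true);
    if (!$failed) {
        // ASSUMPTION: a flagged link that works again is unflagged
        return $flag === 1 ? 'unflag' : 'keep';
    }
    // first failure sets the flag, a second failure removes the product
    return $flag === 1 ? 'delete' : 'flag';
}

// Fetch only the status code of a URL; returns 0 if the request itself fails.
function fetchStatus(string $url): int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD request, no body download
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 10,
    ]);
    curl_exec($ch);
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}
```

Each cron run would then call nextLinkAction(fetchStatus($url), $flag) per product and apply the returned action to the database row.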

Related

Import XML products to Prestashop in PHP

I need to create a PHP file to import a lot of products from an external source (a distributor) into my PrestaShop 1.7.6.
I need to connect to the service "http://www.ferrunion.com/ita/codice/id_service.php" to obtain a token, and once I receive that string I need to connect to the service "http://www.ferrunion.com/ita/codice/catalogo_service.php" to receive the XML file.
This is an example of the structure of the XML:
<![CDATA[
<product>
<id>id</id>
<description>description</description>
<quantity>quantity</quantity>
<confezione>confezione</confezione>
<prezzo_lordo>purchase price without discounts</prezzo_lordo>
<price>price</price>
<info>info</info>
</product>
]]>
There are two problems:
How can I connect to these services in PHP?
Once I have the file, how can I import the XML data into my PrestaShop database?
Can you help me with this problem?
Thank you.
I guess your idea is to develop a Prestashop module and run it using a cron, is that so?
Have you tried fetching the XML file using cURL?
From your PrestaShop module you must fetch the XML via cURL. Once you have the XML loaded, you just have to parse the file (take a look: https://www.w3schools.com/php/php_xml_dom.asp) and process each product with the PrestaShop API.
Tell me if I can help you.
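As a rough sketch of the fetch-then-parse idea from the comments above: httpGet() and parseProducts() are hypothetical helper names, and the way the token is handed to the catalogue service (a "token" query parameter) is purely an assumption, because the question does not describe the service's interface. The fields read from each product follow the XML sample in the question.

```php
<?php
// Plain GET via cURL; throws on transport errors.
function httpGet(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body;
}

// Turn the catalogue XML into plain arrays, one per <product> element.
function parseProducts(string $xml): array
{
    $root = simplexml_load_string($xml);
    if ($root === false) {
        throw new RuntimeException('Invalid XML');
    }
    $products = [];
    foreach ($root->xpath('//product') as $p) {
        $products[] = [
            'id'          => (string) $p->id,
            'description' => (string) $p->description,
            'quantity'    => (int) $p->quantity,
            'price'       => (float) $p->price,
        ];
    }
    return $products;
}

// Usage (ASSUMPTION: the token is passed as a "token" query parameter):
// $token    = trim(httpGet('http://www.ferrunion.com/ita/codice/id_service.php'));
// $xml      = httpGet('http://www.ferrunion.com/ita/codice/catalogo_service.php?token=' . urlencode($token));
// $products = parseProducts($xml);
```

For a small catalogue this in-memory approach is probably enough; for very large files a streaming reader is the safer choice.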
First, create a module
Use the module generator to speed up the process.
Generate a starter module online at https://validator.prestashop.com/generator
1 Download the file
Use curl to download the file to your server.
See this: https://stackoverflow.com/questions/19248371/how-can-i-save-a-xml-file-got-via-curl
2 Process the file and insert/update products
2.1) Through SQL directly to the DB
2.2) Through the PrestaShop API, or
2.3) Through the internal Product class object (recommended)
2.1 Create a PHP script that loads the XML and runs custom SQL inserts directly (you will need to handle image uploads and insert all images into the DB, which is not trivial). You would need to populate some or all of these tables:
ps_product
ps_product_lang
ps_product_shop
ps_stock_available
ps_category_product
ps_image
ps_product_download
...
2.2 Use the PrestaShop API, which takes a resource through a URL endpoint and inserts it into the database internally. This way PrestaShop takes care of a lot of things for you: you do not have to update your script with every new PrestaShop release if the database changes, and you do not have to handle image uploads and database relations yourself.
2.3 Use the PrestaShop internal Product class to insert a new product into the database.
<?php
// add category first then make reference to category
// configure and add product
$product = new Product;
$product->name = $productName;
$product->ean13 = '';
$product->reference = '';
$product->id_category_default = $getCategoryID;
$product->category = $getCategoryID;
$product->indexed = 1;
$product->description = $description;
$product->condition = 'new';
$product->redirect_type = '404';
$product->visibility = 'both';
$product->id_supplier = 1;
$product->link_rewrite = $link_rewrite;
$product->quantity = $singleStock;
$product->price = round($price - (18.69 / 100) * $price, 2);
$product->active = 1;
$product->add();
PrestaShop import script example here
Investigate Product Class here
Working with large XML files
When running PHP on a large XML file, do not load the whole file into memory with simplexml_load_file(). Instead, use XMLReader combined with SimpleXMLElement.
<?php
// this is not PrestaShop related script. This is pure PHP for manipulating large XML files.
// first load the file with curl and save it on your server in desired location. Then load the file as in example below:
$continueFrom = getLastNumWhereItStoped();
$iCount = 0;
$limit = 1000;
$xml = new XMLReader();
/*
* One-liners to gzip and ungzip a file:
* copy('file.txt', 'compress.zlib://' . 'file.txt.gz');
* copy('compress.zlib://' . 'file.txt.gz', 'file.txt');
*/
$xml->open('compress.zlib://'.'filename.xml.gz');
while($xml->read() && $xml->name != 'product')
{
// skip all not important nodes and stop on "product" node
}
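/**
* Run on every "product" node until $limit products have been processed
*/
while($xml->name == 'product' && $iCount < $continueFrom + $limit)
{
if($iCount < $continueFrom){
// already imported in an earlier run: advance the reader, otherwise this loop never ends
$iCount++;
$xml->next('product');
continue;
}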
$element = new SimpleXMLElement($xml->readOuterXML());
$product = array(
'name' => strval($element->text->name),
'price' => strval($element->price->buynow),
'parent_category' => strval($element->category->attributes()->parent_category) // categories have to be created before the product import [mapping category is as easy as you might think]
);
// ... do something with $product set create Product Class instance or... send it to API or make SQL insert directly
// if product exists just update the product value you want (for example price and stock quantity).
$iCount++;
$xml->next('product');
unset($element);
}
/* If success */
$continueFrom = setLastNumWhereItStoped($continueFrom + $limit);
3 Schedule the task
Set up a CRON job to run the script automatically (download the XML), then update only the fields you really need to update. Read your hosting provider's docs on how to do that.
DOCS PrestaShop API endpoint
Generate access token
Enable the webservice
By default, the webservice feature is disabled in PrestaShop and needs to be switched on before first use. You can enable it through the GUI or programmatically. Both methods are presented here: https://devdocs.prestashop.com/1.7/webservice/tutorials/creating-access/
Create a resource
To create a resource, GET the blank XML for the resource (for example /api/someendpoint?schema=blank), fill it in with your data, and send a POST HTTP request with the whole XML as the body to the /api/someendpoint/ URL.
PrestaShop will take care of adding everything in the database, and will return an XML file indicating that the operation has been successful, along with the ID of the newly created customer.
Update a resource
To edit an existing resource: GET the full XML file for the resource you want to change (example /api/someendpoint/1), edit its content as needed, then send a PUT HTTP request with the whole XML file as a body content to the same URL again.
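The GET-blank, fill, POST cycle described above might look roughly like this. It is a sketch under assumptions, not PrestaShop's own client: fillBlankResource() and webserviceRequest() are hypothetical names, only simple top-level fields are handled (language-dependent fields with nested language elements would need extra work), and the webservice key is sent as the HTTP basic-auth user name with an empty password.

```php
<?php
// Fill selected top-level fields of a blank resource document.
// $resource is the element name under <prestashop>, e.g. "product".
function fillBlankResource(string $blankXml, string $resource, array $fields): string
{
    $doc = new SimpleXMLElement($blankXml);
    foreach ($fields as $name => $value) {
        $doc->{$resource}->{$name} = (string) $value;
    }
    return $doc->asXML();
}

// Send a request to the webservice, optionally with an XML body.
function webserviceRequest(string $method, string $url, string $apiKey, ?string $xml = null): string
{
    $ch = curl_init($url);
    $options = [
        CURLOPT_CUSTOMREQUEST  => $method,        // GET, POST or PUT
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERPWD        => $apiKey . ':',  // key as user name, empty password
    ];
    if ($xml !== null) {
        $options[CURLOPT_POSTFIELDS] = $xml;
        $options[CURLOPT_HTTPHEADER] = ['Content-Type: text/xml'];
    }
    curl_setopt_array($ch, $options);
    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $response;
}

// Usage (shop URL and key are placeholders):
// $blank  = webserviceRequest('GET', 'https://shop.example/api/products?schema=blank', $key);
// $filled = fillBlankResource($blank, 'product', ['price' => '12.50', 'reference' => 'ABC-1']);
// $result = webserviceRequest('POST', 'https://shop.example/api/products/', $key, $filled);
```

An update follows the same pattern with a GET of the existing resource and a PUT of the edited XML to the same URL.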
Useful resources:
https://devdocs.prestashop.com/1.7/webservice/
https://devdocs.prestashop.com/1.7/webservice/getting-started
https://devdocs.prestashop.com/1.7/modules/creation/external-services/
https://drib.tech/programming/parse-large-xml-files-php
Use DB:
https://devdocs.prestashop.com/1.7/development/database/db/
https://devdocs.prestashop.com/1.7/development/database/structure/
How are images stored - path generation explained
Insert image to prestashop database
The topic is quite broad but hope that this will get you started.

PHP check if XML node exists before saving to variable?

Here is a snippet of the xml I am working with:
My example xml
A client requested that we add the ability to filter which types of "news articles" are displayed on specific pages. They create these articles on another website, where they can now assign one or more categories to each article. We load the articles via PHP and XML.
The error I receive is:
Call to a member function getElementsByTagName() on null in ...
Here is the code from 2012 that I am working with:
$item = $dom_object->getElementsByTagName("Releases");
foreach( $item as $value )
{
$Release = $value->getElementsByTagName("Release");
foreach($Release as $ind_Release){
$Title = $ind_Release->getElementsByTagName("Title");
$PublishDateUtc = $ind_Release->getAttribute('PublishDateUtc');
$DetailUrl = $ind_Release->getAttribute('DetailUrl');
$parts = explode('/', $DetailUrl);
$last = end($parts);
I am trying to traverse to the category code and set a variable with:
$newsCategory = $ind_Release->getElementsByTagName("Categories")->item(0)->getElementsByTagName("Category")->item(0)->getElementsByTagName("Code")->item(0)->nodeValue;
This loads the current 2018 articles and echoes their category slug, because they have an assigned category. It fails to load the 2017, 2016 and older articles, I believe because they have no category assigned within the XML, and this breaks something.
A news article without a category appears with an empty Categories node within the XML.
I understand that I am using getElementsByTagName, and that it breaks because there is no element below the first Categories node.
Is there a way to check that there is indeed a path to Categories->Category->Code[CDATA] before trying to assign it to a variable?
I apologize if this is confusing. I am not a PHP expert and could use all the help I can get. Is there a better way to traverse to the needed node?
Thanks.
You need to use XPath. If you're using DOMDocument, this is done via DOMXPath.
Your current approach chains method calls, and a chain breaks as soon as one link returns null instead of the object the next call relies on. Hence your error.
Instead, check the whole path in one query, relative to the current release:
$domxp = new DOMXPath($dom_object);
$nodes = $domxp->query('Categories[1]/Category[1]/Code[1]', $ind_Release);
if ($nodes->length) {
    // found: read $nodes->item(0)->nodeValue
}
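A minimal self-contained sketch of this kind of check, run against a made-up feed: the XML below is only an assumed shape based on the element names in the question, and categoryCode() is a hypothetical helper. It returns null for a release whose Categories node is empty instead of failing.

```php
<?php
// Sample feed (structure assumed from the question):
// one release with a category, one with an empty Categories node.
$xml = <<<XML
<Releases>
  <Release DetailUrl="/news/a"><Title>A</Title>
    <Categories><Category><Code><![CDATA[press]]></Code></Category></Categories>
  </Release>
  <Release DetailUrl="/news/b"><Title>B</Title><Categories/></Release>
</Releases>
XML;

// Return the first category code of a release, or null if the path does not exist.
function categoryCode(DOMXPath $xp, DOMNode $release): ?string
{
    // relative path, evaluated with the release as context node
    $nodes = $xp->query('Categories/Category/Code', $release);
    return $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
}

$dom = new DOMDocument();
$dom->loadXML($xml);
$xp = new DOMXPath($dom);

$codes = [];
foreach ($dom->getElementsByTagName('Release') as $release) {
    $codes[] = categoryCode($xp, $release);
}
```

Passing the release as the second argument to query() keeps the lookup scoped to that one article, so older articles without a category simply yield null.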

Scrape shipping cost which is based on country - dynamically built

I have been trying to grab the product price and shipping cost on aliexpress.com.
The price is fixed and therefore easy to get.
However, the shipping cost is loaded only after the site determines which country you are from.
I viewed the source, and it has a hidden input field which is populated (probably) after checking my location or IP.
How can I use cURL to "fool" the site and get the shipping cost for my country, i.e. scrape it using PHP?
The cURL code I have so far:
$html = curl_download($producturl, $browserAgent);
$dom = new DOMDocument();
$dom->validateOnParse = true;
@$dom->loadHTML($html); // "@" suppresses warnings from malformed HTML
$xpath = new DOMXPath($dom);
libxml_clear_errors();
// get and clean product price
$price = $dom->getElementById('product-price');
$price = $price->nodeValue;
$clnprice = currency_string_remover($price);
$clnprice = explode(' ', $clnprice);
$clnprice = array_filter(array_map('trim',$clnprice),'strlen');
$clnprice = array_values($clnprice)[0];
$currency = currency_string_extractor($price);
// get and clean shipping price
// >> this is empty until page determines location! PROBLEM
$shipprice = $dom->getElementById('shipping-cost');
$shipprice = $shipprice->nodeValue;
echo '<pre>SPRICE';
print_r($shipprice);
echo '</pre>';
$shipprice = explode('-', $shipprice);
$shipprice = $shipprice[0];
$shipprice = currency_string_remover($shipprice);
echo '<div id="sitename">aliexpress</div>';
echo '<div id="price">'.$clnprice.'</div>';
echo '<div id="shipprice">'.$shipprice.'</div>';
echo '<div id="currency">'.$currency.'</div>';
Does anyone have any ideas? Pointers? Helping links?
I've checked the site. It works for several languages and countries. For Russia (my case), the main price on a product page already includes the shipping cost, so this DOM element remains empty: <span id="shipping-cost"></span>
By the way, it's not inside a form (in my case).
If you suspect it is populated by AJAX (JavaScript), check all the JS files for the shipping-cost keyword. I did this with Chrome DevTools and found no occurrence of it in any JS file (including the source HTML file). So most likely the field is not updated by JavaScript but is generated on the server, and may be served empty.
Your browser views the site from one country, while the server running your PHP code (the cURL scraper) views it from a totally different country (IP). AliExpress will therefore respond with different page content. For debugging, I recommend a free proxy service such as hola.org to change or rotate the country (IP), so you can check the site from IPs in different countries and see how this field behaves.
You might also need to check another field (product-info-shipping) to see the shipping conditions. http://joxi.ru/xAe8Wy1hGDgq2y
If you really want to request the page with shipping-cost populated as seen from a certain country (IP), you need to route your cURL requests through a proxy service.
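Routing a cURL request through a proxy could be sketched like this. The helper names are hypothetical and the proxy address in the usage line is a placeholder; you would substitute a proxy located in the country you want the site to see.

```php
<?php
// Build cURL options that route a request through an HTTP proxy.
function proxiedCurlOptions(string $url, string $proxy, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_PROXY          => $proxy,      // "host:port" of the proxy
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 30,
    ];
}

// Download a page through the given proxy; throws on transport errors.
function fetchViaProxy(string $url, string $proxy, string $userAgent): string
{
    $ch = curl_init();
    curl_setopt_array($ch, proxiedCurlOptions($url, $proxy, $userAgent));
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $html;
}

// Usage (placeholder proxy address):
// $html = fetchViaProxy($producturl, '203.0.113.10:8080', $browserAgent);
```

The returned HTML can then go through the same DOMDocument parsing as before, so only the download step changes.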

How to get all product prices from a website with curl

I'm trying to use cURL to get all the product prices from this site, http://www.bikestore.ie/, but I don't really know how to scrape the price of every product.
Can someone please give me some tips?
Right now I'm just testing with the price of a single product, and that works without problems, but how can I get the prices for all products?
My code right now is:
public function Scrape(){
    $curl = curl_init('http://www.bikestore.ie/scott-speedster-30-bike-2015.html');
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    $page = curl_exec($curl);
    if(!empty($page)){ // curl_exec() returns false on failure
        $doc = new DOMDocument;
        $doc->loadHTML($page);
        $xpath = new DOMXPath($doc);
        $rupees = $xpath->evaluate('string(//div[@class="product-shop"]//div[@class="price-box"]//span[@class="price"])');
        echo $rupees;
    }
    else {
        print "Not found";
    }
}
It's not an easy task.
The site is structured, but each product is identified only by its URL.
ex.: http://www.bikestore.ie/scott-speedster-30-bike-2015.html
When you add it to the cart, the unique product identifier is shown:
Steps
Crawl the whole site with cURL (find all the product links, i.e. <a> tags). See the post on a simple Python crawler; you can build something similar in PHP.
Store them in a DB (e.g. MySQL).
For each link, run your Scrape() procedure to fetch the price and product ID. Once you have a product's price, mark its link as 'checked' in the DB so that you do not process it again.
Notes: To parallelize the work, you can run steps 1 and 2 in one process and step 3 in another. Use cron for this.
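Steps 1 and 2 could be sketched as below. Several things here are assumptions rather than facts about the site: that product pages are same-site links ending in ".html" (as in the example URL), the helper names, and the product_links table with a UNIQUE index on url. INSERT IGNORE is MySQL syntax.

```php
<?php
// Collect absolute product URLs from one page of HTML.
// ASSUMPTION: product pages are the same-site links ending in ".html".
function extractProductLinks(string $html, string $baseUrl): array
{
    libxml_use_internal_errors(true);   // tolerate real-world markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $base  = rtrim($baseUrl, '/');
    $links = [];
    foreach ((new DOMXPath($doc))->query('//a[@href]') as $a) {
        $href = trim($a->getAttribute('href'));
        // resolve root-relative links, but leave protocol-relative ("//host") ones alone
        if ($href !== '' && $href[0] === '/' && substr($href, 0, 2) !== '//') {
            $href = $base . $href;
        }
        if (strpos($href, $base . '/') === 0 && substr($href, -5) === '.html') {
            $links[$href] = true;       // array keys remove duplicates
        }
    }
    return array_keys($links);
}

// Queue new links; the UNIQUE index on url makes re-inserts harmless.
// Hypothetical table: product_links(url VARCHAR UNIQUE, checked TINYINT).
function saveLinks(PDO $pdo, array $links): void
{
    $stmt = $pdo->prepare('INSERT IGNORE INTO product_links (url, checked) VALUES (?, 0)');
    foreach ($links as $url) {
        $stmt->execute([$url]);
    }
}
```

The scraping job for step 3 would then select rows with checked = 0, run Scrape() on each URL, and set checked = 1 afterwards.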

Adding a node programmatically

I am very new to Drupal and need to develop a site using this CMS. I understand how to create content as an admin, but I would like to create content from code. For example, I want to create articles in the backend programmatically without publishing them, so that the site admin can review and publish them if he wants to. Are there any references for programmers about the structure of Drupal code and where to write what? Not videos, please; I can't watch them in the office.
Custom code in Drupal is generally done with the help of modules.
One way to familiarize yourself with the Drupal API could be to install the Examples module.
In your Google searches, go for tutorials about writing your own modules.
This being said, saving a node programmatically is fairly straightforward, and I doubt you will have problems finding out how to do it.
Your main problem is to understand the "Drupal Way".
You can check excellent resources such as
- buildamodule.com
- drupalize.me
SOLUTION !
You can use the code below to create a node programmatically in Drupal:
$node = new stdClass(); // Create a new node object
$node->type = 'YOUR_CONTENT_TYPE';
node_object_prepare($node); // Set some default values
$node->language = LANGUAGE_NONE;
$node->status = 0; // un-published
$node->uid = 'USER_ID';
$node->title = 'YOUR_TITLE';
$node->body['und'][0]['value'] = 'YOUR_DESCRIPTION';
$node->body['und'][0]['summary'] = 'YOUR_SHORT_DESCRIPTION';
$node->body['und'][0]['format'] = 'filtered_html';
$node = node_submit($node); //prepare node for saving
node_save($node); // save node
