Good day all.
I have a page that calls a script via AJAX; the script calls a PrestaShop webservice and has to insert several items at a time. My problem is that the script seems "frozen" for most of the time, then after 2 or 3 minutes it starts to print out results and continues until the end. What I would like to do is retrieve something from the script each time it inserts an item, instead of "buffering" hundreds of results and then seeing all of them at once.
This is the code (stripped of unnecessary parts) that I'm using:
<?php
function PS_new_product(/* all product attributes */) {
    global $webService;
    try {
        // $xml is the product XML built from the attributes (stripped here)
        $opt = array('resource' => 'products');
        $opt['postXml'] = $xml->asXML();
        $response = $webService->add($opt); // this should return each product's state (whether it was inserted or not)
        return true;
    } catch (PrestaShopWebserviceException $ex) {
        return false;
    }
}

function inserisciProdottiRAW() {
    set_time_limit(30);
    $sql_prodotti = "SELECT everything i need to import";
    if ($prodotti = mysql_query($sql_prodotti)) {
        while ($row = mysql_fetch_assoc($prodotti)) {
            $webService = new PrestaShopWebservice(PS_SHOP_PATH, PS_WS_AUTH_KEY, DEBUG);
            $opt = array('resource' => 'products');
            $opt['filter[reference]'] = "[" . $row["modello"] . "]";
            $xml = $webService->get($opt);
            $prodotto = $xml->children()->children();
            if ($prodotto->product['id'] == "") { // no product with this reference yet
                PS_new_product(/* all product attributes */);
            }
        }
    }
    echo "ok";
}

inserisciProdottiRAW();
?>
I would like something that the calling page can catch, so it knows, for example, which item the script has reached at a given time... Is that possible? Or do I have to implement something that counts the items inserted in the database every, say, 30 seconds?
If you need a quick and dirty solution - just include an echo after every insertion, and make sure it includes a new line and is big enough to flush the output buffer in PHP/Apache (4KB should do it). Use a method like this:
function logProgress($message)
{
    echo $message;
    // pad with ~4KB of spaces to force the output buffer to flush
    echo str_repeat(" ", 4096);
    echo "\n";
}
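For example, called from the insertion loop in the question (a sketch; depending on your output-buffering settings you may also need an explicit flush() or ob_flush()):

while ($row = mysql_fetch_assoc($prodotti)) {
    // ... look up and insert the product as before ...
    logProgress("inserted " . $row["modello"]);
    flush(); // ask PHP/Apache to push the padded output to the browser now
}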
If you use gzip compression, that might not be enough, because a long run of spaces compresses very well - mix in other random whitespace characters as well.
If you want to show the progress to a user, then you can run your insertion script in the background, save its state in some database table, and poll it from a different script.
Running a background job can be done using the fork function (pcntl_fork), curl, or, if you need a good job manager, try Gearman.
Also be warned that if you use sessions, you cannot have two scripts running at the same time - one will be waiting for the other one to finish. If you know that you won't be using the session anymore in your script, you can call session_write_close() to get rid of this locking issue.
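A minimal sketch of the polling script mentioned above might look like this (it assumes a hypothetical import_progress table that the background job updates after every insert, and placeholder database credentials):

<?php
// progress.php - hypothetical endpoint the frontend polls for progress
session_start();
session_write_close(); // release the session lock so other requests aren't blocked
$db = new mysqli('localhost', 'user', 'pass', 'shop'); // placeholder credentials
$stmt = $db->prepare("SELECT inserted, total FROM import_progress WHERE job_id = ?");
$stmt->bind_param('i', $_GET['job_id']);
$stmt->execute();
$stmt->bind_result($inserted, $total);
$stmt->fetch();
header('Content-Type: application/json');
echo json_encode(array('inserted' => $inserted, 'total' => $total));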
I'm connecting to the trakt.tv API; I want to create a little app for myself that displays movie posters with ratings etc.
This is what I'm currently using to retrieve their .json file containing all the info I need.
$json = file_get_contents('http://api.trakt.tv/movies/trending.json/2998fbac88fd207cc762b1cfad8e34e6');
$movies = json_decode($json, true);
$movies = array_slice($movies, 0, 20);
foreach ($movies as $movie) {
    echo $movie['images']['fanart'];
}
Because the .json file is huge, it loads pretty slowly. I only need a couple of attributes from the file, like title, rating and the poster link. Besides that, I only need the first 20 or so entries. How can I make sure to load only a part of the .json file so it loads faster?
Besides that, I'm not experienced with PHP in combination with JSON, so if my code is garbage and you have suggestions, I would love to hear them.
Unless the API provides a limit parameter or similar, I don't think you can limit the query on your side. On a quick look it doesn't seem to provide this. It also doesn't look like it really returns that much data (under 100KB), so I guess it is just slow.
Given the slow API I'd cache the data you receive and only update it once per hour or so. You could save it to a file on your server using file_put_contents and record the time it was saved too. When you need to use the data, if the saved data is over an hour old, refresh it.
This quick sketch of an idea works:
function get_trending_movies() {
    if (!file_exists('trending-cache.php')) {
        return cache_trending_movies();
    }
    include('trending-cache.php');
    if (time() - $movies['retrieved-timestamp'] > 60 * 60) { // 60*60 = 1 hour
        return cache_trending_movies();
    } else {
        unset($movies['retrieved-timestamp']);
        return $movies;
    }
}

function cache_trending_movies() {
    $json = file_get_contents('http://api.trakt.tv/movies/trending.json/2998fbac88fd207cc762b1cfad8e34e6');
    $movies = json_decode($json, true);
    $movies = array_slice($movies, 0, 20);
    $movies_with_date = $movies;
    $movies_with_date['retrieved-timestamp'] = time();
    file_put_contents('trending-cache.php', '<?php $movies = ' . var_export($movies_with_date, true) . ';');
    return $movies;
}

print_r(get_trending_movies());
I have a function in my theme's functions.php file that calls the Edmunds API and retrieves a stock vehicle image.
From a page template, if I call the function more than once, it fails on the second call. It works perfectly the first time, but doesn't output anything the second time. When I try to print_r the $aVehImage array, it's empty. (I've verified that images are available in the API for the vehicles in the secondary calls, btw.)
Code below:
function get_edmunds_image($vehicleMake, $vehicleModel, $vehicleYear) {
    $getVehicleStyle = 'https://api.edmunds.com/api/vehicle/v2/'.$vehicleMake.'/'.$vehicleModel.'/'.$vehicleYear.'/styles?state=used&fmt=json&api_key=XXX';
    $vehicleStyleID = json_decode(file_get_contents($getVehicleStyle), true);
    $getImages = 'https://api.edmunds.com/v1/api/vehiclephoto/service/findphotosbystyleid?styleId='.$vehicleStyleID['styles'][0]['id'].'&fmt=json&api_key=XXX';
    $aImages = json_decode(file_get_contents($getImages), true);
    $aVehImage = array();
    foreach ($aImages as $image) {
        $iURL = 'http://media.ed.edmunds-media.com'.str_replace('dam/photo', '', $image['id']).'_';
        array_push($aVehImage, $iURL);
    }
    echo '<img src="'.$aVehImage[0].'500.jpg" />';
}
Thanks Marcos! That did, indeed, appear to be the issue. For now, I just used the sleep() function to pause it for a second, until I find a better solution.
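For reference, that stopgap just spaces out the calls in the page template, something like this (the vehicle arguments are placeholder values):

get_edmunds_image('honda', 'civic', '2014');
sleep(1); // crude workaround: pause between consecutive API calls
get_edmunds_image('toyota', 'corolla', '2015');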
I am using PHP to get the contents of an API. The problem is, sometimes that API just sends back a 502 Bad Gateway error and the PHP code can’t parse the JSON and set the variables correctly. Is there some way I can keep trying until it works?
This is not an easy question because PHP is a synchronous language by default.
You could do this:
$a = false;
$i = 0;
while ($a == false && $i < 10) {
    $a = file_get_contents($path);
    $i++;
    usleep(10); // note: usleep() takes microseconds, so use a larger value for a real pause
}
$result = json_decode($a);
Adding the usleep() keeps your server from being brought to its knees every time the API is unavailable. And the loop will give up after 10 attempts, which prevents it from freezing completely in case of a long outage.
Since you didn't provide any code it's kind of hard to help you. But here is one way to do it.
$data = null;
while (!$data) {
    $json = file_get_contents($url);
    $data = json_decode($json); // returns null if the JSON is not valid
}
// The loop won't stop until the JSON was valid and $data contains an object
var_dump($data);
I suggest you throw some sort of increment variable in there to stop attempting after X attempts.
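For example, with a simple attempt counter added (a sketch based on the loop above; the cap of 10 is arbitrary):

$data = null;
$attempts = 0;
while (!$data && $attempts < 10) {
    $json = file_get_contents($url);
    $data = json_decode($json); // null if the JSON is not valid
    $attempts++;
}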
Based on your comment, here is what I would do:
1) Have a PHP script that makes the API call and, if successful, records the price and when that price was acquired.
2) Put that script in a cronjob/scheduled task that runs every 10 minutes.
3) Have your PHP view pull the most recent price from the database and use it for whatever display/calculations it needs. If pertinent, also show the date/time that price was captured.
The other answers suggest doing a loop. A combined approach probably works best here: in your script, put in a few retries just in case the interface is down for a short blip. If it's not up after, say, a minute, use the old value until your next try; a sketch of the cron script follows.
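Something like this hypothetical fetch_price.php for the cron job - the API URL, database credentials, and prices table are assumptions for illustration:

<?php
// fetch_price.php - hypothetical cron script, run every 10 minutes
$json = @file_get_contents('https://api.example.com/price'); // placeholder URL
$data = json_decode($json, true);
if (is_array($data) && isset($data['price'])) {
    $db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'); // placeholder credentials
    $stmt = $db->prepare("INSERT INTO prices (price, fetched_at) VALUES (?, NOW())");
    $stmt->execute(array($data['price']));
}
// On failure nothing is written, so the view keeps showing the last good price.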
A loop can solve this problem, but so can a recursive function like this one:
function file_get_contents_retry($url, $attemptsRemaining = 3) {
    $content = file_get_contents($url);
    $attemptsRemaining--;
    if (empty($content) && $attemptsRemaining > 0) {
        return file_get_contents_retry($url, $attemptsRemaining);
    }
    return $content;
}
// Usage:
$retryAttempts = 6; // Default is 3.
echo file_get_contents_retry("http://google.com", $retryAttempts);
I have a heavy processing script which can be started by a user via the frontend in our intranet. Imagine something like this:
$html = file_get_contents($url);
$pattern = '/[A-Z0-9._%+-]+(@|\(at\)|\[at\])[A-Z0-9.-]+\.[A-Z]{2,4}\b/i'; // also match (at) and [at]
preg_match_all($pattern, $html, $emails);
$m = array();
foreach ($emails[0] as $email) {
    $m[] = $email;
}
foreach ($m as $n) {
    echo $n . "<br>";
}
This is just an example to illustrate the question! Don't judge it on common sense.
Now what I want is two things:
1) Stop the process on a button click from the user. This means: outputting the already collected array $m[].
2) Stop the process (or better: stop the array-collecting process and jump to echoing the already collected array) based on time - for example, collect the array for max. 1 minute, then jump to echoing.
I don't want to echo live, and setting the max execution time will stop the script without echoing.
Thanks for your wise advice on both subquestions.
You have to run the script through AJAX, and have the button click register a session flag via another AJAX call:
foreach ($emails[0] as $email) {
    if (!isset($_SESSION['STOP'])) {
        $m[] = $email;
    } else {
        // stop code here: break out and echo the collected array
        break;
    }
}
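The button's AJAX call then hits a small endpoint like this hypothetical stop.php. One caveat: PHP locks the session file per request, so the long-running script must call session_write_close() and session_start() again around each check of the flag, otherwise this request will block until the loop finishes.

<?php
// stop.php - hypothetical endpoint for the stop button's AJAX call
session_start();
$_SESSION['STOP'] = true; // the flag checked by the processing loop
session_write_close();    // release the session lock right away
echo 'stop requested';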
I have written a script for web scraping where I fetch each link from a page and then load that URL in the code. It works extremely slowly - it takes about 50 seconds for the first output and an age to complete about 100 links, and I don't understand why. I'm thinking about caching (page caching or an opcode cache?), but I don't know how that would help here.
The code is:
public function searchForum(){
    global $wpdb;
    $sUrl = $this->getSearchUrl();
    $this->logToCrawler();
    $cid = $this->getCrawlId();
    $html = file_get_dom($sUrl);
    $c = 1;
    foreach ($html('div.gridBlobTitle a:first-child') as $element) {
        $post_page = file_get_dom($element->href);
        $post_meta = array();
        foreach ($post_page('table#mytable img:first-child') as $img) {
            if (isset($img->src)) {
                $post_meta['image_main'] = self::$forumurl.$img->src;
            }
            else {
                $post_meta['image_main'] = NULL;
            }
        }
        foreach ($post_page('table.preferences td:odd') as $elm) {
            $post_meta[] = strip_tags($elm->getInnerText());
            unset($elm);
        }
        /* Check if we can call getPlainText for the description fetch */
        $object = $post_page('td.collection', 2);
        $methodVariable = array($object, 'getPlainText');
        if (is_callable($methodVariable, true, $callable_name)) {
            $post_meta['description'] = utf8_encode($object->getPlainText());
        }
        else {
            $post_meta['description'] = NULL;
        }
        $methodVariable = array($object, 'getInnerText');
        if (is_callable($methodVariable, true, $callable_name)) {
            /* Get all the images we found */
            $rough_html = $object->getInnerText();
            preg_match_all("/<img .*?(?=src)src=\"([^\"]+)\"/si", $rough_html, $matches);
            $images = array_map('self::addUrlToItems', $matches[1]);
            $images = json_encode($images);
        }
        if ($post_meta[8] == 'WTB: Want To Buy') {
            $status = 'buy';
        }
        else {
            $status = 'sell';
        }
        $lastdate = strtotime(date('Y-m-d', strtotime("-1 month")));
        $listdate = strtotime(date('Y-m-d', strtotime($post_meta[9])));
        /* Check for date */
        if ($listdate >= $lastdate) {
            $wpdb->query("INSERT
                INTO tbl_scrubed_data SET
                keywords='".esc_sql($this->getForumSettings()->search_meta)."',
                url_to_post='".esc_sql($element->href)."',
                description='".esc_sql($post_meta['description'])."',
                date_captured=now(), crawl_id='".$cid."',
                image_main='".esc_sql($post_meta['image_main'])."',
                images='".esc_sql($images)."', brand='".esc_sql($post_meta[0])."',
                series='".esc_sql($post_meta[1])."', model='".esc_sql($post_meta[2])."',
                watch_condition='".esc_sql($post_meta[3])."', box='".esc_sql($post_meta[4])."',
                papers='".esc_sql($post_meta[5])."', year='".esc_sql($post_meta[6])."',
                case_size='".esc_sql($post_meta[7])."', status='".esc_sql($post_meta[8])."',
                listed='".esc_sql($post_meta[9])."', asking_price='".esc_sql($post_meta[10])."',
                retail_price='".esc_sql($post_meta[11])."', payment_info='".esc_sql($post_meta[12])."',
                forum_id='".$this->getForumSettings()->ID."'");
            unset($element, $post_page, $images);
        } /* END: Check for date */
    }
    $c++;
}
Notes:
1) I am using the [Ganon DOM Parser][1] for parsing the HTML.
[1]: https://code.google.com/p/ganon/wiki/AccesElements
2) This runs on Windows XP with WAMP, MySQL 5.5, PHP 5.3, and 1 GB of RAM.
If you need more info, please ask in the comments.
Thanks
You need to figure out what parts of your program are being slow. There are two ways to do that.
1) Put in some print statements that print out the time in various places, so you can say "Hey, look, this took 5 seconds to go from here to here" (see the sketch after this list).
2) Use a profiler like Xdebug, which analyzes your program while it runs, so you can tell which parts of the code are slow.
Just by looking at a program, you can't say "Oh, that's the slow part to speed up." Without knowing what's slow, you'll probably waste time speeding up parts that aren't the slow ones.
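For approach 1), a minimal sketch using microtime() might look like this ($element and file_get_dom() come from the question's code; that the per-link HTTP fetch dominates is only a guess to verify):

$t0 = microtime(true);
$post_page = file_get_dom($element->href); // one HTTP request per link - a likely suspect
printf("fetching %s took %.2fs\n", $element->href, microtime(true) - $t0);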