PHP Scraper Loops Until Time Limit Exceeded (Using Simple HTML Dom Parser) - php

I am attempting to scrape the price of the same product from two different websites. The script pulls the correct results and prints what I want, but when I run it I get this error after the results are printed:
Fatal error: Maximum execution time of 120 seconds exceeded in [...] on line 144
And this is my code:
<?php
//Adds in the simple HTML DOM parser
include ('simple_html_dom.php');
//Defines the Target URL to Scrape
$cbdUrl = "https://cbdstore.co.za/product/africanpure-everyday-cbd-1000mg-30ml/";
$apUrl = "https://africanpure.co/product/everyday-cbd-oil-1000mg-30ml/";
//Defines 'html' as the scraped content from the URL above
$cbdHtml = file_get_html($cbdUrl);
$apHtml = file_get_html($apUrl);
//Creating an array to store all the 'price' classes text from the page
$cbdPrices = array();
//Fetching all the '.amount' elements and storing them in the array as plain text.
foreach($cbdHtml->find('div.summary.entry-summary p.price') as $cbdElement)
{
foreach($cbdElement->find('.amount') as $cbdAmt)
{
$cbdPrices [] = $cbdAmt->plaintext;
}
}
//Repeating for AfricanPure
$apPrices = array();
foreach($apHtml->find('div.summary-inner div.basel-scroll-content p.price') as $apElement)
{
foreach($apElement->find('.amount') as $apAmt)
{
$apPrices [] = $apAmt->plaintext;
}
}
// Writes out CBD Store Price
echo 'CBD Store has the Everyday CBD Oil for: ' . $cbdPrices[0];
// Writes out AP Price
echo 'African Pure has the Everyday CBD Oil for: ' . $apPrices[0];
?>

It looks like you are only interested in one price for each ($cbdPrices[0]) and not an array of prices, so try breaking out of the loops after getting the first price.
foreach($cbdHtml->find('div.summary.entry-summary p.price') as $cbdElement)
{
    foreach($cbdElement->find('.amount') as $cbdAmt)
    {
        $cbdPrices [] = $cbdAmt->plaintext;
        break;
    }
}
Do the same for the other loop. You could also skip the array entirely and store each price in a plain variable.
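If only the first price matters, one option is to drop the array and leave both loops at once with break 2. The sketch below uses nested plain arrays as a stand-in for the parsed DOM nodes so that it runs without simple_html_dom; the sample data and the variable names $priceElements and $firstPrice are assumptions for illustration, not taken from either site.

```php
<?php
// Hypothetical stand-in for the result of find(): each outer entry plays the
// role of one "p.price" element, each inner entry the plaintext of one
// ".amount" node inside it.
$priceElements = [
    ['R450.00', 'R399.00'],
    ['R120.00'],
];

$firstPrice = null; // plain variable instead of an array

foreach ($priceElements as $amounts) {
    foreach ($amounts as $amount) {
        $firstPrice = $amount;
        break 2; // leave both loops as soon as one price is found
    }
}

echo $firstPrice === null ? "No price found\n" : "First price: {$firstPrice}\n";
```

A plain break only leaves the inner loop, so the outer loop would still visit every remaining price element; break 2 stops the search completely.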

Related

How to count JSON response and output how many lines there are

Hello fellow developers,
I have been trying to change the output so that it displays the total number of workers instead of printing each worker's name as a string.
Below you will find the data that I am receiving; further down I will explain how I would like to handle the JSON response.
{
"result":
{
"addr":"ADDRESS_HERE",
"workers":
[
["worker1080",{},2,1,"200000",0,22],
["worker1080",{"a":"899.4"},3,1,"512",0,24]
],
"algo":-1
},
"method":"stats.provider.workers"
}
As you can see from the response above, there are 2 workers named "worker1080" active on that address.
The PHP code below is how I retrieve the data and output only the names of the workers:
<?php
$btcwallet = get_btc_addy();
if (isset($cur_addy)) {
$method4 = new methods();
$worker_stats = new urls();
$get_data = file_get_contents(utf8_encode($worker_stats->nice_url.$method4->m4.$cur_addy));
$get_json = json_decode($get_data, true);
foreach ($get_json['result']['workers'] as $v) {
$i = 0;
print $v[$i++]."<br />";
}
}
?>
$get_json holds the decoded data from $get_data. The loop prints the worker names, one line for every worker that is added or online.
I currently have 2 workers online, as shown in the JSON response.
It outputs:
worker1080
worker1080
which is perfect. However, I want to display the total number of workers online, so it should display 2 instead of the names, and the total has to increase for each worker that the JSON response contains.
E.g. I have 2 workers online now, but if I connect 10 more in an hour, the name output would be the following:
worker1080
worker1080
worker1070
worker1070
worker1080ti
worker1080ti
workerASIC
workerASIC
workerASIC
workerCPU
workerCPU
workerCPU
Now I try to use the following to display the total:
count($v[$i++]);
I have also tried using a foreach within the foreach. Both count and the nested foreach display either "0" or "1" for every worker.
Below is an example of the output.
0
0
0
0
0
How would I go about counting each line in the output and displaying the total number of workers?
Thanks @symcbean for the solution.
<?php
$btcwallet = get_btc_addy();
if (isset($cur_addy)) {
$method4 = new methods();
$worker_stats = new urls();
$get_data = file_get_contents(utf8_encode($worker_stats->nice_url.$method4->m4.$cur_addy));
$get_json = json_decode($get_data, true);
print count($get_json['result']['workers'])."<br />"; // solution: the foreach and the $i increment are removed because count() does not need them
}
?>
It now displays the correct number of workers :)
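For reference, the same idea can be tried without the API call. The sketch below decodes the sample response from the question and counts the entries under result.workers; the per-name breakdown with array_count_values is an optional extra that the original solution does not include, and the variable names are illustrative.

```php
<?php
// Sample response copied from the question; in real use this string would
// come from the API call.
$getData = '{"result":{"addr":"ADDRESS_HERE","workers":[["worker1080",{},2,1,"200000",0,22],["worker1080",{"a":"899.4"},3,1,"512",0,24]],"algo":-1},"method":"stats.provider.workers"}';

$decoded = json_decode($getData, true);

// Fall back to an empty list if the key is missing or decoding failed.
$workers = $decoded['result']['workers'] ?? [];

// Total number of workers: one entry per worker in the list.
$totalWorkers = count($workers);

// Optional: number of workers per name (the name is element 0 of each entry).
$workersPerName = array_count_values(array_column($workers, 0));

echo "Total workers: {$totalWorkers}\n";
foreach ($workersPerName as $name => $number) {
    echo "{$name}: {$number}\n";
}
```

count() is applied to the whole workers list, not to a single entry inside the loop, which is why no foreach is needed for the total.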

Iterate through array when item is found end loop but not found call function

Hi all, I have a question. I have an array that is dynamically populated. The array holds two main types of items: items whose file name is exactly 27 characters long, and items whose file name is longer or shorter. I am able to separate the two types. The second type is added to a new array called $usedArray. Those items are then iterated over, and characters 0 to 6 of each file name are extracted with substr to compare against the end user's input on the page.
If the item is found in that array, it fires a function that sends them a text and an email with the full file name. My problem is that if the item is not found until iteration x, the not-found function fires x times, and if it is not found at all the same thing happens: with 99 items that do not match, it fires 99 times. To stop it from firing, I reduced the not-found branch to just printing "not found" on the screen. I thought of calling the not-found function outside the loop, but I do not want it to fire if an item is found.
This is the code I have so far:
do{
if (substr($val,0,6) == $studentID)
{
$codeFound = substr($val,22,19);
print_r($studentID . ' is found <br /> Their code is ' . $codeFound);
//sendText($phoneNum,$codeFound,$messageMonth);
//sendEmail($emailInfo,$messageMonth,$codeFound);
break 1;
}
else
{
print_r($studentID . " was not found <br />");
}
} while(list(, $val) = each($usedArray));
This is my output
166003 was not found
166003 was not found
166003 was not found
166003 is found
Their code is xxxxxxxxxx
I think you should add a flag to track if you have found something or not:
$item_found = false;
do {
    if (substr($val,0,6) == $studentID)
    {
        $codeFound = substr($val,22,19);
        print_r($studentID . ' is found <br /> Their code is ' . $codeFound);
        //sendText($phoneNum,$codeFound,$messageMonth);
        //sendEmail($emailInfo,$messageMonth,$codeFound);
        // item found!
        $item_found = true;
        break 1;
    }
} while(list(, $val) = each($usedArray));
// now check - if `$item_found` is false
// then you can send your NotFoundEmail
if (!$item_found) {
    sendNotFoundEmail();
}
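Note that each() was deprecated in PHP 7.2 and removed in PHP 8.0, and that a do...while loop runs its body once before $val has been assigned. A possible alternative is a foreach wrapped in a small function, sketched below. The function name findStudentCode and the sample file names are made up for illustration; only the substr offsets come from the question.

```php
<?php
// Hypothetical helper: returns the code for the first file name whose first
// six characters equal $studentID, or null when nothing matches.
function findStudentCode(array $usedArray, string $studentID): ?string
{
    foreach ($usedArray as $val) {
        if (substr($val, 0, 6) === $studentID) {
            return substr($val, 22, 19); // same offsets as in the question
        }
    }
    return null;
}

// Made-up file names for illustration only.
$usedArray = [
    '165001_2019_report_00_CODE-AAAA-000000001',
    '166003_2019_report_00_CODE-BBBB-000000002',
];

$code = findStudentCode($usedArray, '166003');
if ($code !== null) {
    echo "166003 is found, their code is {$code}\n";
    // sendText(...) and sendEmail(...) would be called here
} else {
    echo "166003 was not found\n";
    // the not-found notification would be sent here, exactly once
}
```

Returning null takes the place of the $item_found flag: the caller decides once, after the search has finished, which notification to send.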

Generate the set of data into Json instead of one big mixed Json

How can I create one JSON document made of many small sets of data separated by commas,
instead of one big JSON object enclosed in a single pair of curly brackets?
I receive a JSON document and use a PHP foreach to loop over it, doing a lot of data processing inside.
I then generate a new JSON document, so that the processing does not have to happen on the client side, where the data is rendered by AngularJS ng-repeat.
All the JSON data ends up mixed into one big JSON object (inside one pair of curly brackets).
My goal is to separate it into small sets of data.
I can use the NrType property for that. In this script, NrType keeps only the last value assigned, so only the last one received is available.
//The php script
$arr = json_decode($returnedJson); //The original json to be pre-processed
$processedData = "[]";
$processedJson = json_decode($processedData,true);
foreach($arr as $key=>$value) {
foreach($value as $vkey=>$vvalue) {
if( $value[$i]->NrType == 1 ) {
$VlMIni = $value[$i]->QttyInitial;
$VlMSub = $value[$i]->QttyPeriod + $value[$i]->QttyRealAfter;
$VlMRec = $value[$i]->RealValue;
$VlMTotal += $VlMesRece;
//much more data processing going on here ...
} elseif( $value[$i]->NrType == 2 ) {
.
.
.
//and much more data processing going on here ...
}
}
//simple data assignment here
$processedJson['labelDesIni'] = 'Instruments';
$processedJson['labelValueMIni'] = $lblVlMIni;
$processedJson['labelValuePIni'] = $lblVlPlIni;
$processedJson['labelValueAIni'] = $lblVlAIni;
$processedJson['labelValuePAIni'] = $lblVlPAIni;
$processedJson['labelValuePercInic'] = $lblVlPercInic;
$processedJson['labelValuePerc2Inic'] = $lblVlPerc2Inic;
//much more data assignment ...
echo json_encode($processedJson); //the newly generated JSON
The generated JSON:
{
labelDesI: "Inspection",
labelValueMI: "2357",
labelValuePlI: "3914066",
labelValueAI: "1389406",
labelValuePAI: "2431425",
labelValuePercI: 57.143691456656,
labelValuePerc2I: 35.497766261478,
labelDesR: "Instruments",
labelValueMR: "734.54",
labelValuePR: "819.14",
labelValueAR: "660.05",
labelValuePAR: "877.94",
labelValuePercR: 80.087,
labelValuePerc2R: 44.739,
labelDesAcfi: "Fiscalização",
labelValueMAcfi: "343",
labelValuePlAcfi: "29907",
labelValueAAcfi: "16718",
labelValuePAAcfi: "16493",
labelValuePercAcfi: 101.36421512157,
labelValuePerc2Acfi: 55.899956531916,
labelDesT: "Totals",
labelValueMT: 365.59,
labelValuePlT: 547.62,
labelValueAnT: 909.63,
labelValuePAnT: 957.63,
labelValuePercT: 22949,
labelValuePerc2T: 25065
}
The desired format would be this:
{
label: "Inspection",
labelValue1: "2357",
labelValue2: "3914066",
labelValue3: "1389406",
labelValue4: "2431425",
labelValue5: 57.1456656,
labelValue6: 35.4961478
},
{
labelDesR: "Instruments",
labelValue1: "734.54",
labelValue2: "819.14",
labelValue3: "660.05",
labelValue4: "877.94",
labelValue5: 80.087,
labelValue6: 44.739
},
{
labelDesT: "Totals",
labelValue1: 365.59,
labelValue2: 547.62,
labelValue3: 909.63,
labelValue4: 957.63,
labelValue5: 22949,
labelValue6: 25065
}
Thanks in advance
Generate each of your objects separately, collect them in an array, and json_encode the array:
$processedJsonElement = ['labelDesT' => "Totals", 'labelValue1' => $whatTheValueIs, . . .];
and add it to your main array:
$processedJson[] = $processedJsonElement;
Do this for each section of JSON you want to represent. I am not sure how you are structuring your foreach loop, as your code doesn't match the output, but whatever structure you build is what json_encode will output.
Basically, you need to structure your foreach loop so that it groups the values into separate objects, which together form an array of JSON objects.
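A minimal sketch of that structure, assuming the values of each section have already been computed. The $sections array and its contents are made up for illustration (shortened to three values per section); the key names follow the desired format in the question.

```php
<?php
// Made-up rows standing in for the already processed values of each section.
$sections = [
    ['Inspection',  ['2357', '3914066', '1389406']],
    ['Instruments', ['734.54', '819.14', '660.05']],
    ['Totals',      [365.59, 547.62, 909.63]],
];

$processedJson = []; // will be encoded as a JSON array of objects

foreach ($sections as [$label, $values]) {
    // Build one small object per section ...
    $element = ['label' => $label];
    foreach ($values as $index => $value) {
        $element['labelValue' . ($index + 1)] = $value;
    }
    // ... and append it to the outer list.
    $processedJson[] = $element;
}

echo json_encode($processedJson, JSON_PRETTY_PRINT), "\n";
```

Because $processedJson is a list of associative arrays, json_encode emits square brackets around comma-separated objects, which is a shape ng-repeat can iterate over directly.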

How do I use the output from a php file in a TemplaVoila FCE?

I am trying to use the output from a php file in a TemplaVoila FCE.
According to the articles, etc, I have found on the subject, I seem to be doing it right. But it does not work.
I have reduced my implementation to a very simple test, and I hope that someone here can tell me what I am doing wrong.
The php code is in fileadmin/php/test.php
The file contains this code:
<?php
function getBeechgroveTest($content, $conf)
{
return 'B';
}
//echo getBeechgroveTest(0,0);
?>
In the main template (template module - not TemplaVoila) I have added this line:
includeLibs.beechgroveTest = fileadmin/php/test.php
I have tried to put it at the root level and inside a PAGE object. Both gave the same result.
If I uncomment the 'echo' line I get a 'B' at the top of my HTML page, so the php must be read at some point.
My FCE has one field of type 'None (TypoScript only)' and contains this code:
10 = TEXT
10 {
value = A
}
20 = USER
20 {
userFunc = getBeechgroveTest
}
30 = TEXT
30 {
value = C
}
I was expecting the FCE to output 'ABC', but I only get 'AC'.
What am I doing wrong?
I use TYPO3 version 4.5.30 and TemplaVoila 1.8.0
It is probably a caching problem; try USER_INT instead of USER. An object created as USER_INT is rendered non-cached, outside the main page rendering.
20 = USER_INT
20 {
userFunc = getBeechgroveTest
}

PHP script supposed to take 6 hours but stops after 30 minutes

I've made a basic web crawler to scrape info from a website, and I estimated that it should take around 6 hours (the number of pages multiplied by how long it takes to grab the info from each). But after around 30-40 minutes of looping through my function it stops working, and I only have a fraction of the info I wanted. While it is working, the page looks like it's loading and it prints where it's up to on the screen; when it stops, the page stops loading and the output stops appearing.
Is there any way that I can keep the page loading so I don't have to start it again every 30 minutes?
EDIT: Here's my code
function scrape_ingredients($recipe_url, $recipe_title, $recipe_number, $this_count) {
$page = file_get_contents($recipe_url);
$edited = str_replace("<h2 class=\"ingredients\">", "<h2 class=\"ingredients\"><h2>", $page);
$split = explode("<h2 class=\"ingredients\">", $edited);
preg_match("/<div[^>]*class=\"module-content\">(.*?)<\\/div>/si", $split[1], $ingredients);
$ingred = str_replace("<ul>", "", $ingredients[1]);
$ingred = str_replace("</ul>", "", $ingred);
$ingred = str_replace("<li>", "", $ingred);
$ingred = str_replace("</li>", ", ", $ingred);
echo $ingred;
mysql_query("INSERT INTO food_tags (title, link, ingredients) VALUES ('$recipe_title', '$recipe_url', '$ingred')");
echo "<br><br>Recipes indexed: $recipe_number<hr><br><br>";
}
$get_urls = mysql_query("SELECT * FROM food_recipes WHERE id>3091");
while($row = mysql_fetch_array($get_urls)) {
$count++;
$thiscount++;
scrape_ingredients($row['link'], $row['title'], $count, $thiscount);
sleep(1);
}
What is the max_execution_time value in your php.ini? It must be 0 for the script to be able to run without a time limit.
Try adding
set_time_limit(0);
at the top of your script.
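A minimal sketch of how the top of such a long-running script might look. The ignore_user_abort and flush calls are additions that the answer does not mention, and the $recipeUrls list is a made-up stand-in for the rows read from the database.

```php
<?php
// Remove PHP's execution time limit for this run (0 means no limit).
set_time_limit(0);

// Keep running even if the browser disconnects or the request is aborted.
ignore_user_abort(true);

// Hypothetical work list standing in for the rows fetched from the database.
$recipeUrls = ['https://example.com/recipe/1', 'https://example.com/recipe/2'];

$processed = 0;
foreach ($recipeUrls as $url) {
    // scrape_ingredients(...) from the question would be called here.
    $processed++;
    echo "Recipes indexed: {$processed}\n";
    flush(); // push progress output to the client as it is produced
}
```

set_time_limit only affects PHP's own limit. A web server or proxy in front of PHP can still enforce its own timeout, so for a job that takes hours, running the script from the command line may be the more reliable option.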
