More efficient way to process a data feed of 100K entries? - php

I have a CSV file with roughly 100K entries that I need to process and insert into a database.
Previously this was very slow because it makes an SQL call for every entry. I do it that way because if I try to build one single query to insert everything, I run out of memory.
I migrated to a new server and now I get an error every time I run it:
SQL Error : 2006 MySQL server has gone away
I am not sure, but I think this is happening because of how inefficient my code is.
What can I do to make it perform better and avoid the error?
Here is the code:
//empty table before saving new feed
$model->query('TRUNCATE TABLE diamonds');
$fp = fopen($this->file, 'r');
$firstline = false;
while (!feof($fp))
{
    $diamond = fgetcsv($fp);
    if ($diamond === false) {
        //blank line or read failure -- skip it
        continue;
    }
    //skip the header line
    if (!$firstline) {
        $firstline = true;
        continue;
    }
    if (empty($diamond[17])) {
        //no price -- skip it
        continue;
    }
    $data = array(
        'seller' => $diamond[0],
        'rapnet_seller_code' => $diamond[1],
        'shape' => $diamond[2],
        'carat' => $diamond[3],
        'color' => $diamond[4],
        'fancy_color' => $diamond[5],
        'fancy_intensity' => $diamond[6],
        'clarity' => empty($diamond[8]) ? 'I1' : $diamond[8],
        'cut' => empty($diamond[9]) ? 'Fair' : $diamond[9],
        'stock_num' => $diamond[16],
        'rapnet_price' => $diamond[17],
        'rapnet_discount' => empty($diamond[18]) ? 0 : $diamond[18],
        'cert' => $diamond[14],
        'city' => $diamond[26],
        'state' => $diamond[27],
        'cert_image' => $diamond[30],
        'rapnet_lot' => $diamond[31]
    );
    //measurements come in as "LxWxD"; normalise to dashes and split
    $measurements = strtolower($diamond[13]);
    $measurements = str_replace('x', '-', $measurements);
    $mm = explode('-', $measurements);
    $data['mm_width'] = empty($mm[0]) ? 0 : $mm[0];
    $data['mm_length'] = empty($mm[1]) ? 0 : $mm[1];
    $data['mm_depth'] = empty($mm[2]) ? 0 : $mm[2];
    //create a new entry and save the data to it
    $model->create();
    $model->save($data);
}
fclose($fp);

You're probably exceeding MySQL's max_allowed_packet setting, which sets a hard limit (in bytes) on how long a query string can be. There's nothing wrong with multi-value inserts, but 100K rows in one statement is definitely pushing it.
Instead of inserting all 100K rows at once, try batches of 1,000 rows in a loop. You still cut the total query count from 100K down to about 100, so it's still a big net gain.
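Here's a minimal sketch of that batching pattern, assuming a PDO connection ($pdo is not in the question's code, which uses a $model wrapper) and only two of the question's columns for brevity:
$fp = fopen($this->file, 'r');
fgetcsv($fp); // discard the header row
$batch = array();
$batchSize = 1000;

// hypothetical helper: builds one multi-value INSERT and runs it
$flush = function (array $rows) use ($pdo) {
    if (!$rows) {
        return;
    }
    // one "(?, ?)" group per row
    $placeholders = implode(',', array_fill(0, count($rows), '(?, ?)'));
    $stmt = $pdo->prepare(
        "INSERT INTO diamonds (stock_num, rapnet_price) VALUES $placeholders"
    );
    $params = array();
    foreach ($rows as $row) {
        $params[] = $row[16]; // stock_num
        $params[] = $row[17]; // rapnet_price
    }
    $stmt->execute($params);
};

while (($diamond = fgetcsv($fp)) !== false) {
    if (empty($diamond[17])) {
        continue; // no price -- skip it
    }
    $batch[] = $diamond;
    if (count($batch) >= $batchSize) {
        $flush($batch);
        $batch = array();
    }
}
$flush($batch); // insert the final partial batch
fclose($fp);
Each batch stays comfortably under max_allowed_packet, and you can tune $batchSize down if your rows are wide.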

Related

How can I optimize this code in PHP Smarty?

I have some code that's confusing me. It's simple code:
$sql_set_land = "select * from set_new_land where id_rent_house =".Tools::getValue("id_rent_house");
//print_r($sql_set_land);
$n_land = Db::rowSQL($sql_set_land,true);
$landTitle1 = $n_land['landTitle1'];
$landTitle2 = $n_land['landTitle2'];
$landBuilderNumber = $n_land['landBuilderNumber'];
$landLandMark = $n_land['landLandMark'];
$land_1 = $n_land['land_1'];
$land_2 = $n_land['land_2'];
$land_3 = $n_land['land_3'];
...
...
...
$land_30 = $n_land['land_30'];
$land_31 = $n_land['land_31'];
$land_32 = $n_land['land_32'];
Then, further down in the code, I need to assign all of those values:
$this->context->smarty->assign([
'park_space' =>$park_space,
'recording_data' =>$recording_data,
'clode_number' => $clode_number,
'ad_choose_top' => $ad_choose_top,
'ad_choose_type' => $ad_choose_type,
'ad_choose_payment_type' => $ad_choose_payment_type,
'ad_choose_payment_type1' => $ad_choose_payment_type1,
'landTitle1' => $landTitle1,
'landTitle2' => $landTitle2,
'landBuilderNumber' => $landBuilderNumber,
'landLandMark' => $landLandMark,
'land_1' => $land_1,
'land_2' => $land_2,
'land_3' => $land_3,
...
...
...
'land_30' => $land_30,
'land_31' => $land_31,
'land_32' => $land_32,
How can I optimize this? Can I write it into an array? If so, how do I do that?
Your problem is not in the PHP code; it is in the design of your database. Any time you need numbers in column names, it's a sign that you've failed to normalize your data.
If you expanded the * (which is generally good practice, to avoid surprises when you change your database), you would have to write this:
select
landTitle1,
landTitle2,
landBuilderNumber,
landLandMark,
land_1,
land_2,
land_3,
land_4,
land_5,
land_6,
land_7,
land_8,
land_9,
land_10,
land_11,
land_12,
land_13,
land_14,
land_15,
land_16,
land_17,
land_18,
land_19,
land_20,
land_21,
land_22,
land_23,
land_24,
land_25,
land_26,
land_27,
land_28,
land_29,
land_30,
land_31,
land_32
from set_new_land
where id_rent_house = :id
With a properly normalised database, you would instead write something like this:
select
SND.landTitle1,
SND.landTitle2,
SND.landBuilderNumber,
SND.landLandMark,
L.landNumber,
L.land
from set_new_land as SND
join lands as L
on L.set_new_land_id = SND.set_new_land_id
where SND.id_rent_house = :id
Then in PHP, you can use the array_column function:
$lands = array_column($dbResults, 'land');
// or, if the land numbers are important
$lands = array_column($dbResults, 'land', 'landNumber');
If you can't fix your data, though, you can transform it into a more sensible form in PHP, with a loop that counts from 1 to 32:
$lands = [];
for ( $landNumber=1; $landNumber<=32; $landNumber++ ) {
$columnName = 'land_' . $landNumber;
$lands[] = $messyDbResults[ $columnName ];
}
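Either way you end up with a single $lands array, so the Smarty assignment collapses to one entry. A sketch of how it might be consumed (the template loop is an assumption about how the values are used):
$this->context->smarty->assign('lands', $lands);

// and in the template, instead of 32 separate variables:
// {foreach $lands as $land}
//     {$land}
// {/foreach}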

Code needs to loop a minimum of 2000 times in a PHP foreach

I have a foreach loop that will run a minimum of 2000 iterations:
foreach ($giftCardSchemeData as $keypreload => $preload) {
    for ($i = 0; $i < $preload['quantity']; $i++) {
        $cardid = new CarddetailsId($uuidGenerator->generate());
        $cardnumber = self::getCardNumber();
        // $key is set outside this snippet
        $cardexistencetype = ($key == "giftCardSchemeData") ? "Physical" : "E-Card";
        $giftCardSchemeDataDb = array(
            'preload' => array(
                'value' => $preload['value'],
                'expirymonths' => $preload['expiryMonths']
            )
        );
        $otherdata = array(
            'cardnumber' => $cardnumber,
            'cardexistencetype' => $cardexistencetype,
            'isgiftcard' => true,
            'giftcardamount' => $preload['value'],
            'giftCardSchemeData' => json_encode($giftCardSchemeDataDb),
            'expirymonths' => $preload['expiryMonths'],
            'isloyaltycard' => false,
            'loyaltypoints' => null,
            'loyaltyCardSchemeData' => null,
            'loyaltyRedeemAmount' => null,
            'pinnumber' => mt_rand(100000, 999999)
        );
        $output = array_merge($data, $otherdata);
        $carddetailsRepository = $this->get('oloy.carddetails.repository');
        $carddetails = $carddetailsRepository->findByCardnumber($cardnumber);
        if (!$carddetails) {
            $commandBus->dispatch(
                new CreateCarddetails($cardid, $output)
            );
        } else {
            self::generateCardFunctionForErrorException($cardid, $output, $commandBus);
        }
    }
}
I have five foreach loops like the one above in total. When I call the function, all five run before the response is returned. It takes so long that PHP's maximum execution time is exceeded.
Is there a way to send the response first and then run the foreach work on the server side, so it doesn't hit the maximum execution time? I'd also like the foreach itself optimized.
Also, in Symfony I have tried a try/catch around the existence check in the code above, but it returns an "Entity closed" error. I have temporarily used the existence check against the DB, but that needs optimizing too.
There seems to be a lot wrong (or to be optimized) in this code, but let's focus on your questions:
First, this work shouldn't live in code that is triggered by a visitor.
You should separate it into 2 processes:
1. A cronjob that generates everything that must be generated and saves the generated info to the database. The cronjob can take as much time as it needs. Look at Symfony's console component (a minimal command sketch follows this list).
2. A page that displays only the generated info, by fetching it from the database and passing it to a Twig template.
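The promised sketch of such a command, assuming a Symfony 2/3-style ContainerAwareCommand and a hypothetical app.card_generator service that wraps the generation logic (neither name is from the question):
use Symfony\Bundle\FrameworkBundle\Command\ContainerAwareCommand;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

class GenerateCardsCommand extends ContainerAwareCommand
{
    protected function configure()
    {
        $this->setName('app:generate-cards')
             ->setDescription('Generate gift cards in the background');
    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        // all the slow card generation runs here, free of web request timeouts
        $this->getContainer()->get('app.card_generator')->generateAll();
        $output->writeln('Done.');
    }
}
Register it and call it from cron with app/console app:generate-cards (or bin/console, depending on your Symfony version).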
However, looking at the code you posted, I think it can be greatly optimized as is. You have a foreach loop that fetches variable data, and inside it a for loop that generates very little variability at all.
So most of the code inside the for loop is executed over and over again without any of its inputs actually changing.
Here is a concept that would give much higher performance. Of course, since I don't know the actual context of your code, you will have to adapt it.
$carddetailsRepository = $this->get('oloy.carddetails.repository');
$cardexistencetype = ($key == "giftCardSchemeData") ? "Physical" : "E-Card";
foreach ($giftCardSchemeData as $keypreload => $preload) {
    $cardnumber = self::getCardNumber();
    $carddetails = $carddetailsRepository->findByCardnumber($cardnumber);
    $giftCardSchemeDataDb = array(
        'preload' => array(
            'value' => $preload['value'],
            'expirymonths' => $preload['expiryMonths']
        )
    );
    $otherdata = array(
        'cardnumber' => $cardnumber,
        'cardexistencetype' => $cardexistencetype,
        'isgiftcard' => true,
        'giftcardamount' => $preload['value'],
        'giftCardSchemeData' => json_encode($giftCardSchemeDataDb),
        'expirymonths' => $preload['expiryMonths'],
        'isloyaltycard' => false,
        'loyaltypoints' => null,
        'loyaltyCardSchemeData' => null,
        'loyaltyRedeemAmount' => null,
        'pinnumber' => 0
    );
    $output = array_merge($data, $otherdata);
    for ($i = 0; $i < $preload['quantity']; $i++) {
        $cardid = new CarddetailsId($uuidGenerator->generate());
        $output['pinnumber'] = mt_rand(100000, 999999);
        if (!$carddetails) {
            $commandBus->dispatch(
                new CreateCarddetails($cardid, $output)
            );
        } else {
            self::generateCardFunctionForErrorException($cardid, $output, $commandBus);
        }
    }
}
Also: if this code triggers any database inserts or updates, you don't want to run one per iteration. Start a database transaction and flush the queries every X iterations instead.
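A sketch of that flushing pattern, assuming a Doctrine EntityManager ($em) and a $cards iterable rather than the command bus from the question:
$batchSize = 100;
$i = 0;
foreach ($cards as $card) {
    $em->persist($card);
    if ((++$i % $batchSize) === 0) {
        $em->flush(); // push this batch of inserts to the database
        $em->clear(); // detach managed entities to keep memory flat
    }
}
$em->flush(); // flush the final partial batch
$em->clear();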

PHP Processing stream_get_contents into an array

I am having trouble converting a string containing tab-delimited CSV output into an array. I am using the following code:
$data = stream_get_contents($request->getShipmentReport());
I get back data as following:
Date Shipped Comments Feedback Arrived on Time
5/11/15 2 comment response Yes
Now I would like to process this data into an array, with the returned headers (line 1) as the keys for the values in each line that follows.
I am trying to end up with something like this:
$line['Date'] = '5/11/15'
$line['Shipped'] = '2'
$line['Feedback'] = 'response'
$line['Arrived on Time'] = 'Yes'
What would be the best way to achieve this?
I recommend using a library for handling CSV, for example the CSV package by The League of Extraordinary Packages.
You can pass in your data and start working with array structures right away:
$csvString = stream_get_contents($request->getShipmentReport());
$csvReader = \League\Csv\Reader::createFromString($csvString);
$csvReader->setDelimiter("\t"); // the report is tab-delimited, not comma-delimited
$data = $csvReader->fetchAssoc(
    array('Date', 'Shipped', 'Comments', 'Feedback', 'Arrived on Time')
);
// Or, since line 0 contains the header:
$data = $csvReader->fetchAssoc(0); // 0 is the offset of the row containing the keys
Now your data should be in a structure like:
$data = array(
array(
'Date' => '5/11/15',
'Shipped' => '2',
...
),
array(
...
)
);
You can even filter out specific datasets and all kinds of fancy stuff. Just check the documentation I linked to.
The League CSV reader gave me a bad result. I used this instead:
$csvReportString = stream_get_contents($request->getReport());
$csvReportRows = explode("\n", $csvReportString);
$report = [];
foreach ($csvReportRows as $c) {
    if ($c) {
        $report[] = str_getcsv($c, "\t");
    }
}
var_dump($report);
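If you want the header-keyed rows the question asks for, a small extension of this approach works (a sketch; it assumes every data row has the same number of fields as the header row):
$rows = array();
foreach ($csvReportRows as $c) {
    if ($c) {
        $rows[] = str_getcsv($c, "\t");
    }
}
$header = array_shift($rows); // first row holds the column names
$report = array();
foreach ($rows as $row) {
    // pair each header name with the value in the same position
    $report[] = array_combine($header, $row);
}
// $report[0]['Date'] is now '5/11/15', etc.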

simple_html_dom.php

I am using "simple_html_dom.php" to scrap the data from the Wikipedia site. If I run the code in scraperwiki.com it's throwing an error as exit status 139 and if run the same code in my xampp sever, the server is hanging.
I have a set of links.
I'm trying to get the Literacy value from all of the pages.
If I run the code with one link, there is no problem and it returns the expected result.
If I try to get data from all the sites in one go, I run into the problem above.
The code is:
<?php
$test = array(
    0 => "http://en.wikipedia.org/wiki/Andhra_Pradesh",
    1 => "http://en.wikipedia.org/wiki/Arunachal_Pradesh",
    2 => "http://en.wikipedia.org/wiki/Assam",
    3 => "http://en.wikipedia.org/wiki/Bihar",
    4 => "http://en.wikipedia.org/wiki/Chhattisgarh",
    5 => "http://en.wikipedia.org/wiki/Goa"
);
for ($ix = 0; $ix <= 9; $ix++) {
    $content = file_get_html($test[$ix]);
    $tables = $content->find('#mw-content-text table', 0);
    foreach ($tables->children() as $child1) {
        foreach ($child1->find('th a') as $ele) {
            if ($ele->innertext == "Literacy") {
                foreach ($child1->find('td') as $ele1) {
                    echo $ele1->innertext;
                }
            }
        }
    }
}
Can you guide me on where I am going wrong? Is it a memory problem? Is there some XAMPP configuration involved?
<?php
require 'simple_html_dom.php';

$test = array(
    0 => "http://en.wikipedia.org/wiki/Andhra_Pradesh",
    1 => "http://en.wikipedia.org/wiki/Arunachal_Pradesh",
    2 => "http://en.wikipedia.org/wiki/Assam",
    3 => "http://en.wikipedia.org/wiki/Bihar",
    4 => "http://en.wikipedia.org/wiki/Chhattisgarh",
    5 => "http://en.wikipedia.org/wiki/Goa");
// note: < rather than <=, or the last pass reads past the end of the array
for ($ix = 0; $ix < count($test); $ix++) {
    $content = file_get_html($test[$ix]);
    $tables = $content->find('#mw-content-text table', 0);
    foreach ($tables->children() as $child1) {
        foreach ($child1->find('th a') as $ele) {
            if ($ele->innertext == "Literacy") {
                foreach ($child1->find('td') as $ele1) {
                    echo $ele1->innertext;
                }
            }
        }
    }
    // free the DOM between pages so memory doesn't pile up
    $content->clear();
}
?>
But that is a lot of URLs. You may hit a fatal "maximum execution time exceeded" error, or the browser may give up with error 324.
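If you do need to fetch them all in one run, you can at least lift the limits before the loop (a sketch; set_time_limit is ignored when PHP runs in safe mode, and the memory value is an arbitrary example):
set_time_limit(0);               // remove the max_execution_time cap for this script
ini_set('memory_limit', '256M'); // give the DOM parser more headroom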

Sphinx randomly fails to combine subqueries

I have a Sphinx search engine that I use through Zend via sphinxapi.php. It works fantastically; really, really great.
However, there is one problem: it randomly fails.
// Prepare Sphinx search
$sphinxClient = new SphinxClient();
$sphinxClient->SetServer($ip, $port);
$sphinxClient->SetConnectTimeout( 10 );
$sphinxClient->SetMatchMode( SPH_MATCH_ANY );
$sphinxClient->SetLimits( $resultsPerPage * ($nPage - 1), $resultsPerPage );
$sphinxClient->SetArrayResult( true );
$query = array();
$query['lang'] = '#lang "lang' . $language . '"';
if (isset($params)) {
foreach ($params as $param) {
$query['tags'] = '#tags "' . $param . '"';
}
}
// Make the Sphinx search
$sphinxClient->SetMatchMode(SPH_MATCH_EXTENDED);
$sphinxResult = $sphinxClient->Query(implode(' ', $query), '*');
As seen here, I search for a language and an arbitrary number of tags, imploded into a single query string at the end (instead of making a boatload of subqueries).
Normally this works like a charm, but occasionally Sphinx reports that it found 2000 entries in English and, say, 1000 entries with the tag "pictures" (or some other purely English word), but ZERO hits that fit both, which is plainly false. In fact, on refreshing the page everything returns to normal, with something like 800 real results.
My question is: why does it do this, and how do I make it stop?
Any ideas?
Edit: added a shortened output log:
[error] =>
[warning] =>
...
[total] => 0
[total_found] => 0
[time] => 0.000
[words] => Array
    (
        [langen] => Array
            (
                [docs] => 2700
                [hits] => 2701
            )
        [picture] => Array
            (
                [docs] => 829
                [hits] => 1571
            )
    )
Have you checked whether the Sphinx client is giving you any error or warning messages that might describe the failure?
if($sphinxResult === false) {
print "Query failed: " . $sphinxClient->GetLastError();
} else {
if($sphinxClient->GetLastWarning()) {
print "WARNING: " . $sphinxClient->GetLastWarning();
}
// process results
}
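One more thing worth checking: the loop that builds the query overwrites $query['tags'] on each pass, so only the last tag ever makes it into the imploded string. If every tag should be required, append instead (a sketch using the variables from the question):
$query = array();
$query[] = '#lang "lang' . $language . '"';
if (isset($params)) {
    foreach ($params as $param) {
        // append each tag filter rather than reusing the 'tags' key
        $query[] = '#tags "' . $param . '"';
    }
}
$sphinxResult = $sphinxClient->Query(implode(' ', $query), '*');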
This issue was solved completely a few months after the original post. Our service providers in the umbrella corporation had mistakenly assigned the wrong root values to the Sphinx commands. The setup above was actually running on Sphinx 0.9.8, which was simply buggy. My advice, if you ever hit similar problems, is to double- and triple-check the version you use both to index and to query.
It feels like one of those times your math doesn't work out because you forgot a minus sign on the first row. Thanks to everyone who tried to help in this and related threads.
