PHPExcel takes unreasonably long to create spreadsheet - php

I am using PHPExcel to generate some pretty hefty spreadsheets on the fly for our users. This works fine until we get up to medium-sized spreadsheets. One user is trying to export a spreadsheet of about 6000 rows with 11 columns, and this brings my script to its knees. Unfortunately, because the spreadsheets are very dynamic, there is no way to generate them ahead of time, so I am stuck building them on the fly for each user request.
I have run some tests, and it seems that adding rows to the spreadsheet gets progressively slower as the script proceeds. For example, my error logging reports the following:
1st set of 1000 rows completes 13.34 seconds into the script
2nd set of 1000 rows completes 54.57 seconds into the script
3rd set of 1000 rows completes 135.33 seconds into the script
4th set of 1000 rows completes 250.60 seconds into the script
5th set of 1000 rows completes 394.53 seconds into the script
I have adjusted the script to add each row with the following call:
$sheet->fromArray($row_array, NULL, 'B' . $row_counter);
instead of setting each cell individually, but I have not seen any increase in speed.
The complete code that creates and formats each row is:
if ($row_counter % 2 == 0) {
    $active_color = $even;
} else {
    $active_color = $odd;
}
$sheet->getStyle('B' . $row_counter . ':' . chr($colspan_endletter) . $row_counter)->applyFromArray(
    array(
        'fill' => array(
            'type' => PHPExcel_Style_Fill::FILL_SOLID,
            'color' => array('argb' => $active_color)
        ),
        'borders' => array(
            'left' => array('style' => PHPExcel_Style_Border::BORDER_MEDIUM),
            'right' => array('style' => PHPExcel_Style_Border::BORDER_MEDIUM)
        )
    )
);
$sheet
    ->getStyle('B' . $row_counter . ':' . chr($colspan_endletter) . $row_counter)
    ->getAlignment()
    ->setWrapText(true)
    ->setHorizontal(PHPExcel_Style_Alignment::HORIZONTAL_CENTER)
    ->setVertical(PHPExcel_Style_Alignment::VERTICAL_CENTER);
Any idea why this is killing my script, or a way to make it complete in a reasonable timeframe?

Well to start, your two calls to set the style could be combined into a single call:
$sheet->getStyle('B' . $row_counter . ':' . chr($colspan_endletter) . $row_counter)->applyFromArray(
    array(
        'fill' => array(
            'type' => PHPExcel_Style_Fill::FILL_SOLID,
            'color' => array('argb' => $active_color)
        ),
        'borders' => array(
            'left' => array('style' => PHPExcel_Style_Border::BORDER_MEDIUM),
            'right' => array('style' => PHPExcel_Style_Border::BORDER_MEDIUM)
        ),
        'alignment' => array(
            'wrap' => true,
            'horizontal' => PHPExcel_Style_Alignment::HORIZONTAL_CENTER,
            'vertical' => PHPExcel_Style_Alignment::VERTICAL_CENTER
        )
    )
);
You could also set this as a default workbook style, and only set styles for the cells/ranges where they differ.
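For example, a minimal sketch of that approach, assuming $objPHPExcel is your PHPExcel workbook object (the variable name is an assumption; the question only shows $sheet):

// Hedged sketch: apply the shared alignment once as the workbook default
// instead of calling applyFromArray() on every row.
$objPHPExcel->getDefaultStyle()->applyFromArray(
    array(
        'alignment' => array(
            'wrap' => true,
            'horizontal' => PHPExcel_Style_Alignment::HORIZONTAL_CENTER,
            'vertical' => PHPExcel_Style_Alignment::VERTICAL_CENTER
        )
    )
);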

Unfortunately, as your spreadsheets grow, PHPExcel will take longer and longer to generate them. If you think that 6,000 rows per spreadsheet is the maximum you will need to support, you can probably optimize your current code to make it fast enough.
However, if you think you may have to generate bigger spreadsheets, you will hit PHPExcel's limits, and I would recommend looking at libraries built specifically for this use case, such as Spout (https://github.com/box/spout). Your code will then be future-proof.
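For a rough idea, here is a minimal sketch of the same kind of export with Spout (assuming the Spout 2.x API, and that $rows holds your row arrays; both are assumptions, not code from the question):

use Box\Spout\Writer\WriterFactory;
use Box\Spout\Common\Type;

// Hedged sketch: Spout streams rows out one at a time, so time and
// memory stay roughly constant per row instead of growing with the sheet.
$writer = WriterFactory::create(Type::XLSX);
$writer->openToBrowser('export.xlsx'); // stream directly to the user
foreach ($rows as $row_array) {
    $writer->addRow($row_array);
}
$writer->close();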

Related

Elgg and the utilization of Relationship_created_time_lower

Finishing my Elgg plugin has hit some issues: after fixing my last question, I have encountered another. Apparently I am misusing or misunderstanding the created time lower and upper options in Elgg.
With the code below:
$monthSpan = (30 * 24 * 60 * 60);
$startTime = time() - $monthSpan;
$MemberDifference = elgg_get_entities_from_relationship(array(
    'relationship' => 'member', // get members
    'relationship_guid' => $group->guid, // the individual group's guid
    'inverse_relationship' => true,
    'type' => 'user', // users are returned
    'limit' => 20,
    'joins' => array("JOIN {$db_prefix}users_entity u ON e.guid=u.guid"),
    'order_by' => 'u.name ASC',
    'relationship_created_time_lower' => $startTime, // the furthest back it will reach
    'relationship_created_time_upper' => time(), // possibly unneeded, but ensures the closest date is today
    'count' => true,
));
I built this on top of my existing way of getting all of the members in the associated group; in theory it should now grab only the members who joined that group within the last month. Unfortunately, it instead continues to grab every member in the group, regardless of when they joined.
Does anyone have any insight into where I have gone wrong?
It turns out my version of Elgg was too old; otherwise that entire block of code would have worked. Running Elgg 1.8, I needed to use the following code:
$MemberDifference = elgg_get_entities_from_relationship_count(array(
    'relationship' => 'member',
    'relationship_guid' => $Reports->guid,
    'inverse_relationship' => true,
    'type' => 'user',
    'limit' => 20,
    'count' => true,
    'joins' => array("JOIN {$db_prefix}users_entity u ON e.guid=u.guid"),
    'order_by' => 'u.name ASC',
    'wheres' => array('r.time_created >= ' . $startTime)
));
This works perfectly and returns exactly what I'm looking for.

Elasticsearch Completion

I have an Elasticsearch index which I update every 10 minutes via a cronjob. In this index I have a completion field which works as expected.
But I have one little problem. Let's say I have an "article" field where I change a value from "a" to "b". After 10 minutes the index is updated, and the document which held article "a" now holds article "b". Everything as expected.
But my completion field now holds both values, "a" and "b", both with the same id.
How can this happen?
Mapping:
'suggest' => array(
    'type' => 'completion',
    'payloads' => true,
    'preserve_separators' => false,
    'search_analyzer' => 'standard',
    'index_analyzer' => 'standard'
),
How I set the field:
'suggest' => array(
    'input' => array(
        $result["Name"],
        $result["Name"],
        $result["Name2"],
        $result["Name3"],
        $result["Name4"],
        $result["Name5"]
    ),
    'output' => $result["Name"] . ' (' . $result["Name1"] . ', ' . $result["Name2"] . ')',
    'payload' => array(
        'id' => $result["ID"]
    )
)
Found the answer in the docs:
The suggest data structure might not reflect deletes on documents immediately. You may need to do an Optimize for that. You can call optimize with the only_expunge_deletes=true to only cater for deletes or alternatively call a Merge operation.
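For reference, that optimize call is an HTTP POST against the index; a hedged one-liner (myindex is a placeholder for your index name):
curl -XPOST 'http://localhost:9200/myindex/_optimize?only_expunge_deletes=true'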

Updating the data in the PHP code automatically from the MySQL database

This is part of the PHP code:
$params = array(
    'screen_name' => 'phantom',
    'count' => 5,
    'exclude_replies' => true
);
I have a MySQL database with a table storing various screen names, and the table keeps updating automatically over time. Since I'm running a cron job on the PHP file containing the code above, I want the code to automatically pick up all of the names from the table. Something like:
$params = array(
    'screen_name' => 'phantom', 'tuxseo', 'deredo', // pseudocode: every name from the table
    'count' => 5,
    'exclude_replies' => true
);
Is it possible?
Select the records from the database and store them in $params["screen_name"].
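For example, a minimal sketch of that (assuming a PDO connection and a hypothetical accounts table with a screen_name column; both names are placeholders):

// Hedged sketch: pull the current screen names from MySQL on each cron
// run, then reuse the existing request code once per name.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$names = $pdo->query('SELECT screen_name FROM accounts')
             ->fetchAll(PDO::FETCH_COLUMN);

foreach ($names as $name) {
    $params = array(
        'screen_name' => $name,
        'count' => 5,
        'exclude_replies' => true
    );
    // ... run the existing request with $params ...
}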

Understanding ElasticSearch routing

I am trying to use the Elasticsearch routing mapping to speed up some queries, but I am not getting the expected result set (I'm not worried about query performance just yet).
I am using Elastica to set up my mapping:
$index->create(array(
    'number_of_shards' => 4,
    'number_of_replicas' => 1,
    'mappings' => array(
        'country' => array('_routing' => array('path' => 'countrycode'))
    ),
    'analysis' => array(
        'analyzer' => array(
            'indexAnalyzer' => array(
                'type' => 'keyword',
                'tokenizer' => 'nGram',
                'filter' => array('shingle')
            ),
            'searchAnalyzer' => array(
                'type' => 'keyword',
                'tokenizer' => 'nGram',
                'filter' => array('shingle')
            )
        )
    )
), true);
If I understand correctly, what should happen is that each result should now have a field called "countrycode" with the value of "country" in it.
The results of _mapping look like this:
{"postcode":
{"postcode":
{"properties":
{
"area1":{"type":"string"},
"area2":{"type":"string"},
"city":{"type":"string",
"include_in_all":true},
"country":{"type":"string"},
"country_iso":{"type":"string"},
"country_name":{"type":"string"},
"id":{"type":"string"},
"lat":{"type":"string"},
"lng":{"type":"string"},
"location":{"type":"geo_point"},
"region1":{"type":"string"},
"region2":{"type":"string"},
"region3":{"type":"string"},
"region4":{"type":"string"},
"state_abr":{"type":"string"},
"zip":{"type":"string","include_in_all":true}}},
"country":{
"_routing":{"path":"countrycode"},
"properties":{}
}
}
}
Once all the data is in the index, if I run this command:
http://localhost:9200/postcode/_search?pretty=true&q=country:au
it responds with 15740 total items.
What I was expecting is that if I run the query like this:
http://localhost:9200/postcode/_search?routing=au&pretty=true
then it would respond with the same 15740 results.
Instead it returns 120617 results, which include results where country != au.
I did note that the number of shards in the results went from 4 to 1, so something is working.
I was also expecting the result set to contain an item called "countrycode" (from the routing mapping), which it doesn't.
So I thought at this point that my understanding of routing was wrong. Perhaps all routing does is tell Elasticsearch which shard to look in, but not what to look for? In other words, if other country codes happen to also land in that particular shard, queries written this way will just bring back all records in that shard?
So I tried the query again, this time adding some info to it.
http://localhost:9200/postcode/_search?routing=AU&pretty=true&q=country:AU
I thought that by doing this it would force the query to give me just the AU place names, but this time it gave me only 3936 results.
So I am not quite sure what I have done wrong. The examples I have read show the queries changing from needing a filter to just using match_all{}, which I would have thought would only bring back documents matching the au country code.
Thanks for your help in getting this to work correctly.
I almost have this working; it now gives me the correct number of results in a single shard. However, the index creation is not working quite right: it ignores my number_of_shards setting, and possibly other settings too.
$index = $client->getIndex($indexname);
$index->create(array(
    'mappings' => array(
        "$indexname" => array('_routing' => array('required' => true))
    ),
    'number_of_shards' => 6,
    'number_of_replicas' => 1,
    'analysis' => array(
        'analyzer' => array(
            'indexAnalyzer' => array(
                'type' => 'keyword',
                'tokenizer' => 'nGram',
                'filter' => array('shingle')
            ),
            'searchAnalyzer' => array(
                'type' => 'keyword',
                'tokenizer' => 'nGram',
                'filter' => array('shingle')
            )
        )
    )
), true);
I can at least help you with more info on where to look:
http://localhost:9200/postcode/_search?routing=au&pretty=true
That query does indeed translate into "give me all documents on the shard where documents for country:AU should be sent."
Routing is just that, routing ... it doesn't filter your results for you.
Also, I noticed you're mixing your "au"s and your "AU"s; that might mix things up too.
You should try setting required on your routing element to true, to make sure that your documents are actually stored with routing information when being indexed.
Actually, to make sure your documents are indexed with the proper routing, explicitly set the route to lowercase(countrycode) when indexing documents. See if that helps.
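For instance, a hedged sketch with the Elastica client used in the question ($type, $id, and $data are assumptions, and method names may differ between Elastica versions):

// Hedged sketch: set the routing value yourself, in one canonical case,
// when indexing each document, so index-time and query-time routing match.
$doc = new \Elastica\Document($id, $data);
$doc->setRouting(strtolower($data['countrycode'])); // e.g. always "au"
$type->addDocument($doc);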
For more information try reading this blog post:
http://www.elasticsearch.org/blog/customizing-your-document-routing/
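As for the update where number_of_shards gets ignored: a hedged, untested guess is that once a top-level 'mappings' key is present, the create-index API expects index settings to be nested under a 'settings' key rather than passed at the top level:

// Untested sketch: nest the index settings under 'settings' when
// 'mappings' is also part of the create-index body.
$index->create(array(
    'settings' => array(
        'number_of_shards' => 6,
        'number_of_replicas' => 1,
        'analysis' => array(/* ... analyzers as above ... */)
    ),
    'mappings' => array(
        "$indexname" => array('_routing' => array('required' => true))
    )
), true);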
Hope this helps :)

batch put_item with DynamoDB in PHP

I'm running into issues sending data to my DynamoDB table. I have no idea what the issue is, because the program appears to run correctly, yet no data ever shows up in the DB. I was able to create tables using Amazon's tutorial, but when I follow this tutorial, I get a failed response if I try to put ALL of the items, and a false success when it's only one item, since nothing is updated in the DB.
Here's the code. I'm curious, specifically, whether anyone knows a way to debug these kinds of issues.
<?php
// If necessary, reference the sdk.class.php file.
// For example, the following line assumes the sdk.class.php file is
// in an sdk sub-directory relative to this file
require_once('includes/backend.php');

// Instantiate the class
$dynamodb = new AmazonDynamoDB();

####################################################################
# Setup some local variables for dates
$one_day_ago = date('Y-m-d H:i:s', strtotime("-1 days"));
$seven_days_ago = date('Y-m-d H:i:s', strtotime("-7 days"));
$fourteen_days_ago = date('Y-m-d H:i:s', strtotime("-14 days"));
$twenty_one_days_ago = date('Y-m-d H:i:s', strtotime("-21 days"));

####################################################################
# Adding data to the table
echo PHP_EOL . PHP_EOL;
echo "# Adding data to the table..." . PHP_EOL;

// Set up batch requests
$queue = new CFBatchRequest();
$queue->use_credentials($dynamodb->credentials);

// Add items to the batch
$dynamodb->batch($queue)->put_item(array(
    'TableName' => 'ProductCatalog',
    'Item' => array(
        'Id'              => array( AmazonDynamoDB::TYPE_NUMBER => '101' ), // Hash Key
        'Title'           => array( AmazonDynamoDB::TYPE_STRING => 'Book 101 Title' ),
        'ISBN'            => array( AmazonDynamoDB::TYPE_STRING => '111-1111111111' ),
        'Authors'         => array( AmazonDynamoDB::TYPE_ARRAY_OF_STRINGS => array('Author1') ),
        'Price'           => array( AmazonDynamoDB::TYPE_NUMBER => '2' ),
        'Dimensions'      => array( AmazonDynamoDB::TYPE_STRING => '8.5 x 11.0 x 0.5' ),
        'PageCount'       => array( AmazonDynamoDB::TYPE_NUMBER => '500' ),
        'InPublication'   => array( AmazonDynamoDB::TYPE_NUMBER => '1' ),
        'ProductCategory' => array( AmazonDynamoDB::TYPE_STRING => 'Book' )
    )
));

echo "Item put in <b>Reply</b>" . "<br/>";

// Execute the batch of requests in parallel
$responses = $dynamodb->batch($queue)->send();

// Check for success...
if ($responses->areOK()) {
    echo "The data has been added to the table." . PHP_EOL;
} else {
    utdump($responses);
}
Thank you for your time.
Try setting your region explicitly; e.g. for EU you need to call:
$myDynamoDbObject->set_region('dynamodb.eu-west-1.amazonaws.com');
It might also be that you have two tables in different regions: your application works with one of them, while you are trying to read (or track via the web console) the other, because different default regions are applied.
If you're tracking the number of items in the table, keep in mind that it is not a real-time figure; it is only updated every 6 hours or so. So the way you can make sure your item was indeed written is to try to read it back immediately after you receive the "OK" response for your put request.
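For example, a minimal read-back sketch in the same SDK 1.x style as the question (the HashKeyElement layout matches that SDK generation; adjust to your table's key schema):

// Hedged sketch: read the item back right after the put to confirm the
// write landed, using the same $dynamodb client as above.
$response = $dynamodb->get_item(array(
    'TableName' => 'ProductCatalog',
    'Key' => array(
        'HashKeyElement' => array( AmazonDynamoDB::TYPE_NUMBER => '101' )
    )
));
print_r($response->body->Item);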
