How to manage PHP memory?

I wrote a one-off script that parses PDFs saved in the database. It was working fine until it ran out of memory after parsing 2,700+ documents.
The basic flow of the script is as follows:
1. Get a list of all the document IDs to be parsed and save it as an array in the session (~155k documents).
2. Display a page that has a button to start parsing.
3. When that button is clicked, make an AJAX request that parses the first 50 documents in the session array:
// Make sure the session is open before reading from it
if (session_status() == PHP_SESSION_NONE) {
    session_start();
}

$files = $_SESSION['files'];
$ids   = array();
$slice = array_slice($files, 0, 50);    // the 50 documents to parse on this request
$files = array_slice($files, 50, null); // remove them from the pending list

$_SESSION['files'] = $files;
session_write_close();

// Build a ":id_0, :id_1, ..." placeholder list for the IN clause
for ($i = 0; $i < count($slice); $i++) {
    $ids[] = ":id_{$i}";
}
$ids = implode(", ", $ids);

$sql = "SELECT d.id, d.filename, d.doc_content
          FROM proj_docs d
         WHERE d.id IN ({$ids})";
$stmt = oci_parse($objConn, $sql);
for ($i = 0; $i < count($slice); $i++) {
    oci_bind_by_name($stmt, ":id_{$i}", $slice[$i]);
}
oci_execute($stmt, OCI_DEFAULT);
$cnt = oci_fetch_all($stmt, $data);
oci_free_statement($stmt);

# Do the parsing..
# Output a table row..
The response to the AJAX request includes a status flag indicating whether the script has finished parsing all ~155k documents. If it hasn't, another AJAX request is made to parse the next 50, with a 5-second delay between requests.
Questions
Why am I running out of memory now? I expected peak memory usage to occur at step #1, when the session array holds every possible document ID, not a few minutes later when it holds 2,700 fewer elements.
I saw a few questions similar to mine. Some suggested raising the memory limit to unlimited, which I don't want to do at all. Others suggested setting my variables to null where appropriate; I did that (see below), but I still ran out of memory after parsing ~2,700 documents. What other approaches should I try?
# Freeing some memory space
$batch_size = null;
$with_xfa = null;
$non_xfa = null;
$total = null;
$files = null;
$ids = null;
$slice = null;
$sql = null;
$stmt = null;
$objConn = null;
$i = null;
$data = null;
$cnt = null;
$display_class = null;
$display = null;
$even = null;
$tr_class = null;
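One approach that keeps peak memory bounded by a single document rather than a whole batch is to skip oci_fetch_all() and fetch rows one at a time, leaving doc_content as a LOB descriptor that can be sized, read, and freed per row. A rough sketch, assuming doc_content is a LOB column (parse_document() is a hypothetical stand-in for the parsing step):

// Sketch: fetch one row at a time so only one document is in memory at once.
// Assumes doc_content is a LOB column; parse_document() is a hypothetical helper.
oci_execute($stmt, OCI_DEFAULT);
while (($row = oci_fetch_array($stmt, OCI_ASSOC)) !== false) {
    $lob = $row['DOC_CONTENT'];               // OCILob descriptor, not the full string
    if ($lob->size() > 100 * 1024 * 1024) {   // optionally skip/flag very large PDFs
        echo "Skipping large file: {$row['FILENAME']}";
    } else {
        parse_document($row['ID'], $row['FILENAME'], $lob->load());
    }
    $lob->free();                             // release the LOB memory before the next row
    unset($row, $lob);
}
oci_free_statement($stmt);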

I'm not really sure why, but reducing the number of documents parsed per batch from 50 down to 10 seems to fix the issue. I've gone past 5,000 documents now and the script is still running. My only guess is that with 50-document batches I must have hit a run of large files that used up all of the allotted memory.
Update #1
I got another out-of-memory error at 8,500+ documents. I've reduced the batches further, down to 5 documents each, and will see tomorrow whether it parses everything. If that fails, I'll just increase the allocated memory temporarily.
Update #2
It turns out the only reason I was running out of memory is that we have multiple PDF files of over 300MB each stored in the database. I increased the memory allotted to PHP to 512MB, and that let me finish parsing everything.
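If raising the limit globally in php.ini is undesirable for a one-off job like this, the increase can also be scoped to just this script (a minimal sketch; the 512M value mirrors what worked above):

// Raise the memory limit for this one-off script only,
// instead of changing it globally in php.ini.
ini_set('memory_limit', '512M');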

Related

Maximum time execution CodeIgniter 3 issue

From what I gather, the only solution to the maximum execution time issue in CodeIgniter 3 is to increase the execution time, e.g. from 30 to 300 seconds.
I'm using CodeIgniter on a news website. I'm loading only the 20 latest news items on the news section page, and I don't think that's a big enough number to push the server past its execution time. (Note that the news table has more than 1,400 rows and the seen table has more than 150,000 logs.)
It doesn't seem reasonable that a user should have to wait more than 50 seconds to get a response and load the page.
Is there any useful way to load the page as fast as possible without hitting the maximum execution time?
My Code in the model:
public function get_section_news($id_section = 0, $length = 0, $id_sub_section = 0, $id_news_lessthan = 0) {
    $arr = array();

    if (intval($id_section) > 0 and intval($length) > 0) {
        $where = array();
        $where['sections.activity'] = 1;
        $where['news.deleted'] = 0;
        $where['news.id_section'] = $id_section;

        $query = $this->db;
        $query
            ->from("news")
            ->join("sections", "news.id_section = sections.id_section", "inner")
            ->order_by("news.id_news", "desc")
            ->limit($length);

        if (intval($id_sub_section) > 0) {
            $where['news.id_section_sub'] = $id_sub_section;
        }
        if ($id_news_lessthan > 0) {
            $where['news.id_news <'] = $id_news_lessthan;
        }

        $get = $query->where($where)->get();
        $num = $get->num_rows();
        if ($num > 0) {
            foreach ($get->result() as $key => $value) {
                $arr['row'][] = $value;
            }
        }
        $arr['is_there_more'] = ($length > $num and $num > 0);
    }
    return $arr;
}
This usually has nothing to do with the framework. You may run the following command on your mysql client and check whether there are any sleeping queries on your database:
SHOW FULL PROCESSLIST
Most likely you have sleeping queries, since you are not emptying the result set with
$get->free_result();
Another possible cause is slow queries. For that I recommend the following:
1) Make sure you are using the same database engine on all tables; I recommend InnoDB, as some engines lock the whole table during a transaction, which is undesirable. You should have noticed this already when you ran SHOW FULL PROCESSLIST.
2) Run your queries on a mysql client and observe how long they take to execute. If they take too long, it may be a result of unindexed tables. You can EXPLAIN your query to identify unindexed tables, follow these tutorials (1, 2, 3) on indexing your tables, or do it easily with tools like Navicat.
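For instance, in the model above the result set could be freed right after the rows are copied out; a small sketch of just that change, using the same CodeIgniter query builder calls as the question:

// After copying the rows out of the result object, release the result set
// so the connection does not hold it (and its memory) for the rest of the request.
$get = $query->where($where)->get();
$num = $get->num_rows();
if ($num > 0) {
    foreach ($get->result() as $value) {
        $arr['row'][] = $value;
    }
}
$get->free_result();   // empty the result set once we are done with it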

PHP array inserting / manipulation degrading over iterations

I am in the process of transferring data from one database to another. They are different DBMSs (MSSQL to MySQL), so I can't do direct queries and am using PHP as an intermediary. Consider the following code. For some reason, each pass through the while loop takes twice as long as the one before.
$continue = true;
$limit    = 20000;

while ($continue) {
    $i = 0;
    $imp->endTimer();
    $imp->startTimer("Fetching Apps");

    $qry  = "THIS IS A BASIC SELECT QUERY";
    $data = $imp->src->dbQuery($qry, array(), PDO::FETCH_ASSOC);

    $inserts  = array();
    $continue = (count($data) == $limit);

    $imp->endTimer();
    $imp->startTimer("Processing Apps " . memory_get_usage());

    if ($data == false) {
        $continue = false;
    } else {
        foreach ($data as $row) {
            // THERE IS SOME EXTREMELY BASIC IF STATEMENTS HERE
            $inserts[] = array(
                "paymentID"       => $paymentID,
                "ticketID"        => $ticketID,
                "applicationLink" => $row['ApplicationID'],
                "paymentLink"     => (int)($paymentLink),
                "ticketLink"      => (int)($ticketLink),
                "dateApplied"     => $row['AddDate'],
                "appliedBy"       => $adderID,
                "appliedAmount"   => $amount,
                "officeID"        => $imp->officeID,
                "customerID"      => -1,
                "taxCollected"    => 0
            );
            $i++;
            $minID = $row['ApplicationID'];
        }
    }

    $imp->endTimer();
    $imp->startTimer("Inserting $i Apps");

    if (count($inserts) > 0) {
        $imp->dest->dbBulkInsert("appliedPayments", $inserts);
    }

    unset($data);
    unset($inserts);
    echo "Inserted $i Apps<BR>";
}
No matter what I set the limit to, the processing portion takes twice as long on each iteration. I am timing each portion of the loop: selecting the data from the old database and inserting it into the new one take no time at all, but the "processing" portion doubles every time. Why? Here are the logs; if you do some quick math on the timestamps, each step labeled "Processing Apps" takes twice as long as the one before (I stopped this run a little early, but the final iteration was taking significantly longer).
Well, I don't know why this works, but moving everything inside the while loop into a separate function DRAMATICALLY increases performance. I'm guessing it's a garbage collection / memory management issue, and that returning from a function call helps the garbage collector know it can release that memory. Now when I log the memory usage, it stays constant between calls instead of growing. Dirty PHP...
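A rough sketch of that refactor (the helper name processBatch() is made up; the elided query and the basic if statements stay as in the original loop):

// Hypothetical refactor: the loop body lives in its own function, so all of its
// locals ($data, $inserts, ...) go out of scope when it returns after each batch.
function processBatch($imp, $limit)
{
    $qry  = "THIS IS A BASIC SELECT QUERY";
    $data = $imp->src->dbQuery($qry, array(), PDO::FETCH_ASSOC);
    if ($data == false || count($data) == 0) {
        return false;                     // nothing left to transfer
    }

    $inserts = array();
    foreach ($data as $row) {
        // ... same basic if statements and $inserts[] = array(...) as in the original loop ...
    }
    if (count($inserts) > 0) {
        $imp->dest->dbBulkInsert("appliedPayments", $inserts);
    }
    return count($data) == $limit;        // true if there may be another full batch
}

while (processBatch($imp, 20000)) {
    // locals from the previous batch are already out of scope here
}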

Overcoming PHP memory exhausted or execution time error when retrieving MySQL table

I have a big table in my MySQL database. I want to go over one of its columns and pass each value to a function that checks whether it exists in another table and, if not, creates it there.
However, I always face either a memory exhausted or execution time error.
//Get my table
$records = DB::table($table)->get();
//Check to see if each record fits my condition
foreach ($records as $record) {
    check_for_criteria($record->columnB);   // query builder rows come back as objects
}
However, when I do that, I get a memory exhausted error.
So I tried with a for statement
//Get min and max id
$min = \DB::table($table)->min('id');
$max = \DB::table($table)->max('id');
//for loop to avoid the memory problem
for ($i = $min; $i <= $max; $i++) {
    $record = \DB::table($table)->where('id', $i)->first();
    if ($record === null) {
        continue;                           // skip ids that no longer exist
    }
    //Convert to an array for the purpose of the check_for_criteria function
    $record = get_object_vars($record);
    check_for_criteria($record['columnB']);
}
But going this way, I got a maximum execution time error.
FYI, the check_for_criteria function is something like:
function check_for_criteria($record) {
    $user = User::where('record', $record)->first();
    if (is_null($user)) {
        $nuser = new User;
        $nuser->number = $record;
        $nuser->save();
    }
}
I know I could ini_set('memory_limit', -1);, but I would rather find a way to limit my memory usage, or at least spread it out.
Should I run these operations in background when traffic is low? Any other suggestion?
I solved my problem by limiting my request to distinct values in ColumnB.
//Get only the distinct values of ColumnB from my table
$records = DB::table($table)->select('ColumnB')->distinct()->get();
//Check each distinct value against my condition
foreach ($records as $record) {
    check_for_criteria($record->ColumnB);
}
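Another way to keep memory bounded here, assuming this is Laravel's query builder (the DB:: and User:: calls suggest it), is to walk the table in fixed-size chunks instead of loading everything at once; a rough sketch:

// Sketch: process the table 500 rows at a time so only one chunk
// is ever held in memory. chunk() needs an orderBy() to page reliably.
DB::table($table)->orderBy('id')->select('ColumnB')->chunk(500, function ($records) {
    foreach ($records as $record) {
        check_for_criteria($record->ColumnB);
    }
});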

server error executing a large file

I have created a script which reads an XML file and adds it to the database. I am using XML Reader for this.
The problem is that my XML file contains 500,000 products, which causes my page to time out. Is there a way to get around this?
My code below:
$z = new XMLReader;
$z->open('files/NAGardnersEBook.xml');
$doc = new DOMDocument;

# move to the first <EBook/> node
while ($z->read() && $z->name !== 'EBook');

# now that we're at the right depth, hop to the next <EBook/> until the end of the tree
while ($z->name === 'EBook')
{
    $node = simplexml_import_dom($doc->importNode($z->expand(), true));

    # Get the value of each node
    $title            = mysql_real_escape_string($node->Title);
    $Subtitle         = mysql_real_escape_string($node->SubTitle);
    $ShortDescription = mysql_real_escape_string($node->ShortDescription);
    $Publisher        = mysql_real_escape_string($node->Publisher);
    $Imprint          = mysql_real_escape_string($node->Imprint);

    # Get attributes
    $isbn = $z->getAttribute('EAN');

    $contributor = $node->Contributors;
    $author      = $contributor[0]->Contributor;
    $author      = mysql_real_escape_string($author);

    $BicSubjects = $node->BicSubjects;
    $Bic         = $BicSubjects[0]->Bic;
    $bicCode     = $Bic[0]['Code'];

    $formats         = $node->Formats;
    $type            = $formats[0]->Format;
    $price           = $type[0]['Price'];
    $ExclusiveRights = $type[0]['ExclusiveRights'];
    $NotForSale      = $type[0]['NotForSale'];

    $arr[] = "UPDATE onix_d2c_data SET is_gardner='Yes', TitleText = '".$title."', Subtitle = '".$Subtitle."', PersonName='".$author."', ImprintName = '".$Imprint."', PublisherName = '".$Publisher."', Text = '".$ShortDescription."', BICMainSubject = '".$bicCode."', ExcludedTerritory='".$NotForSale."', RightsCountry='".$ExclusiveRights."', PriceAmount='".$price."', custom_category= 'Uncategorised', drm_type='adobe_drm' WHERE id='".$isbn."' ";

    # go to next <EBook />
    $z->next('EBook');
    $isbns[] = $isbn;
}

foreach ($isbns as $isbn) {
    $sql   = "SELECT * FROM onix_d2c_data WHERE id='".$isbn."'";
    $query = mysql_query($sql);
    $count = mysql_num_rows($query);

    if ($count > 0) {
        // row already exists, nothing to do
    } else {
        $sql   = "INSERT INTO onix_d2c_data (id) VALUES ('".$isbn."')";
        $query = mysql_query($sql);
    }
}

foreach ($arr as $sql) {
    mysql_query($sql);
}
Thank you,
Julian
You could use the function set_time_limit to extend the allowed script execution time or set max_execution_time in your php.ini.
You need to set these variables. Make sure you have permission to change them:
set_time_limit(0);
ini_set('max_execution_time', '6000');
You're executing two queries for each ISBN, just to check whether the ISBN already exists. Instead, set the ISBN column to unique (if it isn't already; it should be), then just go ahead and insert without checking. MySQL will return an error if it detects a duplicate, which you can handle. This will reduce the number of queries and improve performance.
You're inserting each title with a separate call to the database. Instead, use the extended INSERT syntax to batch up many inserts in one query - see the MySQL manual for the full syntax. Batching, say, 250 inserts per query will save a lot of time (see the sketch below).
If you're not happy with batching inserts, use mysqli prepared statements, which will reduce parsing and transmission time and should improve your overall performance.
You can probably trust Gardners' list, so consider dropping some of the escaping you're doing. I wouldn't normally recommend this for user input, but this is a special case.
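Putting the unique-key and batching suggestions together, a rough sketch (it assumes id has a UNIQUE/PRIMARY key and that $rows holds already-escaped id/title pairs collected in the parsing loop; it keeps the question's mysql_* API for consistency):

// Sketch: one extended INSERT per 250 rows instead of one query per row.
// ON DUPLICATE KEY UPDATE relies on id being a unique/primary key.
foreach (array_chunk($rows, 250) as $chunk) {
    $values = array();
    foreach ($chunk as $r) {
        $values[] = "('" . $r['id'] . "', '" . $r['title'] . "')";
    }
    $sql = "INSERT INTO onix_d2c_data (id, TitleText) VALUES " . implode(", ", $values)
         . " ON DUPLICATE KEY UPDATE TitleText = VALUES(TitleText)";
    mysql_query($sql);
}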
Have you tried adding set_time_limit(0); at the top of your PHP file?
EDIT:
ini_set('memory_limit','16M');
Specify your limit there.
If you don't want to change max_execution_time as proposed by others, you could also split the job into several smaller tasks and have the server run a cron job at regular intervals, e.g. 10,000 products each minute (a sketch follows).
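A minimal sketch of that cron approach (the offset file path, batch size, and import_products() helper are all made up for illustration):

// import_batch.php - run from cron, e.g.:  * * * * * php /path/to/import_batch.php
// Processes the next 10,000 products and remembers where it stopped.
$offsetFile = __DIR__ . '/import_offset.txt';           // hypothetical state file
$batchSize  = 10000;

$offset = file_exists($offsetFile) ? (int) file_get_contents($offsetFile) : 0;

// import_products($offset, $batchSize) is a hypothetical helper that parses
// that slice of the XML, writes it to the database, and returns rows handled.
$handled = import_products($offset, $batchSize);

if ($handled > 0) {
    file_put_contents($offsetFile, $offset + $handled); // save progress for the next run
}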
Thank you all for such fast feedback. I managed to get the problem sorted by using array_chunk(). Example below:
$thumbListLocal      = array_chunk($isbns, 4, true);    // true = preserve keys
$thumbListLocalCount = count($thumbListLocal);

$i = 0;
while ($i < $thumbListLocalCount):
    $sqlConstruct = array();                            // reset the batch for each chunk
    foreach ($thumbListLocal[$i] as $index => $thumbName):
        $sqlConstruct[] = "INSERT IGNORE INTO onix_d2c_data (id) VALUES ('".$thumbName."')";
    endforeach;

    foreach ($sqlConstruct as $processSql) {
        mysql_query($processSql);
    }

    unset($thumbListLocal[$i]);
    $i++;
endwhile;
I hope this helps someone.
Julian

PHP ldap_search size limit exceeded

I'm quite new to querying Microsoft's Active Directory and encountering some difficulties:
The AD has a size limit of 1000 elements per request. I cannot change the size limit. PHP does not seem to support paging (I'm using version 5.2 and there's no way of updating the production server.)
I've so far encountered two possible solutions:
Sort the entries by objectSid and use filters to get all the objects. Sample Code
I don't like that for several reasons:
It seems unpredictable to mess with the objectSid, as you have to take it apart, convert it to decimal, convert it back ...
I don't see how you can compare these id's.
(I've tried: '&((objectClass=user)(objectSid>=0))')
Filter on the first letters of the object names (as suggested here):
That's not an optimal solution as many of the users/groups in our system are prefixed with the same few letters.
So my question:
What approach is best used here?
If it's the first one, how can I be sure to handle the objectSid correctly?
Any other possibilities?
Am I missing something obvious?
Update:
- This related question provides information about why the Simple Paged Results extension does not work.
- The web server is running on a Linux server, so COM objects/adoDB are not an option.
I was able to get around the size limitation using ldap_control_paged_result().
ldap_control_paged_result() is used to enable LDAP pagination by sending the pagination control. The function below worked perfectly in my case; it requires PHP 5 >= 5.4.0 or PHP 7.
function retrieves_users($conn)
{
    $dn        = 'ou=,dc=,dc=';
    $filter    = "(&(objectClass=user)(objectCategory=person)(sn=*))";
    $justthese = array();

    // enable pagination with a page size of 100
    $pageSize = 100;
    $cookie   = '';
    $data     = array('usersLdap' => array());

    do {
        ldap_control_paged_result($conn, $pageSize, true, $cookie);

        $result  = ldap_search($conn, $dn, $filter, $justthese);
        $entries = ldap_get_entries($conn, $result);

        if (!empty($entries)) {
            for ($i = 0; $i < $entries["count"]; $i++) {
                $data['usersLdap'][] = array(
                    'name'     => $entries[$i]["cn"][0],
                    'username' => $entries[$i]["userprincipalname"][0]
                );
            }
        }

        // retrieve the cookie for the next page
        ldap_control_paged_result_response($conn, $result, $cookie);
    } while ($cookie !== null && $cookie != '');

    return $data;
}
If you have successfully updated your server by now, then the function above can get all the entries. I am using this function to get all users in our AD.
As I've not found any clean solution, I decided to go with the first approach: filtering by objectSid.
This workaround has its limitations:
It only works for objects with an objectSid, i.e. users and groups.
It assumes that all users/groups are created by the same authority.
It assumes that there are not more missing relative SIDs than the size limit.
The idea is to first read all possible objects and pick out the one with the lowest relative SID. The relative SID is the last chunk of the SID:
S-1-5-21-3188256696-111411151-3922474875-1158
Let's assume this is the lowest relative SID in a search that only returned 'Partial Search Results'.
Let's further assume the size limit is 1000.
The program then does the following:
It searches for all objects with SIDs between
S-1-5-21-3188256696-111411151-3922474875-1158
and
S-1-5-21-3188256696-111411151-3922474875-0159
then all between
S-1-5-21-3188256696-111411151-3922474875-1158
and
S-1-5-21-3188256696-111411151-3922474875-2157
and so on until one of the searches returns zero objects.
There are several problems with this approach, but it's sufficient for my purposes.
The Code:
$filter     = '(objectClass=Group)';
$attributes = array('objectsid', 'cn'); // objectsid needs to be included
$result     = array();

$maxPageSize = 1000;
$searchStep  = $maxPageSize - 1;

// Suppress the warning for the first query (it exceeds the size limit)
$adResult = @$adConn->search($filter, $attributes);

// Read the smallest RID from the result set
$minGroupRID = '';
for ($i = 0; $i < $adResult['count']; $i++) {
    $groupRID = unpack('V', substr($adResult[$i]['objectsid'][0], 24));
    if ($minGroupRID == '' || $minGroupRID > $groupRID[1]) {
        $minGroupRID = $groupRID[1];
    }
}

// Read the last objectsid and cut off the RID to get the SID prefix
$sidPrefix = substr($adResult[$i - 1]['objectsid'][0], 0, 24);

$nextStepGroupRID = $minGroupRID;
do { // Search for all objects with a lower objectsid than minGroupRID
    $adResult = $adConn->search('(&'.$filter.'(objectsid<='.preg_replace('/../', '\\\\$0', bin2hex($sidPrefix.pack('V', $nextStepGroupRID))).')(objectsid>='.preg_replace('/../', '\\\\$0', bin2hex($sidPrefix.pack('V', $nextStepGroupRID - $searchStep))).'))', $attributes);

    for ($i = 0; $i < $adResult['count']; $i++) {
        $RID    = unpack('V', substr($adResult[$i]['objectsid'][0], 24)); // Extract the relative SID from the SID
        $RIDs[] = $RID[1];

        $resultSet = array();
        foreach ($attributes as $attribute) {
            $resultSet[$attribute] = $adResult[$i][$attribute][0];
        }
        $result[$RID[1]] = $resultSet;
    }
    $nextStepGroupRID = $nextStepGroupRID - $searchStep;
} while ($adResult['count'] > 1);

$nextStepGroupRID = $minGroupRID;
do { // Search for all objects with a higher objectsid than minGroupRID
    $adResult = $adConn->search('(&'.$filter.'(objectsid>='.preg_replace('/../', '\\\\$0', bin2hex($sidPrefix.pack('V', $nextStepGroupRID))).')(objectsid<='.preg_replace('/../', '\\\\$0', bin2hex($sidPrefix.pack('V', $nextStepGroupRID + $searchStep))).'))', $attributes);

    for ($i = 0; $i < $adResult['count']; $i++) {
        $RID    = unpack('V', substr($adResult[$i]['objectsid'][0], 24)); // Extract the relative SID from the SID
        $RIDs[] = $RID[1];

        $resultSet = array();
        foreach ($attributes as $attribute) {
            $resultSet[$attribute] = $adResult[$i][$attribute][0];
        }
        $result[$RID[1]] = $resultSet;
    }
    $nextStepGroupRID = $nextStepGroupRID + $searchStep;
} while ($adResult['count'] > 1);

var_dump($result);
The $adConn->search method looks like this:
function search($filter, $attributes = false, $base_dn = null) {
    if (!isset($base_dn)) {
        $base_dn = $this->baseDN;
    }
    $entries = false;
    if (is_string($filter) && $this->bind) {
        if (is_array($attributes)) {
            $search = ldap_search($this->resource, $base_dn, $filter, $attributes);
        } else {
            $search = ldap_search($this->resource, $base_dn, $filter);
        }
        if ($search !== false) {
            $entries = ldap_get_entries($this->resource, $search);
        }
    }
    return $entries;
}
Never make assumptions about servers or server configuration; this leads to brittle code and unexpected, sometimes spectacular, failures. Just because it is AD today does not mean it will be tomorrow, or that Microsoft will not change the default limit in the server. I recently dealt with a situation where client code was written with the tribal knowledge that the size limit was 2000, and when administrators, for reasons of their own, changed the size limit, the client code failed horribly.
Are you sure that PHP does not support request controls (the simple paged results extension is a request control)? I wrote an article about "LDAP: Simple Paged Results", and though the article's sample code is Java, the concepts are what matter, not the language. See also "LDAP: Programming Practices".
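For reference, on current PHP (7.3 and later, so not the asker's 5.2) the paged-results request control can be passed straight to ldap_search(); a minimal sketch of that approach:

// Sketch: simple paged results via the $controls parameter of ldap_search() (PHP >= 7.3).
$cookie = '';
do {
    $result = ldap_search(
        $conn, $dn, $filter, array('cn'), 0, -1, -1, LDAP_DEREF_NEVER,
        array(array(
            'oid'   => LDAP_CONTROL_PAGEDRESULTS,
            'value' => array('size' => 500, 'cookie' => $cookie),
        ))
    );
    ldap_parse_result($conn, $result, $errcode, $matcheddn, $errmsg, $referrals, $controls);

    $entries = ldap_get_entries($conn, $result);
    for ($i = 0; $i < $entries['count']; $i++) {
        // ... collect $entries[$i] ...
    }

    // An empty cookie means the server has no more pages to return.
    $cookie = isset($controls[LDAP_CONTROL_PAGEDRESULTS]['value']['cookie'])
        ? $controls[LDAP_CONTROL_PAGEDRESULTS]['value']['cookie']
        : '';
} while ($cookie !== null && $cookie !== '');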
The previous script may fail when the distance between neighbouring relative SIDs is greater than 999.
Example:
S-1-5-21-3188256696-111411151-3922474875-1158
S-1-5-21-3188256696-111411151-3922474875-3359
3359-1158 > 999
To avoid this, use a fixed number of iterations rather than stopping at the first search that returns no objects.
Example:
$tt = 1;
do {
    ...
    $nextStepGroupRID = $nextStepGroupRID - $searchStep;
    $tt++;
} while ($tt < 30);
In this example we are forced to check 999 * 30 * 2 = 59,940 RID values.
