memory problems mysql - php

I'm having problems updating a MySQL database.
I want to run this script on a ~800,000 row table, but I run out of memory after 5 rows, so somehow I need to make the query advance automatically by adding +5 to the LIMIT each time.
I want to use a redirect or reset the memory somehow.
<?php
include(dirname(__FILE__) .'/include/simple_html_dom.php');
mysql_connect('localhost', '', '');
mysql_select_db('');
$result = mysql_query("SELECT gamertag FROM gamertags LIMIT 0, 5 ");
while ($row = mysql_fetch_assoc($result)) {
    $gamertag = $row['gamertag'];

    // pages to pull stats from
    $site_url = 'http://www.bungie.net';
    $stats_page = 'http://www.bungie.net/stats/reach/default.aspx?player=';

    // create DOM
    $html = new simple_html_dom();
    $html->load_file($stats_page . urlencode(strtolower($gamertag)));

    // pull nameplate emblem url, ****if it exists****
    $nameplate_emblem_el = $html->find("#ctl00_mainContent_identityBar_namePlateImg");
    if (!$nameplate_emblem_el) {
        $nameplate_emblem = 'No nameplate emblem!';
    }
    else {
        // only if #ctl00_mainContent_identityBar_namePlateImg is found
        $nameplate_emblem = htmlspecialchars_decode($nameplate_emblem_el[0]->attr['src']);
        $nameplate_emblem = $site_url . $nameplate_emblem;
    }

    mysql_query("INSERT IGNORE INTO star SET gamertag = '".$gamertag."', nameplate = '".$nameplate_emblem."'");
}
?>

800,000 is a large number, so I'm not sure it's going to happen, but from a comment on the PHP manual:
"unset() does just what its name says - it unsets a variable. It does not force immediate memory freeing. PHP's garbage collector will do it when it sees fit - by intention as soon as those CPU cycles aren't needed anyway, or as late as just before the script would run out of memory, whichever occurs first.
If you are doing $whatever = null; then you are rewriting the variable's data. You might get memory freed/shrunk faster, but it may steal CPU cycles from the code that truly needs them sooner, resulting in a longer overall execution time."
Source: http://php.net/manual/en/function.unset.php
So, try to unset/set to null the variables after each time the loop runs.
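Applied to the script in the question, that could look something like this at the end of each loop iteration (just a sketch; simple_html_dom also provides a clear() method that is commonly used to break its internal references before unsetting the object):
while ($row = mysql_fetch_assoc($result)) {
    // ... build $html and $nameplate_emblem, run the INSERT, as in the question ...

    // then release everything before the next iteration
    $html->clear();            // drop simple_html_dom's internal references
    unset($html);              // free the DOM object itself
    $nameplate_emblem = null;  // and any large strings you no longer need
}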

I'm guessing that you're hitting PHP's memory limit, not the machine's. If so, you can edit php.ini (typically /etc/php5/apache2/php.ini) to give more memory to PHP. Look for "memory_limit = X", where X is something like 256M.
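If editing php.ini isn't an option, you may be able to raise the limit from within the script itself, assuming your host doesn't lock the setting down:
// near the top of the script; silently ignored if the host forbids overriding it
ini_set('memory_limit', '256M');
echo ini_get('memory_limit'); // confirm what you actually got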

Related

BigQuery PHP API - large query result memory bloat - even with paging

I am running a range of queries in BigQuery and exporting them to CSV via PHP. There are reasons why this is the easiest method for me (multiple queries dependent on variables within an app).
I am struggling with memory issues when the result set is larger than 100MB. The memory usage of my code seems to grow in line with the result set, which I thought paging would avoid. Here is my code:
$query = $bq->query($myQuery);
$queryResults = $bq->runQuery($query,['maxResults'=>5000]);
$FH = fopen($storagepath, 'w');
$rows = $queryResults->rows();
foreach ($rows as $row) {
fputcsv($FH, $row);
}
fclose($FH);
The $queryResults->rows() function returns a Google Iterator which uses paging to scroll through the results, so I do not understand why memory usage grows as the script runs.
Am I missing a way to discard previous pages from memory as I page through the results?
UPDATE
I have noticed that since upgrading to v1.4.3 of the BigQuery PHP API, the memory usage does cap out at around 120MB for this process, even when the result set reaches far beyond this (currently processing a 1GB result set). But 120MB still seems too much. How can I identify and fix where this memory is being used?
UPDATE 2
This 120MB works out to roughly 24KB per row of maxResults in the page, e.g. adding 1000 rows to maxResults adds about 24MB of memory. So my question is now: why is one row of data using 24KB in the Google iterator? Is there a way to reduce this? The data itself is < 1KB per row.
Answering my own question
The extra memory is used by a load of PHP type mapping and other data structure info that comes alongside the data from BigQuery. Unfortunately I couldn't find a way to reduce the memory usage below around 24KB per row multiplied by the page size. If someone finds a way to reduce the bloat that comes along with the data, please post below.
However thanks to one of the comments I realized you can extract a query directly to CSV in a Google Cloud Storage Bucket. This is really easy:
$query = $bq->query($myQuery);
$queryResults = $bq->runQuery($query);
$qJobInfo = $queryResults->job()->info();
$dataset = $bq->dataset($qJobInfo['configuration']['query']['destinationTable']['datasetId']);
$table = $dataset->table($qJobInfo['configuration']['query']['destinationTable']['tableId']);
$extractJob = $table->extract('gs://mybucket/'.$filename.'.csv');
$table->runJob($extractJob);
However, this still didn't solve my issue, as my result set was over 1GB, so I had to make use of the data sharding function by adding a wildcard:
$extractJob = $table->extract('gs://mybucket/'.$filename.'*.csv');
This created ~100 shards in the bucket. These need to be recomposed using gsutil compose <shard filenames> <final filename>. However, gsutil only lets you compose 32 files at a time. Given that I will have a variable number of shards, often above 32, I had to write some code to clean them up.
//Save above job as variable
$eJob = $table->runJob($extractJob);
$eJobInfo = $eJob->info();
//This bit of info from the job tells you how many shards were created
$eJobFiles = $eJobInfo['statistics']['extract']['destinationUriFileCounts'][0];
$composedFiles = 0; $composeLength = 0; $subfile = 0; $fileString = "";
while (($composedFiles < $eJobFiles) && ($eJobFiles > 1)) {
    while (($composeLength < 32) && ($composedFiles < $eJobFiles)) {
        // gsutil creates shards with a 12-digit number after the filename,
        // so build a string of up to 32 such filenames at a time
        $fileString .= "gs://bucket/$filename" . str_pad($composedFiles, 12, "0", STR_PAD_LEFT) . ".csv ";
        $composedFiles++;
        $composeLength++;
    }
    $composeLength = 0;
    // Compose a batch of up to 32 shards into a subfile
    system("gsutil compose $fileString gs://bucket/" . $filename . "-" . $subfile . ".csv");
    $subfile++;
    $fileString = "";
}
if ($eJobFiles > 1) {
    // Compose all the subfiles into the final file
    system('gsutil compose gs://bucket/' . $filename . '-* gs://fm-sparkbeyond/YouTube_1_0/' . $filepath . '.gz');
}
Note that in order to give my Apache user access to gsutil, I had to allow the user to create a .config directory in the web root. Ideally you would use the gsutil PHP library, but I didn't want the code bloat.
If anyone has a better answer, please post it:
Is there a way to get smaller output from the BigQuery library than 24KB per row?
Is there a more efficient way to clean up variable numbers of shards?

Does file_get_contents use a cache? If it does, how do I prevent it?

With this code :
for ($i = 1; $i <= $times; $i++) {
    $milliseconds = round(microtime(true) * 1000);
    $res = file_get_contents($url);
    $milliseconds2 = round(microtime(true) * 1000);
    $milisecondsCount = $milliseconds2 - $milliseconds;
    echo 'miliseconds=' . $milisecondsCount . ' ms' . "\n";
}
I get this output :
miliseconds=1048 ms
miliseconds=169 ms
miliseconds=229 ms
miliseconds=209 ms
....
But with sleep:
for ($i = 1; $i <= $times; $i++) {
    $milliseconds = round(microtime(true) * 1000);
    $res = file_get_contents($url);
    $milliseconds2 = round(microtime(true) * 1000);
    $milisecondsCount = $milliseconds2 - $milliseconds;
    echo 'miliseconds=' . $milisecondsCount . ' ms' . "\n";
    sleep(2);
}
This :
miliseconds=1172 ms
miliseconds=1157 ms
miliseconds=1638 ms
....
So what is happening here ?
My questions:
1) Why don't you test this yourself by using clearstatcache and comparing the time signatures with and without it?
2) Your method of testing is terrible. As a first fix, have you tried swapping the order so that the loop with sleep() runs first rather than second?
3) How many iterations of your test have you done? If it's fewer than 10,000, I suggest you repeat your tests, first to establish the average delay (or lack thereof), and then to ask what makes you think this is caused specifically by caching.
4) What are the specs of your machine, your RAM, and the free and available memory on each iteration?
5) Is your server live? Are you able to rule out outside services as possible causes of the time disparity (anti-virus, background processes, server traffic, etc.)?
My Answer:
Short answer: no, file_get_contents does not use a cache.
However, operating system level caches may cache recently used files to
make for quicker access, but I wouldn't count on those being
available.
Calling a file reference multiple times will only read the file
from disk a single time, subsequent calls will read the value from the
static variable within the function. Note that you shouldn't count on
this approach for large files or ones used only once per page, since
the memory used by the static variable will persist until the end of
the request. Keeping the contents of files unnecessarily in static
variables is a good way to make your script a memory hog.
Quoted from This answer.
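For context, the quoted answer describes a helper roughly like the one below (a sketch only; the function name here is made up). It caches file contents in a static variable for the lifetime of the request, which is exactly why it can become a memory hog for large files:
function get_file_cached($path)
{
    static $cache = array();
    if (!isset($cache[$path])) {
        // first call for this path: actually hit the disk
        $cache[$path] = file_get_contents($path);
    }
    // subsequent calls: return the copy held in the static variable
    return $cache[$path];
}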
For remote (non-local-filesystem) files, there are so many possible causes of variable delays that file_get_contents caching is a long way down the list of options.
As you appear to be connecting using a localhost reference, I would hazard a guess (though I am not certain) that your server is applying various firewall and checking techniques to the incoming request, which adds a large variable component to the time taken.
Concatenate a random query string onto the URL. For example:
For eg: $url = 'http://example.com/file.html?r=' . rand(0, 9999);
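Applied to the timing loop from the question, that could look like this (a sketch; $times and the base URL are assumed to be defined as in your test, and the random parameter simply defeats any HTTP-level cache between you and the server):
for ($i = 1; $i <= $times; $i++) {
    // a different query string on every request bypasses any intermediate cache
    $url = 'http://example.com/file.html?r=' . rand(0, 999999);

    $start = round(microtime(true) * 1000);
    $res = file_get_contents($url);
    $end = round(microtime(true) * 1000);

    echo 'milliseconds=' . ($end - $start) . ' ms' . "\n";
}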

PHP Allowed memory size exhausted

I am trying to display users on a map using the Google Maps API. Now that the user count has increased to 12,000, I get a memory exhausted error. For the time being I have increased the memory limit from 128M to 256M, but I am sure that when there are 25,000 users the same error will come back.
The error is:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 30720 bytes) in /var/www/html/digitalebox_v2_20150904_100/protected/extensions/bootstrap/widgets/TbBaseMenu.php on line 193
Fatal error: Class declarations may not be nested in /var/www/html/yii-1.1.14.f0fee9/framework/collections/CListIterator.php on line 20
My code,
$criteria = new CDbCriteria();
$criteria->condition = 't.date BETWEEN :from_date AND :to_date';
$modelUsers = User::model()->findAll($criteria);
foreach ($modelUsers as $modelUser) {
$fullAddress = $modelUser->address1 . ', ' . $modelUser->zip . ' ' . $modelUser->city . ', ' . Country::model()->findByPk($modelUser->countryCode)->countryName;
}
When $modelUsers has 12,000 records this memory problem appears, because there are 12,000 user objects.
What should I do to prevent this kind of issue in future? What is the minimum required memory size for a PHP application to run?
When you call findAll it loads all records at once, so you end up with a memory error. Especially for such situations Yii has CDataProviderIterator. It allows iteration over large data sets without holding the entire set in memory.
$dataProvider = new CActiveDataProvider("User");
$iterator = new CDataProviderIterator($dataProvider);
foreach ($iterator as $user) {
    $fullAddress = $user->address1 . ', ' . $user->zip . ' ' . $user->city . ', ' . Country::model()->findByPk($user->countryCode)->countryName;
}
I'd solve this problem in a completely different way than suggested. I'd ditch the whole model concept and query MySQL directly for the addresses. You can have MySQL return the already-concatenated address, which means you avoid concatenating it in PHP - and that avoids wasting memory.
Next step would be using PDO and issuing an unbuffered query. This means that PHP process will not store entire 12 000 records in its memory - that's how you avoid exhausting the memory.
Final step is outputting the result - as you loop through the unbuffered query, you can use output a single row (or 10 rows) at a time.
What happens this way is that you trade CPU for RAM. Since you don't know how much RAM you will need, the best approach is to use as little as possible.
Using unbuffered queries and flushing PHP's output buffer seems like the only viable way to go for your use case, seeing you can't avoid outputting a lot of data.
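A sketch of that approach is below; the table and column names are guessed from the Yii model above, and the DSN, credentials and date variables are placeholders. The key parts are the CONCAT done in SQL and the unbuffered-query attribute:
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
// unbuffered: rows are streamed from MySQL instead of being held in PHP all at once
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

// let MySQL build the address string so PHP never materialises 12,000 model objects
$stmt = $pdo->prepare(
    "SELECT CONCAT(u.address1, ', ', u.zip, ' ', u.city, ', ', c.countryName) AS full_address
     FROM user u JOIN country c ON c.code = u.countryCode
     WHERE u.date BETWEEN :from_date AND :to_date"
);
$stmt->execute(array(':from_date' => $fromDate, ':to_date' => $toDate)); // your real date range here

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    echo $row['full_address'], "\n"; // output (or write) one row at a time
}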
Perhaps increase the memory a bit more and see if that corrects the issue; try setting it to 512M or 1024M. I will tell you from my own experience: if you are trying to load Google Maps markers, that number of markers will probably crash the map.
Increase the value of memory_limit in php.ini, then restart Apache or whatever PHP is running under. Keep in mind that what you add here needs to be taken away from other programs running on the same machine (such as MySQL and its innodb_buffer_pool_size).

How to overwrite PHP memory for security reasons?

I am working on a security script, and it seems that I have run into a problem with PHP and the way PHP uses memory.
my.php:
<?php
// Display current PID
echo 'pid= ', posix_getpid(), PHP_EOL;
// The user type a very secret key
echo 'Fill secret: ';
$my_secret_key = trim(fgets(STDIN));
// 'Destroy' the secret key
unset($my_secret_key);
// Wait for something
echo 'waiting...';
sleep(60);
And now I run the script:
php my.php
pid= 1402
Fill secret: AZERTY <= User input
waiting...
Before the script ends (while it is sleeping), I generate a core file by sending a SIGSEGV signal to the process:
kill -11 1402
I inspect the core file:
strings core | less
Here is an extract of the result:
...
fjssdd
sleep
STDIN
AZERTY <==== this is the secret key
zergdf
...
I understand that the memory is just released by unset and not 'destroyed'. The data is not really removed (unset amounts to a call to the free() function).
So if someone dumps the memory of the process, even after the script has finished, they could read $my_secret_key (until that memory space is overwritten by another process).
Is there a way to overwrite this memory segment, or the full memory space, after the PHP script finishes?
Thanks to all for your comments.
I already know how memory is managed by the system.
Even though PHP doesn't use malloc and free directly (but its own variants such as emalloc and efree), it seems (and I understand why) that it is simply impossible for PHP to 'trash' memory after it has been freed.
The question was more out of curiosity, and the comments seem to confirm what I previously intended to do: write a little piece of code in a memory-aware language (C?) to handle this special part by allocating a simple string with malloc, overwriting it with XXXXXX after use, and THEN freeing it.
Thanks to all
J
You seem to be lacking a lot of understanding about how memory management works in general, and specifically within PHP.
A discussion of the various salient points is redundant when you consider what the security risk is here:
So if someone dumps the memory of the process, even after the script execution
If someone can access the memory of a program running under a different uid then they have root access and can compromise the target in so many other ways - and it doesn't matter if it's PHP script, ssh, an Oracle DBMS....
If someone can access the memory previously occupied by a process which has now terminated, then not only have they got root, they've already compromised the kernel.
You seem to have missed an important lesson in what computers mean by "delete" operations.
See, it's generally not feasible for a computer to zero out memory on delete; instead it just "forgets" it was using that memory.
In other words, if you want to clear memory, you most definitely need to overwrite it, just as #hakre suggested.
That said, I hardly see the point of your script. PHP just isn't made for the sort of thing you are doing. You're probably better off with a small dedicated solution rather than using PHP. But this is just my opinion. I think.
I dunno if that works, but if you can in your tests, please add these lines to see the outcome:
...
// Overwrite it:
echo 'Overwrite secret: ';
for ($l = strlen($my_secret_key), $i = 0; $i < $l; $i++) {
    $my_secret_key[$i] = '#';
}
And I wonder whether or not running
gc_collect_cycles();
makes a difference. Even when the values are freed, they might still be in memory (within the script's process, or even somewhere else in the address space).
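One way to test both suggestions together would be something like this (a sketch only; whether the bytes actually disappear from a core dump is exactly what is in question here):
$my_secret_key = trim(fgets(STDIN));

// ... use the secret ...

// overwrite every byte in place before releasing the variable
for ($i = 0, $l = strlen($my_secret_key); $i < $l; $i++) {
    $my_secret_key[$i] = "\0";
}
unset($my_secret_key);
gc_collect_cycles(); // ask the collector to run now rather than "when it sees fit"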
I would try whether overwriting memory with some data would eventually erase the original locations of your variables:
$buffer = '';
for ($i = 0; $i < 1e6; $i++) {
$buffer .= "\x00";
}
As soon as PHP releases the memory, I suppose new allocations might be given the same location. It's hardly fail-proof, though.

memory_get_usage

I'm making a little benchmark class to display page load time and memory usage.
Load time is already working, but when I display the memory usage, it doesn't change
Example:
$conns = array();
ob_start();
benchmark::start();
$conns[] = mysql_connect('localhost', 'root', '');
benchmark::stop();
ob_flush();
uses the same memory as
$conns = array();
ob_start();
benchmark::start();
for($i = 0; $i < 1000; $i++)
{
$conns[] = mysql_connect('localhost', 'root', '');
}
benchmark::stop();
ob_flush();
I'm using memory_get_usage(true) to get the memory usage in bytes.
memory_get_usage(true) will show the amount of memory allocated by the php engine, not actually used by the script. It's very possible that your test script hasn't required the engine to ask for more memory.
For a test, grab a large(ish) file and read it into memory. You should see a change then.
I've successfully used memory_get_usage(true) to track the memory usage of web crawling scripts, and it's worked fine (since the goal was to slow things down before hitting the system memory limit). The one thing to remember is that it doesn't change based on actual usage; it changes based on the memory requested by the engine. So what you end up seeing is sudden jumps instead of slow growth (or shrinkage).
If you set the real_usage flag to false, you may be able to see very small memory changes - however, this won't help you monitor the true amount of memory php is requesting from the system.
(Update: To be clear the difference I describe is between memory used by the variables of your script, compared to the memory the engine requested to run your script. All the same script, different way of measuring.)
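A quick way to see that difference for yourself (a sketch; the exact numbers depend on your PHP version and configuration):
echo memory_get_usage(false), " bytes used by the script\n";
echo memory_get_usage(true), " bytes allocated by the engine\n";

// force a real allocation so both numbers move
$big = str_repeat('a', 5 * 1024 * 1024); // roughly a 5 MB string

echo memory_get_usage(false), " bytes used by the script\n";
echo memory_get_usage(true), " bytes allocated by the engine\n";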
I'm no Guru in PHP's internals, but I could imagine an echo does not affect the amount of memory used by PHP, as it just outputs something to the client.
It could be different if you enable output buffering.
The following should make a difference:
$result = null;
benchmark::start();
for ($i = 0; $i < 10000; $i++) {
    $result .= 'test';
}
Look at:
for($i = 0; $i < 1000; $i++)
{
$conns[] = mysql_connect('localhost', 'root', '');
}
You could have looped to 100,000 and nothing would have changed; it's the same connection. No resources are allocated for it because the linked list remembering them never grew. Why would it grow? There's already (presumably) a valid handle at $conns[0]. It won't make a difference in memory_get_usage(). You did test $conns[15] to see if it worked, yes?
Can root@localhost have multiple passwords? No. Why would PHP bother to handle another connection just because you told it to? (Tongue in cheek.)
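If you really want separate connections in that loop, mysql_connect() has a fourth $new_link argument; passing true forces a new link each time, and memory_get_usage() should then actually move (a sketch; in practice the server's max_connections will probably stop you well before 1000):
for ($i = 0; $i < 1000; $i++) {
    // fourth argument (new_link) = true: don't reuse the identical existing connection
    $conns[] = mysql_connect('localhost', 'root', '', true);
}
echo memory_get_usage(true);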
I suggest running the same thing via CLI through Valgrind to see the actual heap usage:
valgrind /usr/bin/php -f foo.php .. or something similar. At the bottom, you'll see what was allocated, what was freed and garbage collection at work.
Disclaimer: I do know my way around PHP internals, but I am no expert in that deliberately obfuscated maze written in C that Zend calls PHP.
echo won't change the allocated number of bytes (unless you use output buffers).
the $i variable only holds a small integer, so it's not changing the number of allocated bytes either.
Try an output buffering example:
ob_start();
benchmark::start();
for($i = 0; $i < 10000; $i++)
{
echo 'test';
}
benchmark::stop();
ob_flush();
