Nested loops in PHP extremely slow

I have 6 nested loops in a PHP program, and the script's calculation time is extremely slow. Is there a better way to implement the 6 loops and reduce the computation time, even if it means switching to another language? The nature of the algorithm I'm implementing requires iteration, so I don't see how I can implement it any better.
Here's the code.
<?php
$time1 = microtime(true);
$res = 16;
$imageres = 128;
for ($x = 0; $x < $imageres; ++$x) {
    for ($y = 0; $y < $imageres; ++$y) {
        $pixels[$x][$y] = 1;
    }
}
$quantizermatrix = 1;
$scalingcoefficient = 1 / ($res / 2);
for ($currentimagex = 0; $currentimagex < ($res * ($imageres / $res - 1) + 1); $currentimagex += $res) {
    for ($currentimagey = 0; $currentimagey < ($res * ($imageres / $res - 1) + 1); $currentimagey += $res) {
        for ($u = 0; $u < $res; ++$u) {
            for ($v = 0; $v < $res; ++$v) {
                for ($x = 0; $x < $res; ++$x) {
                    for ($y = 0; $y < $res; ++$y) {
                        if ($u == 0) { $a = 1 / sqrt(2); } else { $a = 1; }
                        if ($v == 0) { $b = 1 / sqrt(2); } else { $b = 1; }
                        $xes[$y] = $pixels[$x + $currentimagex][$y + $currentimagey]
                            * cos((M_PI / $res) * ($x + 0.5) * $u)
                            * cos((M_PI / $res) * ($y + 0.5) * $v);
                    }
                    $xes1[$x] = array_sum($xes);
                }
                $xes2 = array_sum($xes1) * $scalingcoefficient * $a * $b;
                $dctarray[$u + $currentimagex][$v + $currentimagey] = round($xes2 / $quantizermatrix) * $quantizermatrix;
            }
        }
    }
}
foreach ($dctarray as $dct) {
    foreach ($dct as $dc) {
        echo $dc . " ";
    }
    echo "<br>";
}
$time2 = microtime(true);
echo 'script execution time: ' . ($time2 - $time1);
?>
I've removed a large portion of the code that's irrelevant, since this is the section that's problematic.
Essentially, the code iterates through every pixel in a PNG image and outputs a computed matrix (2D array). It takes around 2 seconds for a 128x128 image, which makes the program impractical for normal images larger than 128x128.

There is a function available in the Imagick library, Imagick::exportImagePixels. Refer to the link below; it might help you out:
http://www.php.net/manual/en/imagick.exportimagepixels.php
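As a rough illustration, here is a minimal sketch of how exportImagePixels could replace the per-pixel PHP loop that builds $pixels; the filename, the square-image assumption and the use of the "I" (intensity) channel are assumptions for illustration, not part of the question's code:
<?php
// Sketch: read all pixel intensities in one call instead of looping per pixel.
$img      = new Imagick('input.png');                  // assumed filename
$imageres = $img->getImageWidth();                     // assumes a square image
$flat     = $img->exportImagePixels(0, 0, $imageres, $imageres, 'I', Imagick::PIXEL_FLOAT);

// Rebuild the $pixels[$x][$y] structure used by the DCT code above.
$pixels = [];
foreach ($flat as $i => $value) {
    $x = $i % $imageres;              // pixels are exported row by row
    $y = (int)($i / $imageres);
    $pixels[$x][$y] = $value;
}
The nested DCT loops themselves would still dominate the run time, but at least the pixel data arrives in a single call.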

Related

2D PHP array, join values based on similar values

I have a PHP array which I use to draw a graph.
JSON format:
{"y":24.1,"x":"2017-12-04 11:21:25"},
{"y":24.1,"x":"2017-12-04 11:32:25"},
{"y":24.3,"x":"2017-12-04 11:33:30"},
{"y":24.1,"x":"2017-12-04 11:34:25"},
{"y":24.2,"x":"2017-12-04 11:35:35"},.........
{"y":26.2,"x":"2017-12-04 11:36:35"}, ->goes up for about a minute
{"y":26.3,"x":"2017-12-04 11:37:35"},.........
{"y":24.1,"x":"2017-12-04 11:38:25"},
{"y":24.3,"x":"2017-12-04 11:39:30"}
y is the temperature and x is the date/time.
As you can see, the temperature doesn't change very often, and when it does it usually changes by at most 0.4. But sometimes, after a long period of similar values, it changes by more than 0.4.
I would like to join those similar values, so the graph would not have 200k similar points but only the "important" ones.
I would appreciate advice on how to do this, or which algorithm would be best for creating the optimized array I have in mind.
perfect output:
{"y":24.1,"x":"2017-12-04 11:21:25"},.........
{"y":24.1,"x":"2017-12-04 11:34:25"},
{"y":24.2,"x":"2017-12-04 11:35:35"},.........
{"y":26.2,"x":"2017-12-04 11:36:35"}, ->goes up for about a minute
{"y":26.3,"x":"2017-12-04 11:37:35"},.........
{"y":24.1,"x":"2017-12-04 11:38:25"}
Any help?
As you specified PHP, I'm going to assume you can handle this on the output side.
Basically, you want logic like "if the temperature differs from the last temperature by more than some threshold, or the time is greater than the last time by x minutes, then output a point on the graph". If that's the case, you can get the result with the following:
$temps = array(); // your data in the question
$temp = 0;
$time = 0;
$time_max = 120;       // two minutes
$temp_important = .4;  // max you'll tolerate
$output = [];
foreach ($temps as $point) {
    if (strtotime($point['x']) - $time > $time_max || abs($point['y'] - $temp) >= $temp_important) {
        // add it to output
        $output[] = $point;
    }
    // update our data points
    if (strtotime($point['x']) - $time > $time_max) {
        $time = strtotime($point['x']);
    }
    if (abs($point['y'] - $temp) >= $temp_important) {
        $temp = $point['y'];
    }
}
// and out we go..
echo json_encode($output);
Hmm, that's not exactly what you're asking for: if the temp spiked in a short time and then went down immediately, you'd need to change your logic. But think of it in terms of requirements.
If you're RECEIVING data on the output side, I'd write something in JavaScript to store these points and apply the same logic. You might need to buffer 2-3 points to make your decision. Your logic here is performing an important task, so you'd want to encapsulate it and make sure you can specify the parameters easily.
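A minimal sketch of that "buffer a few points" idea in PHP, looking at the previous, current and next point; the sample data and the 0.4 threshold come from the question, everything else is an assumption:
<?php
// Keep a point if it differs enough from either neighbour; drop long runs of
// near-identical values. End points are always kept so the graph stays anchored.
$temps = [
    ['y' => 24.1, 'x' => '2017-12-04 11:21:25'],
    ['y' => 24.1, 'x' => '2017-12-04 11:32:25'],
    ['y' => 26.2, 'x' => '2017-12-04 11:36:35'],
    ['y' => 24.1, 'x' => '2017-12-04 11:38:25'],
];
$threshold = 0.4;

$output = [];
$n = count($temps);
for ($i = 0; $i < $n; $i++) {
    if ($i === 0 || $i === $n - 1) {
        $output[] = $temps[$i];
        continue;
    }
    $keep = abs($temps[$i]['y'] - $temps[$i - 1]['y']) >= $threshold
         || abs($temps[$i + 1]['y'] - $temps[$i]['y']) >= $threshold;
    if ($keep) {
        $output[] = $temps[$i];
    }
}
echo json_encode($output);
Because both neighbours are consulted, a short spike that immediately drops back down is still kept, which addresses the caveat above.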

PHP variable name and SQL table column name length

I have a really newbie question :)
Setting aside the fact that
$lastInvNum (instead of $lastInvoiceNumber)
or
last_inv_num (instead of last_invoice_number, both int 10)
saves a bit of time to write, does the length have any effect (even the slightest)
performance-wise?
Long vs. short?
Is there any chance that PHP, and more importantly MySQL, will consume
less memory if the query uses a shorter table column name?
For example, if I have to fetch 500 rows with a single query, I imagine
the column name is effectively processed 500 times, so processing
last_invoice_number 500 times
vs.
last_inv_num might save some memory or make things slightly faster.
Thanks.
No, there is really no noticeable difference in performance whatsoever, and you'll gain a huge improvement in readability by using descriptive variable names. Internally, these variables are referred to by memory addresses (to put it simply), not by their ASCII/Unicode names. The impact it may have on performance, in nearly any language, is so infinitesimal that it would never be noticed.
Edit:
I've added a benchmark. It shows that there is really no difference at all between using a single letter as a variable name and using a 17-character variable name. The single letter might even be a tiny bit slower. However, I do notice a slight consistent increase in time when using a 90-character variable name, but again, the difference is too small to ever notice for practical purposes. Here's the benchmark and output:
<?php
# To prevent any startup-costs from skewing results of the first test.
$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    $noop = null;
}
$end = microtime(true);

# Let's benchmark!
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    $thisIsAReallyLongAndReallyDescriptiveVariableNameInFactItIsJustWayTooLongHonestlyWtf = mt_rand(0, 1000);
}
$end = microtime(true);
printf("Using a long name took %f seconds.\n", ($end - $start));

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    $thisIsABitTooLong = mt_rand(0, 1000);
}
$end = microtime(true);
printf("Using a medium name took %f seconds.\n", ($end - $start));

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    $t = mt_rand(0, 1000);
}
$end = microtime(true);
printf("Using a short name took %f seconds.\n", ($end - $start));
Output:
$ php so-test.php
Using a long name took 0.148200 seconds.
Using a medium name took 0.142286 seconds.
Using a short name took 0.145952 seconds.
The same should be true for MySQL as well; I would almost guarantee it, but it's not as easy to benchmark. With MySQL, you will have far more overhead from the network and I/O than from anything to do with symbol naming in the code. Just as with PHP, column names aren't internally just strings that are iterated over; data is stored in memory-efficient formats.
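If you did want to measure the MySQL side, a hypothetical approach would be to create two identical tables that differ only in the column name's length and time a batch of queries against each; the table and column names, connection details and counts below are assumptions, and no results are implied:
<?php
// Hypothetical benchmark sketch: time the same SELECT against two tables that
// differ only in the length of the column name. Assumes both tables already
// exist and hold identical data.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

$cases = [
    'invoices_long'  => 'last_invoice_number',
    'invoices_short' => 'last_inv_num',
];
foreach ($cases as $table => $column) {
    $start = microtime(true);
    for ($i = 0; $i < 1000; $i++) {
        $pdo->query("SELECT $column FROM $table LIMIT 500")->fetchAll();
    }
    printf("%s: %f seconds\n", $column, microtime(true) - $start);
}
Even then, network and I/O costs would likely dwarf any difference from the identifier length, as noted above.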

php: find duplicate content in files/nested looping

I have scraped 5000 files and stored them in individual files (0-4999.txt), and now I need to find duplicate content among them. So I am comparing each file with every other file in a nested loop (ETA: 82 hours). This approach will definitely take hours to complete. My main concern here is the number of iterations. Can anyone suggest a better approach to cut down the iterations and reduce the time taken?
Current code (NCD algorithm):
function ncd_new($sx, $sy, $prec = 0, $MAXLEN = 9000) {
    # NCD with gzip artifact correction and percentual return.
    # $sx, $sy = strings to compare.
    # Use $prec=-1 for a result in the range [0-1], $prec=0 for a percentage.
    # For the NCD definition see http://arxiv.org/abs/0809.2553
    $x = $min = strlen(gzcompress($sx));
    $y = $max = strlen(gzcompress($sy));
    $xy = strlen(gzcompress($sx . $sy));
    $a = $sx;
    if ($x > $y) { # swap min/max
        $min = $y;
        $max = $x;
        $a = $sy;
    }
    $res = ($xy - $min) / $max; # NCD definition.
    if ($MAXLEN < 0 || $xy < $MAXLEN) {
        $aa = strlen(gzcompress($a . $a));
        $ref = ($aa - $min) / $min;
        $res = $res - $ref; # correction
    }
    return ($prec < 0) ? $res : 100 * round($res, 2 + $prec);
}
Looping over each file:
$totalScraped = 5000;
for ($fileC = 0; $fileC < $totalScraped; $fileC++) {
    $f1 = file_get_contents($fileC . ".txt");
    $stripstr = array('/\bis\b/i', '/\bwas\b/i', '/\bthe\b/i', '/\ba\b/i');
    $file1 = preg_replace($stripstr, '', $f1);
    // $fileC + 1 => exclude already compared files
    // e.g. if $fileC = 10, loop from 11 to 4999
    for ($fileD = $fileC + 1; $fileD < $totalScraped; $fileD++) {
        $f2 = file_get_contents($fileD . ".txt", FILE_USE_INCLUDE_PATH);
        $stripstr = array('/\bis\b/i', '/\bwas\b/i', '/\bthe\b/i', '/\ba\b/i');
        $file2 = preg_replace($stripstr, '', $f2);
        $total = ncd_new($file1, $file2);
        echo "$fileC.txt vs $fileD.txt is: $total%\n";
    }
}
You may want to find a way to distinguish likely candidates from unlikely ones.
So maybe there is a way to compute a cheap value for each file (say a word count, a count of sentences/paragraphs, maybe even a count of individual letters) to identify the unlikely candidates beforehand.
If you could achieve this, you could reduce the number of comparisons by ordering your files by this computed number, as sketched below.
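A minimal sketch of that idea, reusing the ncd_new() function from the question; the word-count signature, the 20% tolerance and the file naming scheme are assumptions (the word stripping from the question is omitted for brevity):
<?php
// Pre-filter: compute a cheap signature (word count) per file, sort by it, and
// only run the expensive NCD comparison on files whose counts are close.
$totalScraped = 5000;
$signatures = [];
for ($i = 0; $i < $totalScraped; $i++) {
    $signatures[$i] = str_word_count(file_get_contents($i . ".txt"));
}
asort($signatures);                   // order file indexes by word count
$ordered = array_keys($signatures);

$count = count($ordered);
for ($a = 0; $a < $count; $a++) {
    for ($b = $a + 1; $b < $count; $b++) {
        $fileA = $ordered[$a];
        $fileB = $ordered[$b];
        // counts are sorted ascending, so once the gap exceeds 20% the rest can be skipped
        if ($signatures[$fileB] > $signatures[$fileA] * 1.2) {
            break;
        }
        $total = ncd_new(file_get_contents($fileA . ".txt"), file_get_contents($fileB . ".txt"));
        echo "$fileA.txt vs $fileB.txt is: $total%\n";
    }
}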
Another process that I tried was:
strip the HTML tags from the page
replace \s{2,} with \s and \n{2,} with \n, so that the text between each tag is presented on a single line (almost)
compare two such generated files line by line: take a line and preg_match it against the other file; if found -> duplicate, else break the line into an array of words, calculate the array_intersect, and if the count is 70% or more of the line length -> duplicate.
This was very efficient and I could compare 5000 files in ~10 minutes,
but it was still too slow for my requirements.
So I implemented the first approach (the NCD algorithm) in C, and it completes the task in 5-10 seconds (depending on the average page size).
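For reference, a rough PHP sketch of the line-based comparison described above; it assumes the files have already been stripped and collapsed as in the first two steps, uses strpos instead of preg_match for literal line matching, and the helper name and the final whole-file threshold are assumptions:
<?php
// Compare two pre-processed files line by line, as described above.
function filesLookDuplicate(string $fileA, string $fileB, float $ratio = 0.7): bool
{
    $linesA   = file($fileA, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $contentB = file_get_contents($fileB);
    $wordsB   = str_word_count($contentB, 1);   // word list of the whole second file
    $matches  = 0;

    foreach ($linesA as $line) {
        // exact line found in the other file -> duplicate line
        if (strpos($contentB, $line) !== false) {
            $matches++;
            continue;
        }
        // otherwise compare word sets: 70%+ overlap counts as a duplicate line
        $wordsA = str_word_count($line, 1);
        if (count($wordsA) > 0
            && count(array_intersect($wordsA, $wordsB)) >= $ratio * count($wordsA)) {
            $matches++;
        }
    }
    // call the pair a duplicate if most lines matched (this threshold is an assumption)
    return count($linesA) > 0 && $matches >= $ratio * count($linesA);
}

// Usage: var_dump(filesLookDuplicate('0.txt', '1.txt'));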

Random number between x and y excluding a range of numbers inbetween

I am implementing a system at the moment that needs to allocate a number in a certain range to a person, but must not use any number that has been used before.
Keep in mind that both the number range and the exclusion list are going to be quite large.
Initially, I thought doing something like this would be best:
<?php
$start = 1;
$end = 199999;
$excluded = array(4, 6, 7, 8, 9, 34);
$found = FALSE;
while (!$found) {
    $rand = mt_rand($start, $end);
    if (!in_array($rand, $excluded)) {
        $found = TRUE;
    }
}
?>
But I don't think this is ideal; there is the possibility of an infinite loop (or of it taking a very long time / timing out the script).
I also thought about generating an array of all the numbers I need, but surely a massive array would be worse? And doing an array_diff on two massive arrays would surely take a long time too?
Something like this:
<?php
$start = 1;
$end = 199999;
$allnums = range($start, $end);
$excluded = array(4, 6, 7, 8, 9, 34);
$searcharray = array_diff($allnums, $excluded);
$rand = $searcharray[array_rand($searcharray)]; // array_rand() returns a key, so look up the value
?>
So, my question would be which would be a better option? And is there another (better) way of doing this that someone has used before?
Arrays holding large amounts of data will use up a lot of memory. Can you not use a database to hold these numbers? That's generally what databases are designed for.
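A minimal sketch of that approach, assuming a MySQL table available_numbers(num) pre-filled with the allowed range and a PDO connection; the table name, connection details and the ORDER BY RAND() strategy are all assumptions:
<?php
// Pick a random unused number from the database and remove it so it can't be reused.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

$row = $pdo->query('SELECT num FROM available_numbers ORDER BY RAND() LIMIT 1')->fetch();
$number = $row['num'];

$del = $pdo->prepare('DELETE FROM available_numbers WHERE num = ?');
$del->execute([$number]);

echo "Allocated number: $number\n";
ORDER BY RAND() scans the table, so for a very large range a smarter sampling query would be needed, but this shows the shape of the database-backed approach.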

Problem reading files greater than 1GB with XMLReader

Is there a maximum file size the XMLReader can handle?
I'm trying to process an XML feed that is about 3GB in size. There are certainly no PHP errors; the script runs fine and successfully loads data into the database after it's been run.
The script also runs fine with smaller test feeds, 1GB and below. However, when processing larger feeds the script stops reading the XML file after about 1GB and continues running the rest of the script.
Has anybody experienced a similar problem? and if so how did you work around it?
Thanks in advance.
I had the same kind of problem recently and thought I'd share my experience.
It seems the problem lies in the way PHP was compiled: whether it was compiled with support for 64-bit file sizes/offsets or only 32-bit.
With 32 bits you can only address 4GB of data. You can find a somewhat confusing but good explanation here: http://blog.mayflower.de/archives/131-Handling-large-files-without-PHP.html
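As a rough indicator of which kind of build you have, you can check PHP's integer size (these are standard constants; a 64-bit integer size doesn't guarantee large-file support in every function, but a 32-bit one rules it out):
<?php
// 8 on a 64-bit build, 4 on a 32-bit build
var_dump(PHP_INT_SIZE);
// 9223372036854775807 on 64-bit, 2147483647 on 32-bit
var_dump(PHP_INT_MAX);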
I had to split my files with the Perl utility xml_split, which you can find here: http://search.cpan.org/~mirod/XML-Twig/tools/xml_split/xml_split
I used it to split my huge XML file into manageable chunks. The good thing about the tool is that it splits XML files over whole elements. Unfortunately it's not very fast.
I needed to do this only once and it suited my needs, but I wouldn't recommend it for repeated use. After splitting, I used XMLReader on smaller files of about 1GB in size.
Splitting up the file will definitely help. Other things to try:
adjust the memory_limit variable in php.ini: http://php.net/manual/en/ini.core.php
rewrite your parser using SAX (http://php.net/manual/en/book.xml.php). This is a stream-oriented parser that doesn't need to parse the whole tree; it's much more memory-efficient but slightly harder to program. A minimal sketch follows below.
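Here is that sketch, using PHP's ext/xml SAX-style API and feeding the parser small chunks so the whole file never sits in memory; the filename and the <item> element name are assumptions for illustration:
<?php
// SAX-style parse: handlers fire as elements stream past, no DOM tree is built.
$count = 0;

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$count) {
        if ($name === 'ITEM') {      // element names arrive upper-cased by default
            $count++;
        }
    },
    function ($parser, $name) {
        // end-of-element handler; per-record processing would finish here
    }
);

$fp = fopen('feed.xml', 'rb');
while (!feof($fp)) {
    $chunk = fread($fp, 8192);       // feed the parser 8KB at a time
    if (!xml_parse($parser, $chunk, feof($fp))) {
        printf("XML error: %s at line %d\n",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser));
        break;
    }
}
fclose($fp);
xml_parser_free($parser);

echo "items seen: $count\n";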
Depending on your OS, there might also be a 2GB limit on the RAM chunk that you can allocate. Very possible if you're running on a 32-bit OS.
It should be noted that PHP in general has a max file size. PHP does not allow for unsigned integers, or long integers, meaning you're capped at 2^31 (or 2^63 for 64 bit systems) for integers. This is important because PHP uses an integer for the file pointer (your position in the file as you read through), meaning it cannot process a file larger than 2^31 bytes in size.
However, this should be more than 1 gigabyte. I ran into issues with two gigabytes (as expected, since 2^31 is roughly 2 billion).
I've run into a similar issue when parsing large documents. What I wound up doing is breaking the feed into smaller chunks using filesystem functions, then parsing those smaller chunks... So if you have a bunch of <record> tags that you are parsing, parse them out with string functions as a stream, and when you get a full record in the buffer, parse that using the xml functions... It sucks, but it works quite well (and is very memory efficient, since you only have at most 1 record in memory at any one time)...
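A rough sketch of that chunking approach, assuming the records are wrapped in <record> tags as in the answer; the filename, chunk size and tag handling are assumptions:
<?php
// Read the feed with filesystem functions, buffer until a complete
// <record>...</record> is available, then parse just that record.
$fp = fopen('feed.xml', 'rb');
$buffer = '';

while (!feof($fp)) {
    $buffer .= fread($fp, 8192);

    // extract every complete record currently sitting in the buffer
    while (($start = strpos($buffer, '<record')) !== false
        && ($end = strpos($buffer, '</record>', $start)) !== false) {
        $recordXml = substr($buffer, $start, $end + strlen('</record>') - $start);
        $buffer = substr($buffer, $end + strlen('</record>'));

        $record = simplexml_load_string($recordXml);
        // ... process one record, e.g. write it to the database ...
    }
}
fclose($fp);
Only one record is ever held in memory at a time, which is what makes this memory-efficient.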
Do you get any errors with
libxml_use_internal_errors(true);
libxml_clear_errors();
// your parser stuff here....
$r = new XMLReader(...);
// ....
foreach (libxml_get_errors() as $err) {
    printf(". %d %s\n", $err->code, $err->message);
}
when the parser stops prematurely?
Using Windows XP, NTFS as the filesystem, and PHP 5.3.2, there was no problem with this test script:
<?php
define('SOURCEPATH', 'd:/test.xml');

if ( 0 ) {
    build();
}
else {
    echo 'filesize: ', number_format(filesize(SOURCEPATH)), "\n";
    timing('read');
}

function timing($fn) {
    $start = new DateTime();
    echo 'start: ', $start->format('Y-m-d H:i:s'), "\n";
    $fn();
    $end = new DateTime();
    echo 'end: ', $start->format('Y-m-d H:i:s'), "\n";
    echo 'diff: ', $end->diff($start)->format('%I:%S'), "\n";
}

function read() {
    $cnt = 0;
    $r = new XMLReader;
    $r->open(SOURCEPATH);
    while ( $r->read() ) {
        if ( XMLReader::ELEMENT === $r->nodeType ) {
            if ( 0 === ++$cnt % 500000 ) {
                echo '.';
            }
        }
    }
    echo "\n#elements: ", $cnt, "\n";
}

function build() {
    $fp = fopen(SOURCEPATH, 'wb');
    $s = '<catalogue>';
    //for ($i = 0; $i < 500000; $i++) {
    for ($i = 0; $i < 60000000; $i++) {
        $s .= sprintf('<item>%010d</item>', $i);
        if ( 0 === $i % 100000 ) {
            fwrite($fp, $s);
            $s = '';
            echo $i / 100000, ' ';
        }
    }
    $s .= '</catalogue>';
    fwrite($fp, $s);
    fflush($fp);
    fclose($fp);
}
output:
filesize: 1,380,000,023
start: 2010-08-07 09:43:31
........................................................................................................................
#elements: 60000001
end: 2010-08-07 09:43:31
diff: 07:31
(as you can see I screwed up the output of the end time by printing the start time again, but I don't want to run this script for another 7+ minutes ;-))
Does this also work on your system?
As a side note: the corresponding C# test application took only 41 seconds instead of 7.5 minutes, and my slow hard drive might have been the (or one) limiting factor in this case.
filesize: 1.380.000.023
start: 2010-08-07 09:55:24
........................................................................................................................
#elements: 60000001
end: 2010-08-07 09:56:05
diff: 00:41
and the source:
using System;
using System.IO;
using System.Xml;
namespace ConsoleApplication1
{
class SOTest
{
delegate void Foo();
const string sourcepath = #"d:\test.xml";
static void timing(Foo bar)
{
DateTime dtStart = DateTime.Now;
System.Console.WriteLine("start: " + dtStart.ToString("yyyy-MM-dd HH:mm:ss"));
bar();
DateTime dtEnd = DateTime.Now;
System.Console.WriteLine("end: " + dtEnd.ToString("yyyy-MM-dd HH:mm:ss"));
TimeSpan s = dtEnd.Subtract(dtStart);
System.Console.WriteLine("diff: {0:00}:{1:00}", s.Minutes, s.Seconds);
}
static void readTest()
{
XmlTextReader reader = new XmlTextReader(sourcepath);
int cnt = 0;
while (reader.Read())
{
if (XmlNodeType.Element == reader.NodeType)
{
if (0 == ++cnt % 500000)
{
System.Console.Write('.');
}
}
}
System.Console.WriteLine("\n#elements: " + cnt + "\n");
}
static void Main()
{
FileInfo f = new FileInfo(sourcepath);
System.Console.WriteLine("filesize: {0:N0}", f.Length);
timing(readTest);
return;
}
}
}
