Doctrine2 has problems with more than 1000 rows at select - php

I have this query which returns me all POIs in the given area:
$query = $em->createQuery(
    'SELECT f FROM MyApplication\MyBundle\Entity\POI f
     WHERE (f.latitude BETWEEN :southEastLatitude AND :northWestLatitude) AND
           (f.longitude BETWEEN :southEastLongitude AND :northWestLongitude)
     ORDER BY f.name'
);
$query->setParameter("northWestLatitude", $northWestLat);
$query->setParameter("northWestLongitude", $northWestLng);
$query->setParameter("southEastLatitude", $southEastLat);
$query->setParameter("southEastLongitude", $southEastLng);
If I query a small area (parameters with small differences), I successfully get the result. I believe it works up to around 1000 rows, but I'm not quite sure.
If I query a bigger area, I get an empty result set.
However, firing the same query directly against the database with the parameters of the bigger area returns the correct result set (~1060 rows).
So I'm wondering whether Doctrine (or even Symfony? I'm using Doctrine inside my Symfony2 project) imposes any limits here. I also tried $query->setMaxResults(999999999); but it didn't help.
Has anybody had the same problem?
Edit: Maybe the PHP memory usage is too high? I added these lines before and after the getResult() call:
echo "Memory usage before: " . (memory_get_usage() / 1024) . " KB" . PHP_EOL;
echo "Memory usage after: " . (memory_get_usage() / 1024) . " KB" . PHP_EOL;
The output was:
Memory usage before: 5964.0625 KB
Memory usage after: 10019.421875 KB
Edit:
Strange tests:
1) If I test with these parameters:
WHERE
(f.latitude BETWEEN 45.64273082966722 AND 47.29965978937995) AND
(f.longitude BETWEEN 4.93262593696295 AND 9.99999999999999)
I'm getting 923 rows (normal behaviour)
2) If I change the parameter 9.999999999999 to 10.000000000 (or any number bigger than 9.9999999), I get an empty result set in my application.
The database still returns 923 rows (for 10.000000000).
EDIT:
I was able to fix the problem; it is discussed here: https://groups.google.com/d/msg/doctrine-user/qLSon6m4nM4/y5vLztHcbDgJ

From the Doctrine2 mailing list I learned that you mapped the properties as type "string":
/** #ORM\Column(name="latitude", type="string", length=255, nullable=false) */
private $latitude;
/** #ORM\Column(name="longitude", type="string", length=255, nullable=false) */
private $longitude;
This means that your db will have VARCHAR columns for these properties. So when the db runs your query, it will perform a string-compare.
A string-compare is different from a (normal) number-compare. See these results:
$a = 1234567890;
$b = 987654321;
echo $a == $b ? '0' : ($a < $b ? '-1' : '1'); // output: 1 (means $a is bigger than $b)
echo strcmp( $a, $b ); // output: -8 (means $a is smaller than $b)
So it's probably best to map the properties as something that represents numbers in your db (DECIMAL would be a good choice):
/** #ORM\Column(name="latitude", type="decimal", nullable=false) */
private $latitude;
/** #ORM\Column(name="longitude", type="decimal", nullable=false) */
private $longitude;
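If you want more control over the stored precision, Doctrine's decimal type also accepts precision and scale options. The values below are just an assumption that suits typical latitude/longitude data, not something taken from your schema:
/** @ORM\Column(name="latitude", type="decimal", precision=10, scale=8, nullable=false) */
private $latitude;
/** @ORM\Column(name="longitude", type="decimal", precision=11, scale=8, nullable=false) */
private $longitude;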
PS: You mention you get valid results when you perform the query yourself. This is probably because you did:
... latitude BETWEEN 45.64273082966722 AND 47.29965978937995 ...
But Doctrine (because you mapped the properties as strings) will do:
... latitude BETWEEN '45.64273082966722' AND '47.29965978937995' ...
Note the quotes here. There is a big difference in how these two statements are treated, as my test case shows ;)
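To tie this back to the strange test above: as strings, '10.000000000' compares as smaller than both bounds, because the comparison stops at the first differing character ('1' < '4' and '1' < '9'), so the BETWEEN range is effectively empty. A quick sketch:
// String comparison looks at characters, not numeric value:
var_dump(strcmp('10.000000000', '9.99999999999999') < 0); // bool(true)  -> '10...' sorts before '9...'
var_dump(strcmp('10.000000000', '4.93262593696295') < 0); // bool(true)  -> and even before the lower bound
var_dump(10.000000000 < 9.99999999999999);                // bool(false) -> numerically it is of course larger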

It's probably related to memory usage. Please check your apache log for memory errors.
When selecting large amounts of data it may be helpful to use $query->iterate() instead of $query->getResult(), as pointed out in this article.
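A minimal sketch of that iterating approach, assuming Doctrine 2's iterate()/IterableResult API (the processing step is just a placeholder):
foreach ($query->iterate() as $row) {
    $poi = $row[0];      // each iterated row wraps the hydrated entity
    // ... process the single POI here ...
    $em->detach($poi);   // drop it from the unit of work to keep memory flat
}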
To check if your query generates the right SQL, you could log or otherwise output the result of calling $query->getSQL() (after setting the parameters). Feed the resulting SQL to the database client of your choice to see if the desired result comes back.

Related

PHPUnit test - Failed asserting that actual size 11935 matches expected size 3

I'm brand new to phpunit and attempting to write a test that asserts that three notes were created, but I'm getting all of the notes from the DB.
/** @test */
public function it_gets_notes()
{
    $address = Address::first();
    $notes = factory(AddressNote::class, 3)->create(['address_id' => $address->id]);
    $found = $this->notesClass->getData(['address_id' => $address->id]);
    $this->assertCount(3, $found);
}
The Address and AddressNote models are working properly. I think I'm most confused about the getData method, which I know I need for my code coverage. Anyone see what I'm missing that would generate the error in the title?
If you need to check the difference after running your create method, capture $found before and after creating the notes; the subtraction will be your number:
public function it_gets_notes()
{
$address = Address::first();
$found = $this->notesClass->getData(['address_id' => $address->id]);
$notes = factory(AddressNote::class, 3)->create(['address_id' => $address->id]);
$foundAfter = $this->notesClass->getData(['address_id' => $address->id]);
$difference = count($foundAfter) - count($found);
$this->assertEquals(3, $difference);
}
Note that you need to use assertEquals() with 3 and the difference now instead of assertCount(), since you're comparing numbers.
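Another option, if you would rather keep assertCount(), is to filter the returned data down to just the notes this test created. This is only a sketch and assumes getData() returns an array or collection of note models with an id attribute:
/** @test */
public function it_gets_the_notes_it_created()
{
    $address = Address::first();
    $created = factory(AddressNote::class, 3)->create(['address_id' => $address->id]);

    $found = $this->notesClass->getData(['address_id' => $address->id]);

    // Keep only the notes created by this test, then count that subset.
    $matching = collect($found)->whereIn('id', $created->pluck('id')->all());
    $this->assertCount(3, $matching);
}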
I don't know your whole story, but I assume your first mistake was that you did not create a testing database.
So that would be the first step: in addition to your databaseName (whatever the name), create a databaseName_test.
There are some other steps to do: in your .env file, change databaseName to databaseName_test, but only while you're doing your testing (and then immediately roll back to your original databaseName).
Yet the problem can still persist (PHPUnit is not a perfect tool, just like PHP), and here is a hack that can help.
Instead of:
$this->assertEquals(3, $difference);
write:
$this->assertEquals(11935, $difference); //the number is specific for your case
Yep, it's stupid, but it should work...

Looking for correct PHP socket error constants

I have an odd problem that's likely due to my inexperience with socket programming.
I'm currently using the raw error code numbers I'm getting from socket_last_error() as I see I need to handle them. This is getting unwieldy.
I'd like to use predefined constants (either my own or builtin) to refer to the different types of socket errors/conditions I need to deal with.
The error tables I've found all use conflicting values.
At http://php.net/manual/en/sockets.constants.php (in a comment that lists the raw numbers), I see things like (excerpted):
SOCKET_EINPROGRESS 10036
SOCKET_EALREADY 10037
SOCKET_ENETUNREACH 10051
The one comment at http://php.net/socket_last_error contains a list of what are apparently the "standard C defines" for socket errors (?):
define('EINPROGRESS', 115); /* Operation now in progress */
define('EALREADY', 114); /* Operation already in progress */
define('ENETUNREACH', 101); /* Network is unreachable */
The errno.h file on my own system (hiding in /usr/include/asm-generic/) seems to support this:
#define EINPROGRESS 115 /* Operation now in progress */
#define EALREADY 114 /* Operation already in progress */
#define ENETUNREACH 101 /* Network is unreachable */
However those "standard definitions" seem to be subject to change depending on what OS you're on: BSD4.4's errno.h has things like
#define EINPROGRESS 36 /* Operation now in progress */
#define EALREADY 37 /* Operation already in progress */
#define ENETUNREACH 51 /* Network is unreachable */
Now we know what the socket_* functions were inspired by though!
Finally, I find what seems to be a hint of an explanation hiding in the VirtualBox source code:
#ifndef EALREADY
# if defined(RT_ERRNO_OS_BSD)
# define EALREADY (37)
# elif defined(RT_OS_LINUX)
# define EALREADY (114)
# else
# define EALREADY (149)
# endif
#endif
With all of this taken into account...
socket_last_error() returns an errno of 101 when the network is unreachable, as opposed to 51 or 10051. So this function appears to violate the socket extension's officially supplied constants, and seems to be using Linux's error codes instead.
([EDIT after adding my answer]: The 101 stated above was obtained on Linux.)
So now that I seem to be in Undocumented and/or Seemingly Undefined Behavior land... what do I do now? I'm on Linux right now; do these values ever change?
Is there some way I can use the official SOCKET_* constants? I certainly wouldn't mind doing so.
Sounds like you've been researching this hard -- and also that the platform-specific stuff might be a nuisance. Let me suggest this code which will ostensibly fetch all defined constants grouped together as "sockets" constants:
$consts = get_defined_constants(TRUE);
$socket_constants = $consts["sockets"];
foreach ($socket_constants as $key => $value) {
    echo $key . '=' . $value . "\n";
}
From that, I was able to construct this function, which you may find useful on any platform for finding the names of the socket constants that match a given value:
function get_socket_constant_names($to_check) {
    $consts = get_defined_constants(TRUE);
    $socket_constants = $consts["sockets"];
    $matches = array();
    foreach ($socket_constants as $key => $value) {
        if ($value == $to_check) {
            $matches[] = $key;
        }
    }
    return $matches;
}
This:
var_dump(get_socket_constant_names(101));
Yields this:
array(1) {
[0] =>
string(18) "SOCKET_ENETUNREACH"
}
I just did some digging.
WARNING. PHP returns OS-specific socket error codes.
I do not know how to retrieve error codes compatible with the socket_* constants, so you have to detect the OS :'(
See proof after source code.
The following are heavily elided to aid focus.
From https://github.com/php/php-src/blob/master/ext/sockets/sockets.c:
PHP_FUNCTION(socket_connect) {
    // will set errno
    retval = connect(php_sock->bsd_socket, (struct sockaddr *)&sin, sizeof(struct sockaddr_in));
    if (retval != 0) {
        // call macro defined in php_sockets.h with the errno value
        PHP_SOCKET_ERROR(php_sock, "unable to connect", errno);
        RETURN_FALSE;
    }
    RETURN_TRUE;
}

PHP_FUNCTION(socket_last_error) {
    // check for argument
    if (arg1) {
        if ((php_sock = (php_socket *)zend_fetch_resource(Z_RES_P(arg1), le_socket_name, le_socket)) == NULL) {
            RETURN_FALSE;
        }
        // return errno from the passed socket resource
        RETVAL_LONG(php_sock->error);
    } else {
        // return errno from the last globally set value
        RETVAL_LONG(SOCKETS_G(last_error));
    }
}
From https://github.com/php/php-src/blob/master/ext/sockets/php_sockets.h:
#define PHP_SOCKET_ERROR(socket, msg, errn) \
    /* store the value for this socket resource */ \
    (socket)->error = _err; \
    /* store the value globally */ \
    SOCKETS_G(last_error) = _err
Proof:
<?php
$s = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
socket_set_nonblock($s);
socket_connect($s, "google.com", "80");
var_dump(socket_last_error());
var_dump(socket_strerror(socket_last_error()));
Observe the difference in socket_last_error():
linux$ php asdf.php
int(115)
string(25) "Operation now in progress"
freebsd$ php asdf.php
int(36)
string(25) "Operation now in progress"
TL;DR: There's no defined list of constants inside PHP (that I can see how to use). You must supply your own OS-correct list of constants.
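If you do end up supplying your own list, here is a small sketch of what the OS detection could look like. The values are the ones quoted earlier from Linux's asm-generic/errno.h and BSD's errno.h; treat the mapping itself as an assumption you would want to verify on every platform you target:
// Hypothetical helper: pick errno values per OS family, since
// socket_last_error() returns the OS-native errno.
// PHP_OS_FAMILY requires PHP 7.2+; older versions could inspect PHP_OS instead.
function my_socket_errno_map(): array {
    if (PHP_OS_FAMILY === 'Linux') {
        return ['EINPROGRESS' => 115, 'EALREADY' => 114, 'ENETUNREACH' => 101];
    }
    if (PHP_OS_FAMILY === 'BSD' || PHP_OS_FAMILY === 'Darwin') {
        return ['EINPROGRESS' => 36, 'EALREADY' => 37, 'ENETUNREACH' => 51];
    }
    return []; // unknown platform: the caller must handle missing keys
}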

Array Insert Time Jump

While doing some deep research into the hash and zval structures and how arrays are built on them, I ran into strange insert times.
Here is an example:
$array = array();
$someValueToInsert = 100;
for ($i = 0; $i < 10000; ++$i) {
    $time = microtime(true);
    array_push($array, $someValueToInsert);
    echo $i . " : " . (int)((microtime(true) - $time) * 100000000) . "</br>";
}
So I found that every 1024th, 2048th, 4096th... element takes much more time to insert (roughly 10x more).
It doesn't depend on whether I use array_push, array_unshift, or simply $array[] = $someValueToInsert.
I suspect it has something to do with the HashTable structure:
typedef struct _hashtable {
    ...
    uint nNumOfElements;
    ...
} HashTable;
nNumOfElements has a default maximum value, but that doesn't answer why inserting takes more time at those particular counts (1024, 2048, ...).
Any thoughts?
While I would suggest double-checking my answer on the PHP internals list, I believe the answer lies in zend_hash_do_resize(). When more elements are needed in the hash table, this function is called and the extant hash table is doubled in size. Since the table starts life at 1024, this doubling explains the results you've observed. Code:
} else if (ht->nTableSize < HT_MAX_SIZE) { /* Let's double the table size */
    void *old_data = HT_GET_DATA_ADDR(ht);
    Bucket *old_buckets = ht->arData;

    HANDLE_BLOCK_INTERRUPTIONS();
    ht->nTableSize += ht->nTableSize;
    ht->nTableMask = -ht->nTableSize;
    HT_SET_DATA_ADDR(ht, pemalloc(HT_SIZE(ht), ht->u.flags & HASH_FLAG_PERSISTENT));
    memcpy(ht->arData, old_buckets, sizeof(Bucket) * ht->nNumUsed);
    pefree(old_data, ht->u.flags & HASH_FLAG_PERSISTENT);
    zend_hash_rehash(ht);
    HANDLE_UNBLOCK_INTERRUPTIONS();
I am uncertain whether the reallocation is the performance hit, or the rehashing, or the fact that the whole block is uninterruptible. It would be interesting to put a profiler on it. I think some people might have already done that for PHP 7.
Side note: the thread-safe version does things differently. I'm not overly familiar with that code, so there may be a different issue going on if you're using ZTS.
I think it is related to the implementation of dynamic arrays.
See here "Geometric expansion and amortized cost" http://en.wikipedia.org/wiki/Dynamic_array
To avoid incurring the cost of resizing many times, dynamic arrays resize by a large amount, **such as doubling in size**, and use the reserved space for future expansion
You can read about arrays in PHP here as well https://nikic.github.io/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
It is a standard practice for dynamic arrays. E.g. check here C++ dynamic array, increasing capacity
capacity = capacity * 2; // doubles the capacity of the array
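As a quick way to confirm that the spikes come from the resize rather than from the insert itself, you could repeat the question's timing loop with a preallocated SplFixedArray, which never needs to grow. This is just a sketch:
// Preallocated storage: no doubling and no rehash, so no periodic spikes are expected.
$fixed = new SplFixedArray(10000);
for ($i = 0; $i < 10000; ++$i) {
    $time = microtime(true);
    $fixed[$i] = 100;
    echo $i . " : " . (int)((microtime(true) - $time) * 100000000) . "\n";
}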

PHP code reaching execution time limit

I need to go through an array containing points on a map and check their distance from one another. I need to count how many nodes are within 200m and 50m of each one. It works fine for smaller amounts of values. However, when I tried to run more values through it (around 4000, for scalability testing), an error occurred saying that I had reached the maximum execution time of 300 seconds. It needs to be able to handle at least this much within 300 seconds if possible.
I have read around and found out that there is a way to disable/change this limit, but I would like to know if there is a simpler way of executing the following code so that the time it takes to run it will decrease.
for ($i = 0; $i <= count($data) - 1; $i++)
{
    $amount200a = 0;
    $amount200p = 0;
    $amount50a = 0;
    $amount50p = 0;
    $distance;

    for ($_i = 0; $_i <= count($data) - 1; $_i++)
    {
        $distance = 0;
        if ($data[$i][0] === $data[$_i][0])
        {
        }
        else
        {
            //echo "Comparing ".$data[$i][0]." and ".$data[$_i][0]." ";
            $lat_a  = $data[$i][1] * PI() / 180;
            $lat_b  = $data[$_i][1] * PI() / 180;
            $long_a = $data[$i][2] * PI() / 180;
            $long_b = $data[$_i][2] * PI() / 180;
            $distance =
                acos(
                    sin($lat_a) * sin($lat_b) +
                    cos($lat_a) * cos($lat_b) * cos($long_b - $long_a)
                ) * 6371;
            $distance *= 1000;
            if ($distance <= 50)
            {
                $amount50a++;
                $amount200a++;
            }
            else if ($distance <= 200)
            {
                $amount200a++;
            }
        }
    }

    $amount200p = 100 * number_format($amount200a / count($data), 2, '.', '');
    $amount50p  = 100 * number_format($amount50a / count($data), 2, '.', '');
    /*
    $dist[$i][0] = $data[$i][0];
    $dist[$i][1] = $amount200a;
    $dist[$i][2] = $amount200p;
    $dist[$i][3] = $amount50a;
    $dist[$i][4] = $amount50p;
    //*/
    $dist .= $data[$i][0] . "&&" . $amount200a . "&&" . $amount200p . "&&" . $amount50a . "&&" . $amount50p . "%%";
}
Index 0 contains the unique ID of each node, index 1 contains the latitude, and index 2 contains the longitude.
The error occurs at the second for loop inside the first loop. This loop is the one comparing the selected map node to other nodes. I am also using the Haversine Formula.
First of all, in big O notation you are doing O(n^2) work over your data, which is going to be slow as hell. There are really two possible options: find a proven algorithm that solves the same problem in better time, or, if you can't, start moving work out of the inner for loop and see whether you can mathematically reduce the inner loop to mostly simple calculations, which is often possible.
After some rewriting, I see some possibilities:
If $data is not an SplFixedArray (which has far better access times), make it one, since you are accessing that data very many times ((4000^2)*2).
Second, write cleaner code. Although the optimizer will do its best, if you don't try to minimize the code yourself (which also makes it more readable), it might not be able to do as good a job.
Also move intermediate results out of the loops, including things like the size of the array.
Currently you're checking all points against all other points, where in fact you only need to check the current point against all remaining points. The distance from A to B is the same as the distance from B to A, so why calculate it twice?
I would probably keep a separate array that counts how many nodes are within range of each node, and increment the entries for both nodes once I've calculated that two nodes are within range of each other.
You should probably come up with a very fast approximation of the distance that can be used to disregard as many nodes as possible before calculating the real distance (which is never going to be super fast).
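Putting the half-matrix traversal and the pairwise counting together, a minimal sketch, assuming $data keeps the same [id, latitude, longitude] layout as in the question (the counters are per node, indexed like $data):
$n         = count($data);
$rad       = PI() / 180;
$within200 = array_fill(0, $n, 0);
$within50  = array_fill(0, $n, 0);

for ($i = 0; $i < $n; $i++) {
    $latA = $data[$i][1] * $rad;
    $lonA = $data[$i][2] * $rad;
    for ($j = $i + 1; $j < $n; $j++) {   // only the remaining points
        $latB = $data[$j][1] * $rad;
        $lonB = $data[$j][2] * $rad;
        $d = acos(sin($latA) * sin($latB) +
                  cos($latA) * cos($latB) * cos($lonB - $lonA)) * 6371000; // metres
        if ($d <= 200) {
            $within200[$i]++;            // credit both ends of the pair
            $within200[$j]++;
            if ($d <= 50) {
                $within50[$i]++;
                $within50[$j]++;
            }
        }
    }
}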
Generally speaking, beyond algorithmic optimisations, the basic rules of optimisation are:
Don't do any processing that you don't have to do: for example, don't multiply $distance by 1000. Just change the values you're testing against from 50 and 200 to 0.05 and 0.2, respectively.
Don't call any function more often than you have to: You only need to call count($data) once before any processing starts.
Don't calculate constant values more than once: PI()/180, for example.
Move all possible processing outside of loops. I.e. precalculate as much as possible.
Another minor point which will make your code a little easier to read:
for( $i = 0; $i <= count( $data ) - 1; $i++ ) is the same as:
for( $i = 0; $i < count( $data ); $i++ )
Try this:
$max = count($data);
$CONST_PI = PI() / 180;
for ($i = 0; $i < $max; $i++)
{
    $amount200a = 0;
    $amount50a = 0;
    $long_a = $data[$i][2] * $CONST_PI;
    $lat_a  = $data[$i][1] * $CONST_PI;
    for ($_i = 0; $_i < $max; $_i++)
    // or use for ($_i = $i + 1; $_i < $max; $_i++) if you do not need to
    // recalculate distances already computed in the other direction
    {
        $distance = 0;
        if ($data[$i][0] === $data[$_i][0]) continue;
        $lat_b  = $data[$_i][1] * $CONST_PI;
        $long_b = $data[$_i][2] * $CONST_PI;
        $distance =
            acos(
                sin($lat_a) * sin($lat_b) +
                cos($lat_a) * cos($lat_b) * cos($long_b - $long_a)
            ) * 6371;
        if ($distance <= 0.2)
        {
            $amount200a++;
            if ($distance <= 0.05)
            {
                $amount50a++;
            }
        }
    } // for $_i
    $amount200p = 100 * number_format($amount200a / $max, 2, '.', '');
    $amount50p  = 100 * number_format($amount50a / $max, 2, '.', '');
    $dist .= $data[$i][0] . "&&" . $amount200a . "&&" . $amount200p . "&&" . $amount50a . "&&" . $amount50p . "%%";
} // for $i
I think it reads better, and if you switch to the commented-out version of the $_i loop it will be faster still :)

Merge FDF and PDF without PDFTK

Is there a way to merge FDF file and a PDF File to create a flat format of all the data and form into 1 pdf without using PDFTK?
Any light shed upon this would be greatly appreciated.
No... there's no other easy way to flatten, but PDFTK is awesome. Why would you need anything else?
PDFTK is actually mostly Java (literally hundreds of Java files). You could think about wrapping your own project around it. The functionality that you're looking for is here (java/com/lowagie/text/pdf/AcroFields.java:931):
/** Sets the fields by XFDF merging.
 * @param xfdf the XFDF form
 * @throws IOException on error
 * @throws DocumentException on error
 */
public boolean setFields(XfdfReader xfdf) throws IOException, DocumentException {
    boolean ret_val_b = false; // ssteward
    xfdf.getFields();
    for (Iterator i = fields.keySet().iterator(); i.hasNext();) {
        String f = (String) i.next();
        String v = xfdf.getFieldValue(f);
        String rv = xfdf.getFieldRichValue(f); // ssteward
        if (rv != null)
            ret_val_b = true;
        if (v != null)
            setField(f, v, v, rv); // ssteward
    }
    return ret_val_b; // ssteward
}
