Array Insert Time Jump - php

While digging into the hash and zval structures and how PHP arrays are built on them, I ran into some strange insert times.
Here is an example:
$array = array();
$someValueToInsert = 100;
for ($i = 0; $i < 10000; ++$i) {
    $time = microtime(true);
    array_push($array, $someValueToInsert);
    echo $i . " : " . (int)((microtime(true) - $time) * 100000000) . "</br>";
}
So I found that every 1024th, 2048th, 4096th... element is inserted using much more time (roughly 10x more).
It doesn't matter whether I use array_push, array_unshift, or simply $array[] = $someValueToInsert.
I'm thinking it is related to this part of the HashTable structure:
typedef struct _hashtable {
    ...
    uint nNumOfElements;
    ...
} HashTable;
nNumOfElements has a default maximum value, but that doesn't answer why it takes more time to insert at those particular counts (1024, 2048, ...).
Any thoughts?

While I would suggest double-checking my answer on the PHP internals list, I believe the answer lies in zend_hash_do_resize(). When more elements are needed in the hash table, this function is called and the existing hash table is doubled in size. Since the table starts life at 1024, this doubling explains the results you've observed. Code:
} else if (ht->nTableSize < HT_MAX_SIZE) { /* Let's double the table size */
    void *old_data = HT_GET_DATA_ADDR(ht);
    Bucket *old_buckets = ht->arData;

    HANDLE_BLOCK_INTERRUPTIONS();
    ht->nTableSize += ht->nTableSize;
    ht->nTableMask = -ht->nTableSize;
    HT_SET_DATA_ADDR(ht, pemalloc(HT_SIZE(ht), ht->u.flags & HASH_FLAG_PERSISTENT));
    memcpy(ht->arData, old_buckets, sizeof(Bucket) * ht->nNumUsed);
    pefree(old_data, ht->u.flags & HASH_FLAG_PERSISTENT);
    zend_hash_rehash(ht);
    HANDLE_UNBLOCK_INTERRUPTIONS();
I am uncertain whether the reallocation is the performance hit, or the rehashing, or the fact that the whole block is uninterruptible. It would be interesting to put a profiler on it. I think some might have already done that for PHP 7.
Side note: the thread-safe version does things differently. I'm not overly familiar with that code, so there may be a different issue going on if you're using ZTS.

I think it is related to the implementation of dynamic arrays.
See here "Geometric expansion and amortized cost" http://en.wikipedia.org/wiki/Dynamic_array
To avoid incurring the cost of resizing many times, dynamic arrays resize by a large amount, **such as doubling in size**, and use the reserved space for future expansion
You can read about arrays in PHP here as well https://nikic.github.io/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html
It is standard practice for dynamic arrays. E.g. see this C++ dynamic-array example of increasing capacity:
capacity = capacity * 2; // doubles the capacity of the array
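To make the geometric-expansion idea concrete, here is a small userland PHP sketch (illustrative only; the class name and the starting capacity of 8 are made up, and PHP's real arrays do the equivalent work in C inside zend_hash_do_resize()):

class GrowableBuffer
{
    private $slots;
    private $capacity = 8; // arbitrary starting size for the illustration
    private $count = 0;

    public function __construct()
    {
        $this->slots = array_fill(0, $this->capacity, null);
    }

    public function push($value)
    {
        if ($this->count === $this->capacity) {
            // The expensive step: allocate a block twice as large and copy
            // the old contents over, mirroring what zend_hash_do_resize()
            // does in C. It only happens at 8, 16, 32, ..., so the amortized
            // cost per push stays O(1), but the push that triggers it is
            // much slower - the spike pattern observed in the question.
            $this->capacity *= 2;
            $newSlots = array_fill(0, $this->capacity, null);
            for ($i = 0; $i < $this->count; $i++) {
                $newSlots[$i] = $this->slots[$i];
            }
            $this->slots = $newSlots;
        }
        $this->slots[$this->count++] = $value;
    }
}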

Related

Arithmetic on large numbers

I've been thinking about ways to program for extreme distances. In the game Hellion, for example, orbits could be near to scale, in the range of millions of kilometers. However, there was a common glitch where movement would get very choppy the further you were from the object you were orbiting. I might be wrong in my speculation as to why, but my best guess is that it came down to loss of precision at that distance.
As a little exercise I've been thinking about ways to solve that problem, and what I currently have is a pretty basic unit-staged distance system.
class Distance
{
    public const MAX_AU = 63018.867924528;
    public const MAX_MM = 149598000000000;

    private $ly = 0;
    private $au = 0;
    private $mm = 0;

    public function add(Distance $add): Distance
    {
        $distance = new Distance();

        $distance->mm = $this->mm + $add->mm;
        if ($distance->mm > self::MAX_MM) {
            $distance->mm -= self::MAX_MM;
            $distance->au++;
        }

        $distance->au += $this->au + $add->au;
        if ($distance->au > self::MAX_AU) {
            $distance->au -= self::MAX_AU;
            $distance->ly++;
        }

        $distance->ly += $this->ly + $add->ly;

        return $distance;
    }
}
I put in the addition method, which is written much as you would do it by hand. I didn't want to use arbitrary precision because it would be overkill for the smaller distances a player would normally interact with.
My question is: is this how something like this is normally done? And if not, could someone please explain what is wrong with it (inefficient, for example) and how it could be done better?
Thanks
PS. I am aware that in the context of a game this is normally handled with sub-grids, but this is meant to simulate how objects in orbit would drift apart.
You can use the BCMath functions. BCMath supports numbers of any size and precision.
Example:
$mm = "149598000000000";
$au = "63018.867924528";
$sum = bcadd($mm,$au,10);
echo $sum;
//149598000063018.8679245280
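If you wanted to combine that with the Distance class from the question, a minimal sketch could keep the whole value in millimetres as a decimal string and let BCMath do the carrying. The class name, the scale of 10 decimal places, and the reuse of the question's mm-per-AU constant are my assumptions, not a recommendation:

class BcDistance
{
    // total distance in millimetres, kept as a decimal string
    private $mm = '0';

    public static function fromMm(string $mm): self
    {
        $d = new self();
        $d->mm = $mm;
        return $d;
    }

    public function add(BcDistance $other): self
    {
        // bcadd never overflows, so no manual unit carrying is needed
        return self::fromMm(bcadd($this->mm, $other->mm, 10));
    }

    public function toAu(): string
    {
        // 149598000000000 mm per AU, taken from the question's MAX_MM
        return bcdiv($this->mm, '149598000000000', 10);
    }
}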

PHP function to generate an infinite unique identifier

I'm making a new project in Zend 3 that requires me to have a unique ID or hash which I can use in several places later.
I looked at many examples on Google and could not find a function that satisfies my requirements, because this needs to be 99% unique all the time, and it needs to be able to generate hundreds of millions of "hashes" that stay unique.
The following function caught my attention:
function uniqidReal($length = 13) {
    // uniqid gives 13 chars, but you could adjust it to your needs.
    if (function_exists("random_bytes")) {
        $bytes = random_bytes(ceil($length / 2));
    } elseif (function_exists("openssl_random_pseudo_bytes")) {
        $bytes = openssl_random_pseudo_bytes(ceil($length / 2));
    } else {
        throw new Exception("no cryptographically secure random function available");
    }
    return substr(bin2hex($bytes), 0, $length);
}
A simple test:
echo "<pre>";
for ($i = 0; $i < 100; $i++) {
    echo $this->uniqidReal(25) . PHP_EOL;
}
The result:
a8ba1942ad99d09f496d3d564
5b24746d09cada4b2dc9816bd
c6630c35bc9b4ed0907c803e0
48e04958b633e8a5ead137bb1
643a4ce1bcbca66cea397e85e
d2cd4c6f8dc7054dd0636075f
d9c78bae38720b7e0cc6361f2
54e5f852862adad2ad7bc3349
16c4e42e4f63f62bf9653f96e
c63d64af261e601e4b124e38f
29a3efa07a4d77406349e3020
107d78fdfca13571c152441f2
591b25ebdb695c8259ccc7fe9
105c4f2cc5266bb82222480ba
84e9ad8fd76226f86c89c1ac1
39381d31f494d320abc538a8e
7f8141db50a41b15a85599548
7b15055f6d9fb1228b7438d2a
659182c7bcd5b050befd3fc4c
06f70d134a3839677caa0d246
600b15c9dc53ef7a4551b8a90
a9c8af631c5361e8e1e1b8d9d
4b4b0aca3bbf15d35dd7d1050
f77024a07ee0dcee358dc1f5e
408c007b9d771718263b536e1
2de08e01684805a189224db75
c3838c034ae22d21f27e5d040
b15e9b0bab6ef6a56225a5983
251809396beb9d24b384f5fe8
cec6d262803311152db31b723
95d271ffdfe9df5861eefbaa4
7c11f3401530790b9ef510e55
e363390e2829097e7762bddc4
7ef34c69d9b8e38d72c6db29f
309a84490a7e387aaff1817ca
c214af2927c683954894365df
9f70859880b7ffa4b28265dbb
608e2f2f9e38025d92a1a4f03
c457a54d2da30a4a517edf14c
8670acbded737b1d2febdd954
99899b74b6469e366122b658c
3066408f5b4e86ef84bdb3fb9
010715f4955f66da3402bfa7b
fa01675690435b914631b46e1
2c5e234c5868799f31a6c983c
8345da31809ab2d9714a01d05
7b4e0e507dd0a8b6d7170a265
5aa71aded9fe7afa9a93a98c5
3714fb9f061398d4bb6af909d
165dd0af233cce64cefec12ed
849dda54070b868b50f356068
fe5f6e408eda6e9d429fa34ed
cd13f8da95c5b92b16d9d2781
65d0f69b41ea996ae2f8783a5
5742caf7a922eb3aaa270df30
f381ac4b84f3315e9163f169e
8c2afa1ab32b6fe402bf97ba3
a9f431efe6fc98aa64dbecbc2
8f0746e4e9529326d087f828b
bfc3cbea4d7f5c4495a14fc49
e4bf2d1468c6482570612360e
f1c7238766acdb7f199049487
60ae8a1ffd6784f7bbbc7b437
30afd67f207de6e893f7c9f42
dfa151daccb0e8d64d100f719
07be6a7d4aab21ccd9942401b
73ca1a54fcc40f7a46f46afbd
94ed2888fb93cb65d819d9d52
b7317773c6a15aa0bdf25fa01
edbb7f20f7523d9d941f3ebce
99a3c204b9f2036d3c38342bb
a0585424b8ab2ffcabee299d5
64e669fe2490522451cf10f85
18b8be34d4c560cda5280a103
9524d1f024b3c9864a3fccf75
0e7e94e7974894c98442241bc
4a17cc5e3d2baabaa338f592e
b070eaf38f390516f5cf61aa7
cc7832ea327b7426d8d2b8c2b
0df0a1d4833ebbb5d463c56bf
1bb610a8bb4e241996c9c756a
34ac2fdeb4b88fe6321a1d9c3
f0b20f8e79090dcb65195524c
307252efdd2b833228e0c301f
3908e63b405501782e629ac0b
29e66717adf14fb30c626103d
c8abd48af5f9332b322dffad0
80cd4e162bc7e8fb3a756b48c
825c00cec2294061eb328dd97
106205a2e24609652d149bc17
f1f896657fbc6f6287e7dee20
0fbd16ade658e24d69f76a225
4ab3b5eeeda86fa81afba796a
11d34f3d2ffb61d55da560ddb
013d6151bad187906fcc579a4
4509279a28f34bcf5327dd4c0
3c0eb47b3f9dc5a2f794bb9ad
1e6506906f23542c889330836
e7b1c5012390f3c7c48def9f3
d86caa695cb5fa1e0a2ead4cc
But I cannot confirm that this does guarantee me a 99% success rate for my production environment.
If someone can advise me, or provide me an example I would much appreciate it!
The random_bytes function generates cryptographically secure random bytes.
For openssl_random_pseudo_bytes, add the crypto_strong parameter to ensure the algorithm used is cryptographically strong.
Since your requirement is only 99% uniqueness, cryptographically secure random bytes will meet it.
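As a minimal sketch of that advice (assuming PHP 7+, where random_bytes() always exists; the function names here are made up for illustration):

function randomToken(int $length = 25): string
{
    // random_bytes() is cryptographically secure and throws on failure
    $bytes = random_bytes((int) ceil($length / 2));
    return substr(bin2hex($bytes), 0, $length);
}

function opensslToken(int $length = 25): string
{
    // pass $crypto_strong by reference and check it, as suggested above
    $bytes = openssl_random_pseudo_bytes((int) ceil($length / 2), $crypto_strong);
    if ($crypto_strong !== true) {
        throw new RuntimeException('OpenSSL could not provide a strong RNG');
    }
    return substr(bin2hex($bytes), 0, $length);
}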
This should be a comment, but it's a bit long.
There is some confusion over your use of "unique" and "all the time". A token is either unique or it is not. Using a random number generator to create tokens is not, by itself, sufficient to guarantee uniqueness: the whole point of a random number generator is that you don't know what the next value will be, which also means you don't know that the next number won't be the same as a previous one. OTOH, using random_bytes() or openssl_random_pseudo_bytes() to generate a token which is "99% unique all the time" seems like massive overkill.
To work out how unique this is likely to be, we would need to know how many tokens will be in the population at any one time (or be able to calculate this from the expected rate of creation and the TTL).
That you are using large numbers rather implies you have a very good reason for not using the simplest and most obvious unique identifier - i.e. an incrementing integer. Hence the resistance to guessing an existing identifier is clearly critical to the implementation - but again you've told us nothing about that.
Pasting the title of your post into Google turns up your post as the top result - with PHP's uniqid() function immediately after it - yet you've either not found uniqid() or have rejected it for some reason.
The title of your post is also an oxymoron - In order to define an infinite set of identifiers, the identifiers would need to be of infinite length.
it needs to be able to generate hundreds of millions of "hashes"
....and you want it all to run within the Zend Framework? - LOL.
But I cannot confirm that this does guarantee me a 99% success rate for my production environment.
Why not? You have sufficient information here to confirm that the bitwise entropy is evenly distributed and should know the planned capacity of the production environment. The rest is basic arithmetic.
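As a rough worked example of that arithmetic (my numbers, not the asker's): a 25-character hex token has 16²⁵ ≈ 1.3×10³⁰ possible values, so by the birthday bound generating n = 10⁸ tokens gives a collision probability of roughly n²/(2·16²⁵) ≈ 4×10⁻¹⁵, vastly better than the stated 99% target.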
We are about 8×10⁹ people. Imagine all of us access your site once each second, needing a unique identifier, for a whole year: you need about 2.5×10¹⁷ identifiers. If you think your site will be in production for about 1000 years, and the population grows by a factor of 1000, you need on the order of 10²³ identifiers; so a 32-byte auto-incrementing string is more than enough. Add a pseudo-random 32-byte string as a suffix to get a secure 64-byte identifier. With a bit more work you can hash the identifiers to create tokens.
Then it is easy to write a function to get them.
Edited 2017/04/13
A small sample:
The first thing you need is a pseudo-random strong keys generator. I'll post the function I'm using currently:
<?php
function pseudoRandomBytes($count = 32){
    static $random_state, $bytes, $has_openssl, $has_hash;

    $missing_bytes = $count - strlen($bytes);
    if ($missing_bytes > 0) {
        // If you are using a PHP version before 5.3.4, avoid using
        // openssl_random_pseudo_bytes()
        if (!isset($has_openssl)) {
            $has_openssl = version_compare(PHP_VERSION, '5.3.4', '>=')
                && function_exists('openssl_random_pseudo_bytes');
        }
        // to get entropy
        if ($has_openssl) {
            $bytes .= openssl_random_pseudo_bytes($missing_bytes);
        } elseif ($fh = @fopen('/dev/urandom', 'rb')) {
            // avoiding openssl_random_pseudo_bytes(), you find entropy at
            // /dev/urandom, usually available on most *nix systems
            $bytes .= fread($fh, max(4096, $missing_bytes));
            fclose($fh);
        }
        // If it fails you must create enough entropy
        if (strlen($bytes) < $count) {
            // Initialize on the first call. The contents of $_SERVER
            // includes a mix of user-specific and system information
            // that varies a little with each page.
            if (!isset($random_state)) {
                $random_state = print_r($_SERVER, TRUE);
                if (function_exists('getmypid')) {
                    // Further initialize with the somewhat random PHP process ID.
                    $random_state .= getmypid();
                }
                // hash() is only available in PHP 5.1.2+ or via PECL.
                $has_hash = function_exists('hash')
                    && in_array('sha256', hash_algos());
                $bytes = '';
            }
            if ($has_hash) {
                do {
                    $random_state = hash('sha256', microtime() . mt_rand() . $random_state);
                    $bytes .= hash('sha256', mt_rand() . $random_state, TRUE);
                } while (strlen($bytes) < $count);
            } else {
                do {
                    $random_state = md5(microtime() . mt_rand() . $random_state);
                    $bytes .= pack("H*", md5(mt_rand() . $random_state));
                } while (strlen($bytes) < $count);
            }
        }
    }
    $output = substr($bytes, 0, $count);
    $bytes = substr($bytes, $count);
    return $output;
}
Once you have that function, you need a function to create your random keys:
<?php
function pseudo_random_key($byte_count = 32) {
    return base64_encode(pseudoRandomBytes($byte_count));
}
As random does not mean unique, you need to prepend a unique 32-byte prefix, as I suggested. Since big-number functions are time-expensive, I'll use a chunk-math function: the prefix is regenerated from time to time by a cron job and stored in an environment DB variable, together with an auto-incrementing index, also DB-stored.
<?php
function uniqueChunkMathKeysPrefix(){
    // a call to read your db for the prefix
    // I suppose you have an environment string-keyed table
    // and a couple of db functions to read and write data to it
    $last18bytesPrefix = dbReadEnvVariable('unique_prefix');
    // You also store your current index, which wraps back to 0 once you
    // reach a value of 99999999999999
    $lastuniqueindex = dbReadEnvVariable('last_unique_keys_index');
    if ($lastuniqueindex < 99999999999999){
        $currentuniqueindex = $lastuniqueindex + 1;
        $current18bytesPrefix = $last18bytesPrefix;
    } else {
        $currentuniqueindex = 0;
        $current18bytesPrefix = dbReadEnvVariable('next_unique_prefix');
        // flag your db variables to notify cron to create a new next prefix
        dbStoreEnvVariable('next_unique_prefix', 0);
        dbStoreEnvVariable('unique_prefix', $current18bytesPrefix);
        // you have the time needed to serve another 99999999999999 keys
        // before your cron has to run again and prepare the next prefix
    }
    // store your current index
    dbStoreEnvVariable('last_unique_keys_index', $currentuniqueindex);
    // Finally you create the unique index part of the prefix
    $uniqueindexchunk = substr('00000000000000' . $currentuniqueindex, -14);
    // return the output
    return $current18bytesPrefix . $uniqueindexchunk;
}
Now you can write a function for unique pseudo-random 64-byte keys:
<?php
function createUniquePseudoRandomKey(){
    $newkey = uniqueChunkMathKeysPrefix() . pseudo_random_key(32);
    // to beautify the output, make a dummy call
    // masking the runs of 0s
    return md5($newkey);
}

php - Remove duplicates from current array while looping through the same array?

I'm playing with cURL to crawl pages and extract links. Here is some code that targets my issue:
for ($i = 0; $i < sizeof($links); $i++) {
    $response = crawl($links[$i]);
    // inside this loop I extract links from each crawled html page
    $newLinks = getLinks($response);
    // I need to append these new links to the current array in the loop
    $links = array_values(array_unique(array_merge($links, $newLinks)));
}
I need to prevent duplicate links so I don't crawl anything twice. I wonder if this is a safe approach, or if it's right at all, since array_values would reindex the elements of the array, and while in the loop the crawling could run twice for some link.
I could test with in_array() against $links and $newLinks to avoid duplicates, but I wonder what happens when doing it like in my sample here.
This example works if you only get one link from getLinks():
$newLink = getLinks($response);
if (!in_array($newLink, $links)) {
    $links = array_merge($links, array($newLink));
}
If you get more links, you can use a foreach loop:
foreach ($newLinks as $newLink) {
    if (!in_array($newLink, $links)) {
        $links = array_merge($links, array($newLink));
    }
}
TL/DR: Use array_merge($links, array_diff($newLinks, $links));
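As a minimal sketch of how that line slots into the asker's loop (crawl() and getLinks() are the asker's own helpers; the seed URL is made up):

$links = array('https://example.com/'); // hypothetical seed
for ($i = 0; $i < count($links); $i++) {
    $response = crawl($links[$i]);
    $newLinks = getLinks($response);
    // Append only links not seen before; existing entries keep their
    // indices, so already-crawled positions are never revisited.
    $links = array_merge($links, array_diff($newLinks, $links));
}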
My justification:
Welcome to the land of exponential growth. As your collected links array grows, the time it takes will go through the roof. Give array_merge($links, array_diff($newLinks, $links)) a chance instead. Using this benchmarking code:
function get($num)
{
    $links = array();
    for ($i = 0; $i < rand(5, 20); $i++) {
        $links[] = rand(1, $num);
    }
    return $links;
}

function test($iter, $num_total_links)
{
    $unique_time = 0;
    $unique_links = array();
    $diff_time = 0;
    $diff_links = array();
    for ($i = 0; $i < $iter; $i++) {
        $new_links = get($num_total_links);

        $start = microtime(true);
        $unique_links =
            array_values(array_unique(array_merge($unique_links, $new_links)));
        $end = microtime(true);
        $unique_time += $end - $start;

        $start = microtime(true);
        $diff_links =
            array_values(array_merge($diff_links, array_diff($new_links, $diff_links)));
        $end = microtime(true);
        $diff_time += $end - $start;
    }
    echo $unique_time . ' - ' . $diff_time;
}
You can tweak the values; if you expect to surf a large number of pages with relatively few links in common, pick a large (not too large or it'll take forever) $iter and $num_total_links. If you're likely to see many of the same link, reduce the $num_total_links accordingly.
What it boils down to is that a merge and then unique operation always requires you to merge in the same number of links, while a diff and then merge only requires you to merge in the links you want to add. This is nearly always more efficient, even at numbers that aren't exactly huge; surfing 500 pages with between 5 and 20 links makes a huge difference in time, and even a small number of pages can show a marked difference.
Looking at the data, foreach isn't a bad way to go as compared with array_unique; doing a similar benchmark, foreach consistently beat array_unique. But both are O(n^2), with foreach just growing more slowly. array_diff preserves O(n) time, which is your best case; your algorithm is never going to get any faster than some multiple of the number of pages you visit. PHP's built-in array functions are going to be faster than pretty much any solution written in PHP.
Here is the data, with the random factors taken out because all they did was cause deviations, not affect the results. I've also ramped up the possible "universe" of URLs to a large number for illustration purposes; at a smaller number, both array_unique and foreach produce graphs which look almost linear, albeit that they continue to be outperformed by array_diff. If you plug this into Google Charts, you can see just what I'm talking about.
['Number of Pages', 'array_unique', 'array_diff', 'foreach'],
[1, 6.9856643676758E-5, 6.1988830566406E-5, 0.00012898445129395],
[11, 0.0028481483459473, 0.00087666511535645, 0.0014169216156006],
[21, 0.0091345310211182, 0.0017409324645996, 0.0029785633087158],
[31, 0.019546031951904, 0.0023491382598877, 0.0046005249023438],
[41, 0.036402702331543, 0.0032360553741455, 0.006026029586792],
[51, 0.055278301239014, 0.0039372444152832, 0.0078754425048828],
[61, 0.082642316818237, 0.0048537254333496, 0.010209321975708],
[71, 0.11405396461487, 0.0054631233215332, 0.012364625930786],
[81, 0.15123820304871, 0.0062053203582764, 0.014509916305542],
[91, 0.19236493110657, 0.007127046585083, 0.017033576965332],
[101, 0.24052715301514, 0.0080602169036865, 0.01974892616272],
[111, 0.29827189445496, 0.0085773468017578, 0.023083209991455],
[121, 0.35718178749084, 0.0094895362854004, 0.025837421417236],
[131, 0.42515468597412, 0.010404586791992, 0.029412984848022],
[141, 0.49908661842346, 0.011186361312866, 0.033211469650269],
[151, 0.56992983818054, 0.011844635009766, 0.036608695983887],
[161, 0.65314698219299, 0.012562274932861, 0.039996147155762],
[171, 0.74602556228638, 0.013403177261353, 0.04484486579895],
[181, 0.84450364112854, 0.014075994491577, 0.04839038848877],
[191, 0.94431185722351, 0.01488733291626, 0.052026748657227],
[201, 1.0460951328278, 0.015958786010742, 0.056291818618774],
[211, 1.2530679702759, 0.016806602478027, 0.060890197753906],
[221, 1.2901678085327, 0.017560005187988, 0.065101146697998],
[231, 1.4267380237579, 0.018605709075928, 0.070043087005615],
[241, 1.5581474304199, 0.018914222717285, 0.075717210769653],
[251, 1.8255474567413, 0.020106792449951, 0.08226203918457],
[261, 1.8533885478973, 0.020873308181763, 0.085562705993652],
[271, 1.999392747879, 0.021762609481812, 0.15557670593262],
[281, 2.1670596599579, 0.022242784500122, 0.098419427871704],
[291, 2.4296963214874, 0.023237705230713, 0.10490798950195],
[301, 3.0475504398346, 0.031109094619751, 0.13519287109375],
[311, 3.0027780532837, 0.02937388420105, 0.13496232032776],
[321, 2.9123396873474, 0.025942325592041, 0.12607669830322],
[331, 3.0720682144165, 0.026587963104248, 0.13313007354736],
[341, 3.3559355735779, 0.028125047683716, 0.14407730102539],
[351, 3.5787575244904, 0.031508207321167, 0.15093517303467],
[361, 3.6996841430664, 0.028955698013306, 0.15273785591125],
[371, 3.9983749389648, 0.02990198135376, 0.16448092460632],
[381, 4.1213915348053, 0.030521154403687, 0.16835069656372],
[391, 4.3574469089508, 0.031461238861084, 0.17818260192871],
[401, 4.7959914207458, 0.032914161682129, 0.19097280502319],
[411, 4.9738960266113, 0.033754825592041, 0.19744348526001],
[421, 5.3298072814941, 0.035082101821899, 0.2117555141449],
[431, 5.5753719806671, 0.035769462585449, 0.21576929092407],
[441, 5.7648482322693, 0.035907506942749, 0.2213134765625],
[451, 5.9595069885254, 0.036591529846191, 0.2318480014801],
[461, 6.4193341732025, 0.037969827651978, 0.24672293663025],
[471, 6.7780020236969, 0.039541244506836, 0.25563311576843],
[481, 7.0454154014587, 0.039729595184326, 0.26160192489624],
[491, 7.450076341629, 0.040610551834106, 0.27283143997192]

PHP code reaching execution time limit

I need to go through an array containing points in a map and check their distance from one another. I need to count how many nodes are within 200m and 50m of each one. It works fine for smaller amounts of values. However when I tried to run more values through it (around 4000 for scalability testing) an error occurs saying that I have reached the maximum execution time of 300 seconds. It needs to be able to handle at least this much within 300 seconds if possible.
I have read around and found out that there is a way to disable/change this limit, but I would like to know if there is a simpler way of executing the following code so that the time it takes to run it will decrease.
for ($i = 0; $i <= count($data) - 1; $i++)
{
    $amount200a = 0;
    $amount200p = 0;
    $amount50a = 0;
    $amount50p = 0;
    $distance;
    for ($_i = 0; $_i <= count($data) - 1; $_i++)
    {
        $distance = 0;
        if ($data[$i][0] === $data[$_i][0])
        {
        }
        else
        {
            //echo "Comparing ".$data[$i][0]." and ".$data[$_i][0]." ";
            $lat_a  = $data[$i][1] * PI() / 180;
            $lat_b  = $data[$_i][1] * PI() / 180;
            $long_a = $data[$i][2] * PI() / 180;
            $long_b = $data[$_i][2] * PI() / 180;
            $distance =
                acos(
                    sin($lat_a) * sin($lat_b) +
                    cos($lat_a) * cos($lat_b) * cos($long_b - $long_a)
                ) * 6371;
            $distance *= 1000;
            if ($distance <= 50)
            {
                $amount50a++;
                $amount200a++;
            }
            else if ($distance <= 200)
            {
                $amount200a++;
            }
        }
    }
    $amount200p = 100 * number_format($amount200a / count($data), 2, '.', '');
    $amount50p  = 100 * number_format($amount50a / count($data), 2, '.', '');
    /*
    $dist[$i][0]=$data[$i][0];
    $dist[$i][1]=$amount200a;
    $dist[$i][2]=$amount200p;
    $dist[$i][3]=$amount50a;
    $dist[$i][4]=$amount50p;
    //*/
    $dist .= $data[$i][0]."&&".$amount200a."&&".$amount200p."&&".$amount50a."&&".$amount50p."%%";
}
Index 0 contains the unique ID of each node, 1 contains the latitude of each node and
index 2 contains the longitude of each node.
The error occurs at the second for loop inside the first loop. This loop is the one comparing the selected map node to other nodes. I am also using the Haversine Formula.
First of all, in big-O terms you are doing O(data²) work, which is going to be slow as hell. There are really two options: find a proven algorithm that solves the same problem in better time, or, if you can't, start moving work out of the inner for loop and prove mathematically that the inner loop can be reduced to mostly simple calculations, which is often possible.
After some rewriting, I see some possibilities:
If $data is not an SplFixedArray (which has FAR better access time), make it one, since you are accessing that data (4000²)·2 times. See the one-liner below.
Second, write cleaner code. Although the optimizer will do its best, if you don't try to minimize the code (which also makes it more readable), it might not be able to do as good a job.
Also move intermediate results out of the loops, including things like the size of the array.
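For the SplFixedArray point, a one-line sketch of the conversion (done once, before the loops; the inner rows remain ordinary arrays):

// SplFixedArray gives cheaper index access than a hash-based array for
// large, dense, integer-keyed data; count($data) still works afterwards.
$data = SplFixedArray::fromArray($data);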
Currently you're checking all points against all other points, where in fact you only need to check the current point against all remaining points. The distance from A to B is the same as the distance from B to A, so why calculate it twice?
I would probably make an adjacent array that counts how many nodes are within range of each other, and increment pairs of entries in that array after I've calculated that two nodes are within range of each other.
You should probably come up with a very fast approximation of the distance that can be used to disregard as many nodes as possible before calculating the real distance (which is never going to be super fast).
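One way to realize that suggestion is a cheap bounding-box test in degrees before paying for the trigonometry. This is a sketch under my own assumptions: 200 m is about 0.0018 degrees of latitude, so the 0.0025-degree box is deliberately generous so that no in-range point is ever discarded.

function roughlyWithinRange(array $a, array $b): bool
{
    // 200 m is ~0.0018 degrees of latitude; 0.0025 adds a safety margin.
    $latBox = 0.0025;
    // longitude degrees shrink with latitude, so widen the box accordingly
    $lonBox = 0.0025 / cos(deg2rad($a[1]));
    return abs($a[1] - $b[1]) <= $latBox
        && abs($a[2] - $b[2]) <= $lonBox;
}

// inside the inner loop, before the expensive acos() calculation:
// if (!roughlyWithinRange($data[$i], $data[$_i])) continue;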
Generally speaking, beyond algorithmic optimisations, the basic rules of optimisation are:
Don't do any processing that you don't have to do. For example, rather than multiplying $distance by 1000, just change the values you're testing against from 50 and 200 to 0.05 and 0.2, respectively.
Don't call any function more often than you have to: You only need to call count($data) once before any processing starts.
Don't calculate constant values more than once: PI()/180, for example.
Move all possible processing outside of loops. I.e. precalculate as much as possible.
Another minor point which will make your code a little easier to read:
for( $i = 0; $i <= count( $data ) - 1; $i++ ) is the same as:
for( $i = 0; $i < count( $data ); $i++ )
Try this:
$max = count($data);
$CONST_PI = PI() / 180;
for ($i = 0; $i < $max; $i++)
{
    $amount200a = 0;
    $amount50a = 0;
    $long_a = $data[$i][2] * $CONST_PI;
    $lat_a  = $data[$i][1] * $CONST_PI;
    for ($_i = 0; $_i < $max; $_i++)
    // or use for ($_i = $i + 1; $_i < $max; $_i++) if you do not need to
    // recalculate pairs already calculated in the other direction
    {
        $distance = 0;
        if ($data[$i][0] === $data[$_i][0]) continue;
        $lat_b  = $data[$_i][1] * $CONST_PI;
        $long_b = $data[$_i][2] * $CONST_PI;
        $distance =
            acos(
                sin($lat_a) * sin($lat_b) +
                cos($lat_a) * cos($lat_b) * cos($long_b - $long_a)
            ) * 6371;
        if ($distance <= 0.2)
        {
            $amount200a++;
            if ($distance <= 0.05)
            {
                $amount50a++;
            }
        }
    } // for $_i
    $amount200p = 100 * number_format($amount200a / $max, 2, '.', '');
    $amount50p  = 100 * number_format($amount50a / $max, 2, '.', '');
    $dist .= $data[$i][0]."&&".$amount200a."&&".$amount200p."&&".$amount50a."&&".$amount50p."%%";
} // for $i
It should be easier to read, I think, and if you switch to the commented-out version of the $_i loop it will be faster still :)

How to find the centre of a grid of lines on google maps

I'm struggling with a problem with some GIS information that I am trying to put into a KML file to be used by Google Maps and Google Earth.
I have an SQL database containing a number of surveys, which are stored as lines. When all the lines are drawn on the map, they create a grid. What I am trying to do is work out the centre of these 'grids' so that I can put a placemarker reference into the KML to show all the grid locations on a map.
Lines are stored in the database like this:
118.718318,-19.015803,0 118.722449,-19.016919,0 118.736223,-19.020637,0 118.749936,-19.024023,0 118.763897,-19.027722,0 118.777705,-19.031277,0 118.791416,-19.034826,0 118.805276,-19.038367,0 118.818862,-19.041962,0 118.832862,-19.045582,0 118.846133,-19.049563,0 118.859801,-19.053851,0 118.873322,-19.058145,0 118.887022,-19.062349,0 118.900595,-19.066594,0 118.914066,-19.070839,0 118.927885,-19.075151,0 118.941468,-19.079354,0 118.955064,-19.083658,0 118.968766,-19.087896,0 118.982247,-19.092054,0 118.995795,-19.096324,0 119.009192,-19.100448,0 119.022805,-19.104787,0 119.036414,-19.10893,0 119.049625,-19.113166,0 119.063155,-19.11738,0 119.076626,-19.121563,0 119.090079,-19.125738,0 119.103679,-19.129968,0 119.117009,-19.134168,0 119.130637,-19.138418,0 119.144134,-19.142613,0 119.157749,-19.146767,0 119.171105,-19.151058,0 119.184722,-19.155252,0 119.19844,-19.159399,0 119.211992,-19.163737,0 119.225362,-19.167925,0 119.239109,-19.17218,0 119.252552,-19.176239,0 119.265975,-19.180483,0 119.279718,-19.184901,0 119.2931,-19.189118,0 119.306798,-19.193267,0 119.320561,-19.197631,0 119.33389,-19.201898,0 119.34772,-19.206063,0 119.361322,-19.210282,0 119.374522,-19.214505,0 119.38825,-19.218753,0 119.401948,-19.22299,0 119.415609,-19.227171,0 119.42913,-19.231407,0 119.44269,-19.235604,0 119.456221,-19.240254,0 119.469772,-19.244165,0 119.483159,-19.248341,0 119.496911,-19.252483,0 119.510569,-19.256818,0 119.524034,-19.261068,0 119.537822,-19.265332,0 119.551409,-19.269529,0 119.564744,-19.273764,0 119.578357,-19.277874,0 119.591935,-19.282171,0 119.605472,-19.286424,0 119.619009,-19.290593,0 119.632777,-19.294862,0 119.646394,-19.299146,0 119.660107,-19.303469,0 119.673608,-19.307602,0 119.6872,-19.31182,0 119.700838,-19.31605,0 119.714555,-19.320301,0 119.728202,-19.324615,0 119.741511,-19.328794,0 119.755299,-19.333098,0 119.768776,-19.337206,0 119.782216,-19.341528,0 119.796141,-19.345829,0 119.809691,-19.349879,0 119.822889,-19.354112,0 119.836822,-19.358414,0 119.850077,-19.362618,0 119.864069,-19.366916,0 119.8773,-19.371092,0 119.891263,-19.37536,0 119.904612,-19.379556,0 119.918522,-19.38394,0 119.932101,-19.388108,0 119.945577,-19.392184,0 119.959304,-19.396544,0 119.973042,-19.400809,0 119.986433,-19.405015,0 119.999196,-19.408981,0 120.011959,-19.412946,0 120.014512,-19.41374,0
118.990393,-19.523933,0 118.990833,-19.522296,0 118.994368,-19.509069,0 118.997911,-19.49589,0 119.001249,-19.48258,0 119.004939,-19.469658,0 119.008437,-19.456658,0 119.01183,-19.443361,0 119.015332,-19.430439,0 119.01869,-19.417491,0 119.022303,-19.40421,0 119.025853,-19.391381,0 119.029387,-19.377966,0 119.032821,-19.365006,0 119.036247,-19.352047,0 119.039935,-19.338701,0 119.043376,-19.325781,0 119.046824,-19.312713,0 119.050223,-19.299634,0 119.053822,-19.286711,0 119.05736,-19.273441,0 119.060722,-19.260467,0 119.064357,-19.247382,0 119.067863,-19.234069,0 119.071361,-19.221168,0 119.07486,-19.208268,0 119.078303,-19.19503,0 119.081721,-19.181915,0 119.085296,-19.169016,0 119.088851,-19.155937,0 119.092245,-19.142828,0 119.095887,-19.12944,0 119.099337,-19.116291,0 119.102821,-19.103421,0 119.106333,-19.090384,0 119.109885,-19.077356,0 119.113395,-19.063976,0 119.116866,-19.050973,0 119.120467,-19.037886,0 119.123751,-19.024888,0 119.127385,-19.011622,0 119.130775,-18.998856,0 119.134503,-18.98547,0 119.13873,-18.9728,0 119.14289,-18.959666,0 119.146076,-18.946541,0 119.149065,-18.933693,0 119.152281,-18.920522,0 119.155414,-18.907033,0 119.158442,-18.894058,0 119.161421,-18.88097,0 119.164632,-18.867484,0 119.167668,-18.854501,0 119.170744,-18.841401,0 119.173939,-18.827999,0 119.17703,-18.815028,0 119.17984,-18.802218,0 119.182902,-18.789334,0 119.186088,-18.776136,0 119.189031,-18.762862,0 119.192179,-18.749824,0 119.19539,-18.7365,0 119.198255,-18.723279,0 119.201476,-18.710134,0 119.204345,-18.69705,0 119.20746,-18.683768,0 119.210555,-18.670607,0 119.213641,-18.657632,0 119.216727,-18.644301,0 119.21982,-18.630956,0 119.223094,-18.617243,0 119.22625,-18.603885,0 119.229368,-18.590534,0 119.232494,-18.577364,0 119.235477,-18.564287,0 119.238496,-18.550953,0 119.241392,-18.53789,0 119.244609,-18.524881,0 119.247551,-18.512017,0 119.250532,-18.498916,0 119.253729,-18.485859,0 119.256428,-18.473845,0 119.259288,-18.461645,0 119.262064,-18.4491,0 119.265024,-18.437048,0 119.267877,-18.424992,0 119.270457,-18.4136,0 119.273429,-18.401305,0 119.276346,-18.388484,0 119.279506,-18.375299,0 119.282307,-18.362197,0 119.285494,-18.34918,0 119.288695,-18.336115,0 119.291008,-18.323772,0 119.294026,-18.310635,0 119.297332,-18.297714,0 119.300499,-18.284553,0 119.303442,-18.271193,0 119.307081,-18.258278,0 119.309945,-18.245139,0 119.313121,-18.232137,0 119.31642,-18.218993,0 119.319499,-18.205722,0 119.322801,-18.192774,0 119.325986,-18.179558,0 119.329173,-18.166332,0 119.33236,-18.15312,0 119.335558,-18.140071,0 119.338761,-18.126696,0 119.342007,-18.113502,0 119.345238,-18.100349,0 119.34808,-18.088343,0 119.350259,-18.079138,0
119.912412,-18.177179,0 119.910223,-18.183223,0 119.905619,-18.195973,0 119.901171,-18.208645,0 119.896641,-18.221452,0 119.89198,-18.234323,0 119.887547,-18.246912,0 119.882922,-18.259925,0 119.878421,-18.272522,0 119.873909,-18.285223,0 119.869264,-18.298194,0 119.864746,-18.310939,0 119.860203,-18.323628,0 119.855692,-18.336434,0 119.850997,-18.349262,0 119.846561,-18.362014,0 119.841916,-18.374866,0 119.837406,-18.387565,0 119.832778,-18.400657,0 119.828259,-18.413072,0 119.82364,-18.426273,0 119.819088,-18.438992,0 119.814546,-18.451696,0 119.810103,-18.464425,0 119.80548,-18.477041,0 119.800935,-18.48989,0 119.796359,-18.502741,0 119.791812,-18.515489,0 119.787314,-18.528536,0 119.782724,-18.540965,0 119.778079,-18.553994,0 119.773558,-18.56663,0 119.768931,-18.57955,0 119.76446,-18.592177,0 119.759793,-18.605255,0 119.755393,-18.617736,0 119.750648,-18.630757,0 119.746479,-18.643371,0 119.741779,-18.656176,0 119.737071,-18.669135,0 119.732572,-18.681798,0 119.727954,-18.694669,0 119.723453,-18.707473,0 119.718855,-18.720259,0 119.714169,-18.733198,0 119.70946,-18.746355,0 119.704998,-18.759005,0 119.700581,-18.771793,0 119.696226,-18.784893,0 119.691577,-18.797373,0 119.68662,-18.810064,0 119.682182,-18.822772,0 119.677711,-18.835621,0 119.672955,-18.848475,0 119.668536,-18.861252,0 119.663856,-18.873901,0 119.659412,-18.88663,0 119.654815,-18.899592,0 119.650185,-18.912338,0 119.645586,-18.925118,0 119.640925,-18.937923,0 119.636617,-18.950753,0 119.631874,-18.963446,0 119.627376,-18.976275,0 119.622845,-18.98893,0 119.618175,-19.001942,0 119.613727,-19.014668,0 119.608878,-19.027398,0 119.60445,-19.039954,0 119.599954,-19.053249,0 119.595066,-19.066008,0 119.590562,-19.078866,0 119.586062,-19.091703,0 119.58155,-19.104199,0 119.576948,-19.116924,0 119.572431,-19.129926,0 119.567449,-19.142699,0 119.563227,-19.155474,0 119.558555,-19.168309,0 119.553956,-19.181053,0 119.549545,-19.193649,0 119.544854,-19.20646,0 119.540133,-19.21929,0 119.535606,-19.232162,0 119.53108,-19.245054,0 119.526509,-19.257772,0 119.522079,-19.270427,0 119.517419,-19.283013,0 119.512755,-19.296126,0 119.508159,-19.309031,0 119.503557,-19.321821,0 119.498966,-19.334394,0 119.494487,-19.347232,0 119.489815,-19.359907,0 119.485201,-19.372962,0 119.48067,-19.385811,0 119.476151,-19.398326,0 119.471526,-19.411138,0 119.466856,-19.424266,0 119.462284,-19.436788,0 119.457752,-19.449628,0 119.452975,-19.462718,0 119.448612,-19.47499,0 119.444249,-19.48726,0 119.439886,-19.499531,0 119.437704,-19.505666,0
Each point is where a measurement took place.
Is there a PHP library or a formula that can be used to work this out without being too intensive?
Thanks
It depends very much on how you define "center". One somewhat sophisticated approach you might like to try:
Extract a list of points from the lines
Find the convex hull of the points
Find the centroid of the convex hull
[illustration of a convex hull; source: algorithmic-solutions.info]
The convex hull can be simply described as the polygon that passes through some of the points and forms an envelope enclosing all of the points within it. Wikipedia has a more rigorous description.
The centroid provides a simple way of finding the "center of gravity" of a polygon. Wikipedia and a nice tutorial provide more information.
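For the centroid step, here is a self-contained sketch of the standard shoelace-formula centroid (my own illustration; it assumes you already have the hull vertices in order as [x, y] pairs):

function polygonCentroid(array $hull): array
{
    $a  = 0.0; // accumulates twice the signed area
    $cx = 0.0;
    $cy = 0.0;
    $n = count($hull);
    for ($i = 0; $i < $n; $i++) {
        list($x0, $y0) = $hull[$i];
        list($x1, $y1) = $hull[($i + 1) % $n]; // wraps back to the first vertex
        $cross = $x0 * $y1 - $x1 * $y0;
        $a  += $cross;
        $cx += ($x0 + $x1) * $cross;
        $cy += ($y0 + $y1) * $cross;
    }
    $a *= 0.5;
    return array($cx / (6 * $a), $cy / (6 * $a));
}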
Alternatively, a very simple implementation might find the minimum bounding rectangle of the lines and then find the geometric center of that box. The following code assumes the lines you posted above are represented as strings in an array. I'm no PHP performance expert, but this algorithm should execute in O(n) asymptotic time:
<?php
$line_strs = array(
    "118.718318,-19.015803,0 118.722449,-19.016919,0 118.736223,-19.020637,0 118.749936,-19.024023,0 118.763897,-19.027722,0 118.777705,-19.031277,0",
    "118.791416,-19.034826,0 118.805276,-19.038367,0 118.818862,-19.041962,0 118.832862,-19.045582,0 118.846133,-19.049563,0 118.859801,-19.053851,0",
    [SNIP majority of lines...]
    "119.448612,-19.47499,0 119.444249,-19.48726,0 119.439886,-19.499531,0 119.437704,-19.505666,0");

$xs = array();
$ys = array();
foreach ($line_strs as $line){
    $points = explode(' ', $line);
    foreach ($points as $pt){
        $xyz = explode(',', $pt);
        $xs[] = (float)$xyz[0];
        $ys[] = (float)$xyz[1];
    }
}
$x_mean = (max($xs) + min($xs)) / 2;
$y_mean = (max($ys) + min($ys)) / 2;
echo "$x_mean,$y_mean\n";
?>
This outputs 119.366415,-18.8015355.
Centroids?
If I have time, I'll edit this answer to include a Python implementation.
Another very simple answer finds the mean location of all vertices. It favours areas that are densely populated by points, and it returns a slightly different answer to the bounding-box method. You might want to try them both to see which provides a "better" result for your data.
<?php
$line_strs = array(
    "118.718318,-19.015803,0 118.722449,-19.016919,0 118.736223,-19.020637,0 118.749936,-19.024023,0 118.763897,-19.027722,0 118.777705,-19.031277,0",
    "118.791416,-19.034826,0 118.805276,-19.038367,0 118.818862,-19.041962,0 118.832862,-19.045582,0 118.846133,-19.049563,0 118.859801,-19.053851,0",
    [SNIP majority of lines...]
    "119.448612,-19.47499,0 119.444249,-19.48726,0 119.439886,-19.499531,0 119.437704,-19.505666,0");

$x_sum = 0.0;
$y_sum = 0.0;
$n = 0;
foreach ($line_strs as $line){
    $points = explode(' ', $line);
    foreach ($points as $pt){
        $xyz = explode(',', $pt);
        $x_sum += (float)$xyz[0];
        $y_sum += (float)$xyz[1];
        $n++;
    }
}
$x_mean = $x_sum / $n;
$y_mean = $y_sum / $n;
echo "$x_mean,$y_mean\n";
?>
This outputs 119.402034114,-18.9427670536.
