I'm still relatively new to PHP and trying to use pthreads to solve an issue. I have 20 threads running processes that end at varying times. Most finish in under 10 seconds or so. I don't need all 20 to complete, just the first 10. Once I have 10, I would like to kill the remaining threads, or at least continue on to the next step.
I have tried setting set_time_limit to about 20 seconds in each of the threads, but they ignore it and keep running. I loop through the jobs calling join() because I don't want the rest of the program to run until the results are in, but that blocks me until the slowest thread has finished. While pthreads has reduced the run time from around a minute to about 30 seconds, I could shave off even more, since the first 10 threads finish in about 3 seconds.
Thanks for any help and here is my code:
$count = 0;
foreach ($array as $i) {
    $imgName = $this->smsId . "_$count.jpg";
    $name = "LocalCDN/" . $imgName;
    $stack[] = new AsyncImageModify($i['largePic'], $name);
    $count++;
}

// Run the threads
foreach ($stack as $t) {
    $t->start();
}

// Check if the threads have finished; push the coordinates into an array
foreach ($stack as $t) {
    if ($t->join()) {
        array_push($this->imgArray, $t->data);
    }
}
class AsyncImageModify extends \Thread {
    public $data;

    public function __construct($arg, $name) {
        $this->arg = $arg;
        $this->name = $name;
    }

    public function run() {
        // tried putting the set_time_limit() here, didn't work
        if ($this->arg) {
            // Get the image
            $didWeGetTheImage = Image::getImage($this->arg, $this->name);
            if ($didWeGetTheImage) {
                $timestamp1 = microtime(true);
                print_r("Starting face detection $this->arg" . "\n");
                print_r(" ");
                $j = Image::process1($this->name);
                if ($j) {
                    // lets go ahead and do our image manipulation at this point
                    $userPic = Image::process2($this->name, $this->name, 200, 200, false, $this->name, $j);
                    if ($userPic) {
                        $this->data = $userPic;
                        print_r("Back from process2; the image returned is $userPic");
                    }
                }
                $endTime = microtime(true);
                $td = $endTime - $timestamp1;
                print_r("Finished face detection $this->arg in $td seconds" . "\n");
                print_r($j);
            }
        }
    }
}
It is difficult to guess the functionality of Image::* methods, so I can't really answer in any detail.
What I can say is that there are very few machines I can think of that are suited to running 20 concurrent threads in any case. A more suitable setup would be the worker/stackable model. A Worker thread is a reusable context that can execute task after task, implemented as Stackables; execution in a multi-threaded environment should always use the fewest threads possible to get the most work done.
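To illustrate, here is a minimal sketch of the worker/stackable model (pthreads v1/v2 API). ImageTask and ImageWorker are hypothetical names, and the Image::getImage() call is assumed from the question; swap in your own logic:

class ImageTask extends Stackable {
    public $url, $name, $data;

    public function __construct($url, $name) {
        $this->url = $url;
        $this->name = $name;
    }

    public function run() {
        // Executed inside whichever worker this task was stacked onto
        $this->data = Image::getImage($this->url, $this->name);
    }
}

class ImageWorker extends Worker {
    public function run() {
        // One-time per-worker setup (autoloaders, config) goes here
    }
}

// A few reusable workers serving many tasks
$workers = array();
for ($i = 0; $i < 4; $i++) {
    $workers[$i] = new ImageWorker();
    $workers[$i]->start();
}

$tasks = array();
foreach ($array as $n => $job) {
    $task = new ImageTask($job['largePic'], "LocalCDN/{$n}.jpg");
    $tasks[] = $task;
    $workers[$n % 4]->stack($task); // round-robin the tasks over the workers
}

foreach ($workers as $w) {
    $w->shutdown(); // blocks until each worker has drained its stack
}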
Please see the pooling example and the other examples distributed with pthreads, available on GitHub. Additionally, much information regarding usage is contained in past bug reports, if you are still struggling after that.
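As for the original goal of continuing once the first 10 of 20 threads finish: a rough sketch of my own (not from the answer above) is to poll Thread::isRunning() instead of join()ing every thread, assuming run() stores its result in $this->data as in the question:

$needed = 10;
$collected = array();
while (count($collected) < $needed) {
    foreach ($stack as $k => $t) {
        if (!isset($collected[$k]) && !$t->isRunning()) {
            $collected[$k] = $t->data; // finished (data may be null on failure)
        }
    }
    usleep(100000); // poll every 100 ms instead of blocking in join()
}
$this->imgArray = array_values(array_filter($collected));
// The slower threads simply keep running; pthreads offers no safe way to
// kill them, so let them finish or have run() check a shared "stop" flag.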
Very simply, I have a program that needs to perform a large process (anywhere from 5 seconds to several minutes) and I don't want to make my page wait for the process to finish before loading.
I understand that I need to run this gearman job as a background process but I'm struggling to identify the proper solution to get real-time status updates as to when the worker actually finishes the process. I've used the following code snippet from the PHP examples:
$done = false;
do {
    sleep(3);
    $stat = $gmclient->jobStatus($job_handle);
    if (!$stat[0]) { // the job is no longer known to the server, so it is done
        $done = true;
    }
    echo "Running: " . ($stat[1] ? "true" : "false") . ", numerator: " . $stat[2] . ", denominator: " . $stat[3] . "\n";
} while (!$done);
echo "done!\n";
This works; however, it appears to report back to the client as soon as the worker has been told what to do. Instead, I want to know when the literal process of the job has finished.
My real-life example:
Pull several data feeds from an API (some feeds take longer than others)
Load the couple of feeds that always load fast, and place a "Waiting/Loading" animation on the section whose feed was sent off to the worker queue
When the work is done and the results have been completely retrieved, replace the animation with the results
This is a bit late, but I stumbled across this question looking for the same answer. I was able to get a solution together, so maybe it will help someone else.
For starters, refer to the documentation on GearmanClient::jobStatus. This will be called from the client, and the function accepts a single argument: $job_handle. You retrieve this handle when you dispatch the request:
$client = new GearmanClient( );
$client->addServer( '127.0.0.1', 4730 );
$handle = $client->doBackground( 'serviceRequest', $data );
Later on, you can retrieve the status by calling the jobStatus function on the same $client object:
$status = $client->jobStatus( $handle );
This is only meaningful, though, if you actually change the status from within your worker with the sendStatus method:
$worker = new GearmanWorker( );
$worker->addServer( '127.0.0.1', 4730 ); // same server the client dispatches to
$worker->addFunction( 'serviceRequest', function( $job ) {
    $max = 10;
    // Set initial status - numerator / denominator
    $job->sendStatus( 0, $max );
    for( $i = 1; $i <= $max; $i++ ) {
        sleep( 2 ); // Simulate a long running task
        $job->sendStatus( $i, $max );
    }
    return GEARMAN_SUCCESS;
} );

while( $worker->work( ) );
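Putting the two halves together, the client can then poll the handle it got from doBackground(). A sketch under the same assumptions as above (with $data being your workload, as in the dispatch snippet); jobStatus() returns an array of the form [known, running, numerator, denominator]:

$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$handle = $client->doBackground('serviceRequest', $data);

do {
    sleep(1);
    $status = $client->jobStatus($handle); // [known, running, numerator, denominator]
    if ($status[3] > 0) {
        printf("Progress: %d%%\n", 100 * $status[2] / $status[3]);
    }
} while ($status[0]); // keep polling while the server still knows the job
echo "done!\n";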
In versions of Gearman prior to 0.5, you would use the GearmanJob::status method to set the status of a job. Versions 0.6 to current (1.1) use the methods above.
See also this question: Problem With Gearman Job Status
I recently switched from a shared to a dedicated host, giving me a lot more monitoring/control. I've been trying to debug an issue I've had since before I switched: very high memory usage. I think I've narrowed it down to a specific script, a subscription to an Instagram feed/API. It runs within a CodeIgniter framework.
This is a screenshot of my processes. Note the really high httpd memory values.
Here's my controller in CodeIgniter:
class Subscribe extends CI_Controller {

    function __construct() {
        parent::__construct();
        $this->instagram_api->access_token = 'hidden';
    }

    function callback()
    {
        //echo anchor('logs/activity.log', 'LOG');
        $min_id = '';
        $next_min_id = '';
        $this->load->model('Subscribe_model');
        $min_id = $this->Subscribe_model->min_id();
        echo $min_id;
        $pugs = $this->instagram_api->tagsRecent('tagg', '', $min_id);
        if ($pugs) {
            if (property_exists($pugs->pagination, 'min_tag_id')) {
                $next_min_id = $pugs->pagination->min_tag_id;
            }
            foreach ($pugs as $pug) {
                if (is_array($pug)) {
                    foreach ($pug as $media) {
                        $url = $media->images->standard_resolution->url;
                        $m_id = $media->id;
                        $c_time = $media->created_time;
                        $user = $media->user->username;
                        $filter = $media->filter;
                        $comments = $media->comments->count;
                        $caption = $media->caption->text;
                        $link = $media->link;
                        $low_res = $media->images->low_resolution->url;
                        $thumb = $media->images->thumbnail->url;
                        $lat = $media->location->latitude;
                        $long = $media->location->longitude;
                        $loc_id = $media->location->id;
                        $date = new DateTime('2000-01-01', new DateTimeZone('Pacific/Nauru'));
                        $data = array(
                            'media_id' => $m_id,
                            'min_id' => $next_min_id,
                            'url' => $url,
                            'c_time' => $c_time,
                            'user' => $user,
                            'filter' => $filter,
                            'comment_count' => $comments,
                            'caption' => $caption,
                            'link' => $link,
                            'low_res' => $low_res,
                            'thumb' => $thumb,
                            'lat' => $lat,
                            'long' => $long,
                            'loc_id' => $loc_id,
                        );
                        $this->Subscribe_model->add_pug($data);
                    }
                }
            }
        }
    }
}
and here is the model....
class Subscribe_model extends CI_Model {

    function min_id() {
        $min_id = ''; // default when the table is empty
        $this->db->order_by("c_time", "desc");
        $query = $this->db->get("pugs");
        if ($query->num_rows() > 0) {
            $row = $query->row();
            $min_id = $row->min_id;
            if (!$min_id) {
                $min_id = '';
            }
        }
        return $min_id;
    }

    function add_pug($data) {
        $query = $this->db->get_where('pugs', array('media_id' => $data['media_id']));
        if ($query->num_rows() > 0) {
            return FALSE;
        } else {
            $this->db->insert('pugs', $data);
        }
    }
}
//============================EDIT========================//
I've converted some of the services over to FastCGI and it seems to have brought my memory usage down significantly, but I've noticed a bump in CPU. I was hoping that switching to a dedicated server would mean far fewer headaches and make things much easier, but it's been a nightmare so far. Afraid I've bitten off more than I can chew.
Another fear of mine is adding more domain names to the server. Will that add new processes that run as high as the multiple php-cgi processes in the last image?
Here are my most recent outputs...
To ensure there is no true memory leak, try running nothing but httpd/mysqld on the server (killall Xorg / telinit 3), then stop those two services. Note down the output of free|grep Mem |sed 's/\([^0-9]*[^\ ]*\)\{3\}\([^\ ]*\).*/\1/'. This is X free bytes of RAM. Now start the httpd/mysqld services and let them serve a few hundred requests, then stop the services and note down the numbers again; repeat until satisfied with the median results.
It is not uncommon for httpd to consume a lot of RAM, and mysqld also caches in memory. This is simply because, if the same request is encountered consecutive times (in a row), the static caches are already buffered and good to go.
For PHP, a class is pre-compiled; once the system needs it again after the first compile, it will not have to interpret the script line by line, as it has a bytecode-encoded object to work with. The compilation is of course redone if fstat.mtime > bytecode.mtime.
You can analyse real (non-swapped) memory usage with this command:
ps -ylC httpd --sort=rss
Child process size for serving a static file is about 2-3 MB; for dynamic content such as PHP, it may be around 15 MB.
To configure how Apache sets up its workers, these parameters can be set in httpd.conf:
StartServers,
MaxClients,
MinSpareThreads,
MaxSpareThreads,
ThreadsPerChild,
MaxRequestsPerChild
Check this link: http://www.howtoforge.com/configuring_apache_for_maximum_performance
Section 3.5:
The MaxClients directive sets the limit on the maximum number of simultaneous
requests that can be supported by the server; no more than this number of child
processes are spawned. It shouldn't be set too low, such that new connections
are put in a queue which eventually times out, leaving server resources unused.
Setting it too high will cause the server to start swapping, and response time
will degrade drastically. An appropriate value for MaxClients can be calculated
as: MaxClients = Total RAM dedicated to the web server / Max child process size
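For example, with 2 GB of RAM dedicated to Apache and roughly 15 MB per PHP-serving child, that formula gives about 2048 / 15 ≈ 135. A hedged illustration of what the resulting prefork settings might look like; the values are examples only, not recommendations:

<IfModule prefork.c>
    StartServers           5
    MinSpareServers        5
    MaxSpareServers       10
    MaxClients           135
    MaxRequestsPerChild 1000
</IfModule>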
Apache performance-tuning documentation is available at http://httpd.apache.org/docs/2.0/misc/perf-tuning.html; skip down to the section about process creation for more information on the config options mentioned above.
I'd like to create a php script that runs as a daily cron. What I'd like to do is enumerate through all users within an Active Directory, extract certain fields from each entry, and use this information to update fields within a MySQL database.
Basically what I want to to do is sync up certain user information between Active Directory and a MySQL table.
The problem I have is that the sizelimit on the Active Directory server is often set at 1000 entries per search result. I had hoped that the php function "ldap_next_entry" would get around this by only fetching one entry at a time, but before you can call "ldap_next_entry", you first have to call "ldap_search", which can trigger the SizeLimit exceeded error.
Is there any way besides removing the sizelimit from the server? Can I somehow get "pages" of results?
BTW - I am currently not using any 3rd-party libraries or code, just PHP's LDAP functions. Although, I am certainly open to using a library if that will help.
I've been struck by the same problem while developing Zend_Ldap for the Zend Framework. I'll try to explain what the real problem is, but to make it short: until PHP 5.4, it wasn't possible to use paged results from an Active Directory with an unpatched PHP (ext/ldap) version due to limitations in exactly this extension.
Let's try to unravel the whole thing... Microsoft Active Directory uses a so-called server control to accomplish server-side result paging. This control is described in RFC 2696, "LDAP Control Extension for Simple Paged Results Manipulation".
ext/ldap offers access to LDAP control extensions via ldap_set_option() and its LDAP_OPT_SERVER_CONTROLS and LDAP_OPT_CLIENT_CONTROLS options respectively. To set the paged-results control you need the control OID, which is 1.2.840.113556.1.4.319, and you need to know how to encode the control value (this is described in the RFC). The value is an octet string wrapping the BER-encoded version of the following SEQUENCE (copied from the RFC):
realSearchControlValue ::= SEQUENCE {
    size   INTEGER (0..maxInt),
           -- requested page size from client
           -- result set size estimate from server
    cookie OCTET STRING
}
So we can set the appropriate server control prior to executing the LDAP query:
$pageSize = 100;
$pageControl = array(
    'oid'        => '1.2.840.113556.1.4.319', // the control OID
    'iscritical' => true,                     // the operation should fail if the server cannot support this control
    'value'      => sprintf("%c%c%c%c%c%c%c", 48, 5, 2, 1, $pageSize, 4, 0) // the required BER-encoded control value
);
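To make that magic string less opaque, here is the same sprintf() with each byte annotated (this simple encoding is valid as long as $pageSize fits in one byte, i.e. up to 127):

$value = sprintf(
    "%c%c%c%c%c%c%c",
    48,        // 0x30: SEQUENCE tag
    5,         // length of the sequence body in bytes
    2,         // 0x02: INTEGER tag (the page size)
    1,         //   integer length: 1 byte
    $pageSize, //   the page size itself
    4,         // 0x04: OCTET STRING tag (the cookie)
    0          //   zero-length cookie on the first request
);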
This allows us to send a paged query to the LDAP/AD server. But how do we know if there are more pages to follow, and how do we specify the control value for our next query?
This is where we get stuck... The server responds with a result set that includes the required paging information, but PHP lacks a method to retrieve exactly this information from the result set. PHP provides a wrapper for the LDAP API function ldap_parse_result(), but the required last parameter, serverctrlsp, is not exposed to the PHP function, so there is no way to retrieve the required information. A bug report has been filed for this issue, but there has been no response since 2005. If ldap_parse_result() provided the required parameter, using paged results would work like this:
$l = ldap_connect('somehost.mydomain.com');
$pageSize = 100;
$pageControl = array(
    'oid' => '1.2.840.113556.1.4.319',
    'iscritical' => true,
    'value' => sprintf("%c%c%c%c%c%c%c", 48, 5, 2, 1, $pageSize, 4, 0)
);
$controls = array($pageControl);
ldap_set_option($l, LDAP_OPT_PROTOCOL_VERSION, 3);
ldap_bind($l, 'CN=bind-user,OU=my-users,DC=mydomain,DC=com', 'bind-user-password');
$continue = true;
while ($continue) {
    ldap_set_option($l, LDAP_OPT_SERVER_CONTROLS, $controls);
    $sr = ldap_search($l, 'OU=some-ou,DC=mydomain,DC=com', 'cn=*', array('sAMAccountName'), null, null, null, null);
    ldap_parse_result($l, $sr, $errcode, $matcheddn, $errmsg, $referrals, $serverctrls); // (*)
    if (isset($serverctrls)) {
        foreach ($serverctrls as $i) {
            if ($i["oid"] == '1.2.840.113556.1.4.319') {
                // overwrite the page-size byte and reuse the cookie the server returned
                $i["value"]{8} = chr($pageSize);
                $i["iscritical"] = true;
                $controls = array($i);
                break;
            }
        }
    }
    $info = ldap_get_entries($l, $sr);
    if ($info["count"] < $pageSize) {
        $continue = false;
    }
    for ($entry = ldap_first_entry($l, $sr); $entry != false; $entry = ldap_next_entry($l, $entry)) {
        $dn = ldap_get_dn($l, $entry);
    }
}
As you can see, there is a single line of code (*) that renders the whole thing useless. On my way through the sparse information on this subject I found a patch against the PHP 4.3.10 ext/ldap by Iñaki Arenaza, but I neither tried it nor do I know whether the patch can be applied to a PHP 5 ext/ldap. The patch extends ldap_parse_result() to expose the 7th parameter to PHP:
--- ldap.c 2004-06-01 23:05:33.000000000 +0200
+++ /usr/src/php4/php4-4.3.10/ext/ldap/ldap.c 2005-09-03 17:02:03.000000000 +0200
@@ -74,7 +74,7 @@
ZEND_DECLARE_MODULE_GLOBALS(ldap)
static unsigned char third_argument_force_ref[] = { 3, BYREF_NONE, BYREF_NONE, BYREF_FORCE };
-static unsigned char arg3to6of6_force_ref[] = { 6, BYREF_NONE, BYREF_NONE, BYREF_FORCE, BYREF_FORCE, BYREF_FORCE, BYREF_FORCE };
+static unsigned char arg3to7of7_force_ref[] = { 7, BYREF_NONE, BYREF_NONE, BYREF_FORCE, BYREF_FORCE, BYREF_FORCE, BYREF_FORCE, BYREF_FORCE };
static int le_link, le_result, le_result_entry, le_ber_entry;
@@ -124,7 +124,7 @@
#if ( LDAP_API_VERSION > 2000 ) || HAVE_NSLDAP
PHP_FE(ldap_get_option, third_argument_force_ref)
PHP_FE(ldap_set_option, NULL)
- PHP_FE(ldap_parse_result, arg3to6of6_force_ref)
+ PHP_FE(ldap_parse_result, arg3to7of7_force_ref)
PHP_FE(ldap_first_reference, NULL)
PHP_FE(ldap_next_reference, NULL)
#ifdef HAVE_LDAP_PARSE_REFERENCE
@@ -1775,14 +1775,15 @@
Extract information from result */
PHP_FUNCTION(ldap_parse_result)
{
- pval **link, **result, **errcode, **matcheddn, **errmsg, **referrals;
+ pval **link, **result, **errcode, **matcheddn, **errmsg, **referrals, **serverctrls;
ldap_linkdata *ld;
LDAPMessage *ldap_result;
+ LDAPControl **lserverctrls, **ctrlp, *ctrl;
char **lreferrals, **refp;
char *lmatcheddn, *lerrmsg;
int rc, lerrcode, myargcount = ZEND_NUM_ARGS();
- if (myargcount < 3 || myargcount > 6 || zend_get_parameters_ex(myargcount, &link, &result, &errcode, &matcheddn, &errmsg, &referrals) == FAILURE) {
+ if (myargcount < 3 || myargcount > 7 || zend_get_parameters_ex(myargcount, &link, &result, &errcode, &matcheddn, &errmsg, &referrals, &serverctrls) == FAILURE) {
WRONG_PARAM_COUNT;
}
@@ -1793,7 +1794,7 @@
myargcount > 3 ? &lmatcheddn : NULL,
myargcount > 4 ? &lerrmsg : NULL,
myargcount > 5 ? &lreferrals : NULL,
- NULL /* &serverctrls */,
+ myargcount > 6 ? &lserverctrls : NULL,
0 );
if (rc != LDAP_SUCCESS ) {
php_error(E_WARNING, "%s(): Unable to parse result: %s", get_active_function_name(TSRMLS_C), ldap_err2string(rc));
@@ -1805,6 +1806,29 @@
/* Reverse -> fall through */
switch(myargcount) {
+ case 7 :
+ zval_dtor(*serverctrls);
+
+ if (lserverctrls != NULL) {
+ array_init(*serverctrls);
+ ctrlp = lserverctrls;
+
+ while (*ctrlp != NULL) {
+ zval *ctrl_array;
+
+ ctrl = *ctrlp;
+ MAKE_STD_ZVAL(ctrl_array);
+ array_init(ctrl_array);
+
+ add_assoc_string(ctrl_array, "oid", ctrl->ldctl_oid,1);
+ add_assoc_bool(ctrl_array, "iscritical", ctrl->ldctl_iscritical);
+ add_assoc_stringl(ctrl_array, "value", ctrl->ldctl_value.bv_val,
+ ctrl->ldctl_value.bv_len,1);
+ add_next_index_zval (*serverctrls, ctrl_array);
+ ctrlp++;
+ }
+ ldap_controls_free (lserverctrls);
+ }
case 6 :
zval_dtor(*referrals);
if (array_init(*referrals) == FAILURE) {
Actually, the only option left would be to change the Active Directory configuration and raise the maximum result limit. The relevant option is called MaxPageSize and can be altered using ntdsutil.exe; please see "How to view and set LDAP policy in Active Directory by using Ntdsutil.exe".
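From memory of that article, the ntdsutil session looks roughly like this (replace the server name with your own domain controller, and verify the exact prompts against the KB article):

C:\> ntdsutil
ntdsutil: ldap policies
ldap policy: connections
server connections: connect to server dc01.mydomain.com
server connections: quit
ldap policy: set maxpagesize to 5000
ldap policy: commit changes
ldap policy: quit
ntdsutil: quit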
EDIT (reference to COM):
Or you can go the other way round and use the COM-approach via ADODB as suggested in the link provided by eykanal.
Support for paged results was added in PHP 5.4.
See ldap_control_paged_result for more details.
This isn't a full answer, but this guy was able to do it. I don't understand what he did, though.
By the way, a partial answer is that you CAN get "pages" of results. From the documentation:
resource ldap_search ( resource $link_identifier , string $base_dn ,
string $filter [, array $attributes [, int $attrsonly [, int $sizelimit [,
int $timelimit [, int $deref ]]]]] )
...
sizelimit Enables you to limit the count of entries fetched. Setting this to 0 means no limit.
Note: This parameter can NOT override server-side preset sizelimit.
You can set it lower though. Some directory server hosts will be
configured to return no more than a preset number of entries. If this
occurs, the server will indicate that it has only returned a partial
results set. This also occurs if you use this parameter to limit the
count of fetched entries.
I don't know how to specify that you want to search STARTING from a certain position, though. I.e., after you get your first 1000, I don't know how to specify that now you need the next 1000. Hopefully someone else can help you there :)
Here's an alternative (which works pre-PHP 5.4) using attribute range retrieval. If you have 10,000 values to fetch but your AD server only returns 5,000 per request:
$ldapSearch = ldap_search($ldapResource, $basedn, $filter, array('member;range=0-4999'));
$ldapResults = ldap_get_entries($ldapResource, $ldapSearch);
$members = $ldapResults[0]['member;range=0-4999'];

$ldapSearch = ldap_search($ldapResource, $basedn, $filter, array('member;range=5000-10000'));
$ldapResults = ldap_get_entries($ldapResource, $ldapSearch);
// AD names the attribute "member;range=5000-*" in the final slice
$members = array_merge($members, $ldapResults[0]['member;range=5000-*']);
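A sketch generalizing this (my own illustration, under the same $ldapResource/$basedn/$filter assumptions as above): keep requesting the next slice until AD answers with the terminal member;range=N-* form:

$members = array();
$start = 0;
$step = 5000;
do {
    $attr = "member;range=$start-" . ($start + $step - 1);
    $sr = ldap_search($ldapResource, $basedn, $filter, array($attr));
    $entry = ldap_get_entries($ldapResource, $sr);

    // Find the attribute AD actually returned; on the last slice it is
    // named "member;range=N-*" rather than the range we asked for.
    $returned = null;
    foreach (array_keys($entry[0]) as $key) {
        if (strpos($key, 'member;range=') === 0) {
            $returned = $key;
            break;
        }
    }
    if ($returned === null) {
        break; // no more values (or the attribute is absent)
    }

    $values = $entry[0][$returned];
    unset($values['count']); // ldap_get_entries() adds a 'count' element
    $members = array_merge($members, $values);
    $start += $step;
} while (substr($returned, -1) !== '*');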
I was able to get around the size limitation using ldap_control_paged_result.
ldap_control_paged_result is used to enable LDAP pagination by sending the pagination control. The function below worked perfectly in my case.
function retrieves_users($conn)
{
    $dn = 'ou=,dc=,dc=';
    $filter = "(&(objectClass=user)(objectCategory=person)(sn=*))";
    $justthese = array();

    // enable pagination with a page size of 100
    $pageSize = 100;
    $cookie = '';
    $data = array('usersLdap' => array());

    do {
        ldap_control_paged_result($conn, $pageSize, true, $cookie);
        $result = ldap_search($conn, $dn, $filter, $justthese);
        $entries = ldap_get_entries($conn, $result);

        if (!empty($entries)) {
            for ($i = 0; $i < $entries["count"]; $i++) {
                $data['usersLdap'][] = array(
                    'name' => $entries[$i]["cn"][0],
                    'username' => $entries[$i]["userprincipalname"][0]
                );
            }
        }

        // retrieve the cookie for the next page from the response
        ldap_control_paged_result_response($conn, $result, $cookie);
    } while ($cookie !== null && $cookie != '');

    return $data;
}