Thanks in advance for your time and effort. This is my first script with PHP and RRDs. While writing a short SNMP program I came across RRD, a powerful tool for graphical output. I tried to write a short working script to produce a graph. I read as much of the documentation on the official RRDtool page as possible and tried to apply it to my PHP code. While debugging I found some functions which show that my data is coming in normally and, based on the reference, as expected. A sample is provided below:
["last_update"]=>
int(1396917542)
["ds_cnt"]=>
int(3)
["ds_navm"]=>
array(3) {
[0]=>
string(10) "ifInOctets"
[1]=>
string(11) "ifOutOctets"
[2]=>
string(9) "sysUpTime"
}
["data"]=>
array(3) {
[0]=>
string(4) "1405"
[1]=>
string(4) "1219"
[2]=>
string(4) "1893"
}
}
Based on the function:
$debug = rrd_lastupdate (
"".$rrdFile.""
);
What I am having difficulty understanding is why I do not get any output, since the input is arriving correctly and no error is printed when the script runs.
I have included my working code as an example, for reproduction and a better understanding of my error.
<?php
// rrdtool info /var/www/snmp.rrd Debugging command
while (1) {
sleep (1);
$file = "snmp";
$rrdFile = dirname(__FILE__) . "/".$file.".rrd";
$in = "ifInOctets";
$out = "ifOutOctets";
$count = "sysUpTime";
$options = array(
"--start","now -10s", // Now -10 seconds (default)
"--step", "10", // Step size of 300 seconds 5 minutes
"DS:".$in.":COUNTER:20:U:U",
"DS:".$out.":COUNTER:20:U:U",
"DS:".$count.":COUNTER:20:U:U",
/* DS:ds-name:DST:dst arguments
(DST: GAUGE, COUNTER, DERIVE, and ABSOLUTE):heartbeat:min:max
heartbeat: in case that there is not input up to 600 seconds
then the input will characterised as undefined (blank)
Based on Nyquist rate (Fs >= 2 * Fmax) 300 (step) 600 (heartbeat)
32-bit = 2^32-1 = 4294967295 (counter ticks)
64-bit = 2^64-1 = 18446744073709551615 (counter ticks)
64-bit counter (caution!!!) different oid's
*/
"RRA:MIN:0.5:10:300",
"RRA:MIN:0.5:20:600",
"RRA:MAX:0.5:10:300",
"RRA:MAX:0.5:20:600",
"RRA:AVERAGE:0.5:10:300",
"RRA:AVERAGE:0.5:20:600",
/* RRA:AVERAGE | MIN | MAX | LAST:xff:steps:rows
xff range: 0-1 (exclusive) defines the allowed number of unknown
*UNKNOWN* PDPs to the number of PDPs in the interval. Step defines
how many of those data points are used to build consolidated data.
rows defines how many data values are kept in an RRA.
*/
);
//create rrd file
$create = rrd_create(
"".$rrdFile."",
$options
);
if ($create === FALSE) {
echo "Creation error: ".rrd_error()."\n";
}
$ifInOctets = rand(0, 1500); // ifInOctets (OID: 1.3.6.1.2.1.2.2.1.10)
$ifOutOctets = rand(0, 2500); // ifOutOctets (OID: 1.3.6.1.2.1.2.2.1.16)
$sysUpTime = rand(0, 2000); // sysUpTime (OID: 1.3.6.1.2.1.1.3)
$t = time();
//update rrd file
$update = rrd_update(
"".$rrdFile."",
array(
/* Update database with 3 values
based on time now (N:timestamp) */
"".$t.":".$ifInOctets.":".$ifOutOctets.":".$sysUpTime.""
)
);
if ($update === FALSE) {
echo "Update error: ".rrd_error()."\n";
}
$start = "now";
$title = "Hourly Server Data";
$final = array(
"--start","".$start." -10s",
"--step","10",
"--title=".$title."",
"--vertical-label=Bytes/sec",
"--lower-limit=0",
//"--no-gridfit",
"--slope-mode",
//"--imgformat","EPS",
"DEF:".$in."_def=".$file.".rrd:".$in.":AVERAGE",
"DEF:".$out."_def=".$file.".rrd:".$out.":AVERAGE",
"DEF:".$count."_def=".$file.".rrd:".$count.":AVERAGE",
"CDEF:inbits=".$in."_def,8,*",
"CDEF:outbits=".$out."_def,8,*",
"CDEF:counter=".$count."_def,8,*",
/* "VDEF:".$in_min."=inbits,MINIMUM",
"VDEF:".$out_min."=outbits,MINIMUM",
"VDEF:".$in_max."=inbits,MAXIMUM",
"VDEF:".$out_max."=outbits,MAXIMUM",
"VDEF:".$in_av."=inbits,AVERAGE",
"VDEF:".$out_av."=outbits,AVERAGE", */
"COMMENT:\\n",
"LINE:".$in."_def#FF00FF:".$in."",
"GPRINT:inbits:LAST:last \: %6.2lf %SBps",
"COMMENT:\\n",
"LINE:".$out."_def#0000FF:".$out."",
"GPRINT:outbits:LAST:last \: %6.2lf %SBps",
"COMMENT:\\n",
"LINE:".$count."_def#FFFF00:".$count."",
"GPRINT:counter:LAST:last\: %6.2lf %SBps",
"COMMENT:\\n",
);
// graph output
$outputPngFile = rrd_graph(
"".$file.".png",
$final
);
if ($outputPngFile === FALSE) {
echo "<b>Graph error: </b>".rrd_error()."\n";
}
/* Returns the first data sample from the
specified RRA of the RRD file. */
$result = rrd_first (
$rrdFile,
$raaindex = 0
);
if ($result === FALSE) {
echo "<b>Graph result error: </b>".rrd_error()."\n";
}
/* Returns the UNIX timestamp of the most
recent update of the RRD database. */
$last = rrd_last (
$rrdFile
);
if ($last === FALSE) {
echo "<b>Graph result error: </b>".rrd_error()."\n";
}
$info = rrd_info (
"".$rrdFile.""
);
if ($info === FALSE) {
echo "<b>Graph result error: </b>".rrd_error()."\n";
}
/* Gets array of the UNIX timestamp and the values stored
for each date in the most recent update of the RRD database file. */
$debug = rrd_lastupdate (
"".$rrdFile.""
);
if ($debug === FALSE) {
echo "<b>Graph result error: </b>".rrd_error()."\n";
}
var_dump ($debug);
/* $version = rrd_version ( );
echo "This is the version ".$version."\n"; */
} // End of while condition
?>
There are a lot of issues in your code.
Firstly, your RRD file is apparently re-created in every iteration of your while loop! This overwrites the previous loop's update.
Secondly, although you create the RRD with a step size of 10, you are not doing a sleep(10) at the end of each update loop. You cannot update the RRD more frequently than the step size.
Thirdly, you have used a COUNTER type for the DS, which assumes a constantly increasing count. However, your test data are random numbers and so not increasing. A decrease may be taken as a counter wraparound, i.e. a huge number outside the valid range for your DS, and hence an Unknown is stored.
Fourthly, you need two consecutive updates for a COUNTER to yield a valid rate; you are overwriting your RRD file on each iteration and so never get this.
Fifthly, your smallest RRA has a count of 10; this means you need 10 data points to make a single consolidated point in the RRA. With a step of 10 s, you need to run for 110 s (11 updates) before you even have a single data point to plot. You should probably try adding a count-1 RRA.
And finally, your graph is being requested for a 10-second time window. This is less than one RRA sample. Remember, your step is 10 s and your smallest RRA has count=10, so a consolidated step of 100 s.
I suggest you fix the loop so that the create comes outside; make the sleep equal to the RRD step; add a count-1 AVERAGE RRA; and make your graph request cover a longer time window.
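For reference, here is a minimal sketch of the loop restructured along those lines (DS names are taken from the question; the GAUGE data-source type for the random test data, the count-1 RRA and the one-hour graph window are illustrative assumptions, not a definitive fix):
<?php
// Minimal sketch only.
$rrdFile = dirname(__FILE__) . "/snmp.rrd";

// Create the RRD once, outside the loop, so updates are not overwritten.
if (!file_exists($rrdFile)) {
    $ok = rrd_create($rrdFile, array(
        "--start", "now -10s",
        "--step", "10",
        "DS:ifInOctets:GAUGE:20:U:U",   // GAUGE suits random test data; COUNTER needs ever-increasing values
        "DS:ifOutOctets:GAUGE:20:U:U",
        "DS:sysUpTime:GAUGE:20:U:U",
        "RRA:AVERAGE:0.5:1:360",        // count-1 RRA: one row per 10 s step, one hour of history
        "RRA:AVERAGE:0.5:10:300",
        "RRA:MAX:0.5:10:300",
        "RRA:MIN:0.5:10:300",
    ));
    if ($ok === FALSE) {
        echo "Creation error: " . rrd_error() . "\n";
    }
}

while (1) {
    // One update per step; never update faster than the RRD step size.
    $update = rrd_update($rrdFile, array(time() . ":" . rand(0, 1500) . ":" . rand(0, 2500) . ":" . rand(0, 2000)));
    if ($update === FALSE) {
        echo "Update error: " . rrd_error() . "\n";
    }

    // Ask for a window much larger than one consolidated step.
    $png = rrd_graph("snmp.png", array(
        "--start", "now -1h",
        "--title=Hourly Server Data",
        "--vertical-label=Bytes/sec",
        "DEF:in_def=" . $rrdFile . ":ifInOctets:AVERAGE",
        "LINE:in_def#FF00FF:ifInOctets",
    ));
    if ($png === FALSE) {
        echo "Graph error: " . rrd_error() . "\n";
    }

    sleep(10); // sleep for one full step
}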
Hello fellow developers,
I have been trying to manipulate the output and display the total number of workers there are, instead of outputting each worker's name as a string.
Below you will find the data that I am receiving, and further down I will explain how I would like to handle the JSON response.
{
"result":
{
"addr":"ADDRESS_HERE",
"workers":
[
["worker1080",{},2,1,"200000",0,22],
["worker1080",{"a":"899.4"},3,1,"512",0,24]
],
"algo":-1
},
"method":"stats.provider.workers"
}
So basically, as you can see from the above response, there are 2 workers named "worker1080" active on that address.
The PHP code below is how I retrieve the data and output only the names of the workers:
<?php
$btcwallet = get_btc_addy();
if (isset($cur_addy)) {
$method4 = new methods();
$worker_stats = new urls();
$get_data = file_get_contents(utf8_encode($worker_stats->nice_url.$method4->m4.$cur_addy));
$get_json = json_decode($get_data, true);
foreach ($get_json['result']['workers'] as $v) {
$i = 0;
print $v[$i++]."<br />";
}
}
?>
$get_json is the variable holding the decoded data from $get_data; it yields the worker names and grows every time a worker is added or comes online.
I currently have 2 workers online, as shown in the JSON response.
it outputs:
worker1080
worker1080
which is perfect. However, if I try using a foreach statement to display the total number of workers online, it should display 2 instead of the names, and it also has to increment for each worker that the JSON response reports.
E.g. I have 2 workers online now, but if in an hour I connect 10 more it would display the following:
worker1080
worker1080
worker1070
worker1070
worker1080ti
worker1080ti
workerASIC
workerASIC
workerASIC
workerCPU
workerCPU
workerCPU
Now I try to use the following to display the total:
count($v[$i++]);
I have also tried using a foreach within the foreach, and both count and the inner foreach display either "0" or "1" next to every worker.
Below is an example of the output.
0
0
0
0
0
How would I go about counting each line in the output and displaying the total number of workers?
Thanks @symcbean for the solution.
<?php
$btcwallet = get_btc_addy();
if (isset($cur_addy)) {
$method4 = new methods();
$worker_stats = new urls();
$get_data = file_get_contents(utf8_encode($worker_stats->nice_url.$method4->m4.$cur_addy));
$get_json = json_decode($get_data, true);
print count($get_json['result']['workers'])."<br />"; // <-- solution: removed the foreach and the $i incrementation, as they're not needed for count()
}
?>
It now displays the correct number of workers :)
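For completeness, here is a minimal self-contained sketch against the sample response from the question (the JSON string is pasted inline purely for illustration):
<?php
// Sketch: count workers in the sample response from the question.
$json = '{"result":{"addr":"ADDRESS_HERE","workers":[["worker1080",{},2,1,"200000",0,22],["worker1080",{"a":"899.4"},3,1,"512",0,24]],"algo":-1},"method":"stats.provider.workers"}';
$data = json_decode($json, true);

$workers = $data['result']['workers'];
print count($workers); // 2 -- one array element per connected worker

// count() on a single field such as $v[0] ("worker1080") is not meaningful,
// which is why the per-row attempts printed 0 or 1.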
I'm trying to extract data from thousands of pre-made SQL files. I have a script that does what I need using the mysqli driver in PHP, but it's really slow since it processes one SQL file at a time. I modified the script to create unique temp database names, and each SQL file is loaded into one of them; the data is extracted into an archive database table and the temp database is then dropped.
In an effort to speed things up, I structured 4 scripts similar to the one below, where each for loop is stored in its own PHP file (the code below is only a quick demo of what's going on in the 4 separate files); they are set up to grab only 1/4 of the files from the source folder. All of this works perfectly: the scripts run and there is zero interference with file handling. The issue is that I seem to get almost zero performance boost, maybe 10 seconds faster :(
I refreshed my phpMyAdmin database listing page and could see the 4 different databases loaded at any time, but I also noticed that it looked like everything was still running more or less sequentially, as the DB names were changing on the fly. I went the extra step of creating a unique MySQL user for each script, each with its own connection. No improvement.
Can I get this to work with mysqli/PHP, or do I need to look into other options? I'd prefer to do this all in PHP if I can (version 7.0). I tested by running the PHP scripts in my browser. Is that the issue? I haven't written any code to execute them on the command line and run them in the background yet. One last note: none of the users in my MySQL database have limits on connections, etc.
$numbers = array('0','1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20');
$numCount = count($numbers);
$a = '0';
$b = '1';
$c = '2';
$d = '3';
$rebuild = array();
echo"<br>";
for($a; $a <= $numCount; $a+=4){
if(array_key_exists($a, $numbers)){
echo $numbers[$a]."<br>";
}
}
echo "<br>";
for($b; $b <= $numCount; $b+=4){
if(array_key_exists($b, $numbers)){
echo $numbers[$b]."<br>";
}
}
echo "<br>";
for($c; $c <= $numCount; $c+=4){
if(array_key_exists($c, $numbers)){
echo $numbers[$c]."<br>";
}
}
echo "<br>";
for($d; $d <= $numCount; $d+=4){
if(array_key_exists($d, $numbers)){
echo $numbers[$d]."<br>";
}
}
Try this:
<?php
class BackgroundTask extends Thread {
    public $output;
    protected $input;

    public function run() {
        /* Processing here; write results to $this->output */
        // Here you would implement your for() loops, for example using $this->input as their data
        // Some dumb value to demonstrate
        $this->output = "SOME DATA!";
    }

    public function __construct($input_data) {
        $this->input = $input_data;
    }
}
// Create instances with different input data
// Each "quarter" will be a quarter of your data, as you're trying to do right now
$job1 = new BackgroundTask($first_quarter);
$job1->start();
$job2 = new BackgroundTask($second_quarter);
$job2->start();
$job3 = new BackgroundTask($third_quarter);
$job3->start();
$job4 = new BackgroundTask($fourth_quarter);
$job4->start();
// ==================
// "join" the first job, i.e. "wait until it's finished"
$job1->join();
echo "First output: " . $job1->output;
$job2->join();
echo "Second output: " . $job2->output;
$job3->join();
echo "Third output: " . $job3->output;
$job4->join();
echo "Fourth output: " . $job4->output;
?>
When using four calls to your own script over HTTP, you're tying up web-server connections for no useful reason and taking away slots from other users who may be trying to access your website.
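If you'd rather keep four separate scripts than install pthreads, a rough sketch of launching them from the CLI in the background instead of over HTTP (the worker file names part1.php ... part4.php are hypothetical placeholders; a Unix-like shell is assumed):
<?php
// Sketch: launch the four worker scripts from the CLI in the background.
$workers = array('part1.php', 'part2.php', 'part3.php', 'part4.php');

foreach ($workers as $script) {
    // Redirecting output and appending & backgrounds the process, so exec() returns immediately.
    exec('php ' . escapeshellarg(__DIR__ . '/' . $script) . ' > /dev/null 2>&1 &');
}

echo "Launched " . count($workers) . " background workers\n";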
I have an online store with about 15,000 products that get updated every day. Currently I upload the new list every day, but this poses some issues (downtime being a huge one), so I wanted to come up with an alternative.
I created a script that moves yesterday's product list aside and downloads today's list. Then I go line by line and compare the two files to see what needs to be deleted, modified, or created. This lets me perform the update with a minimal amount of work and no downtime, since everything happens behind the scenes via a cron job, which is how it should be done.
The problem is that the process takes over four hours and I'm not sure what I'm doing is the most efficient approach. My first thought was to write something in C++, but I'm not sure how much faster that would be compared to PHP.
My question(s) is:
• Is this the most efficient way to do this?
• Is PHP the best language to do this?
Here's my script I wrote that handles the download and comparison:
public function __construct($url, $user, $pass)
{
$this->logger = new KLogger("/opt/lampp/htdocs/lea/logs/master.log" , KLogger::INFO);
/* increase execution time and server memory limit */
ini_set('max_execution_time', 14400);
ini_set('memory_limit', '-1');
/* set variables */
$this->ftp = ftp_connect($url);
$this->login = ftp_login($this->ftp, $user, $pass);
$this->old = file('/opt/lampp/htdocs/lea/products/new/temp/rsr_inventory.txt');
$this->new = file('/opt/lampp/htdocs/lea/products/new/rsr_inventory.txt');
$this->list = array();
$this->start_time = date('Hi');
$this->counter = 0;
}
public function download($to, $from)
{
// move current file to new location to get new file ready
$this->logger->LogInfo('move yesterday\'s products list');
rename('/opt/lampp/htdocs/lea/products/new/temp/rsr_inventory.txt', '/opt/lampp/htdocs/lea/products/new/rsr_inventory.txt');
// get list from rsr
$this->logger->LogInfo('get new list from rsr');
if(ftp_get($this->ftp, $to, $from, FTP_BINARY))
{
return true;
}
return false;
}
public function update()
{
// initialize process
$this->logger->LogInfo('update process initialized');
for($i = 0; $i < count($this->new); $i++)
{
$new[$i] = explode(';', $this->new[$i]);
$response = $this->_match($new[$i]);
if($response[0])
{
if(trim($response[2]) != trim($new[$i][5]) || trim($response[3]) != trim($new[$i][8]))
{
$this->list[$this->counter][0] = $response[1];
$this->list[$this->counter][1] = 'update';
$this->list[$this->counter][2] = trim($response[2]);
$this->list[$this->counter][3] = trim($response[3]);
$this->counter++;
}
}
else
{
$this->list[$this->counter][0] = $response[1];
$this->list[$this->counter][1] = 'create';
$this->list[$this->counter][2] = trim($response[2]);
$this->list[$this->counter][3] = trim($response[3]);
$this->counter++;
}
}
if(count($this->list) > 0)
{
//csv
$this->logger->LogInfo('create update.csv');
$updates = fopen('/opt/lampp/htdocs/lea/products/new/updates.csv', 'w');
foreach($this->list as $fields)
{
fputcsv($updates, $fields);
}
fclose($updates);
}
$this->logger->LogInfo('product update process complete');
$this->__mail();
}
private function _match($item)
{
for($j = 0; $j < count($this->old); $j++)
{
$old[$j] = explode(';', $this->old[$j]);
if($item[0] === $old[$j][0])
{
return array(true, $item[0], $old[$j][5], $old[$j][8]);
}
}
return array(false, NULL, NULL, NULL);
}
Here is an example of the products.txt file I get every day. I'm only showing a handful of products, but there are roughly 15,000, and a lot of fields (prices, qty, etc.) are omitted here since they don't matter for the example:
511-10010-019-L-XL;844802282208;5.11 RECON ANKLE SOCK BLK L/XL;
511-10010-036-L-XL;844802282246;5.11 RECON ANKLE SOCK SHADOW L/XL;
511-10010-132-LXL;844802334662;5.11 RECON ANKLE SOCK TIMBER L/XL;
511-10010-200-L-XL;844802282222;5.11 RECON ANKLE SOCK FATIGUE L/XL;
511-10011-019-L-XL;844802276382;5.11 COLD WEATHER OTC SOCK BLK L/XL;
511-10012-019-L-XL;844802276429;5.11 COLD WEATHER CREW SOCK BLK L/XL;
511-30012-019-M;844802269650;5.11 WOMENS HOLSTER SHIRT BLK M;
511-40011-010-L;844802016148;5.11 HOLSTER SHIRT L WHITE;
511-40011-010-M;844802016131;5.11 HOLSTER SHIRT M WHITE;
511-40011-010-XL;844802016155;5.11 HOLSTER SHIRT XL WHITE;
511-40011-010-XXL;844802016162;5.11 HOLSTER SHIRT 2XL WHITE;
I think your problem is that you are doing 15,000 x 15,000 comparisons (so 225 million operations on the data).
If you instead build a map (in other words, a PHP array) keyed by some unique identifier for both the old and the new data, that is about 30K operations; then iterate over one list checking whether the other contains the same entry or not, which is another 15K operations. That is a total of roughly 45K operations rather than 225M.
I'm not saying the suggestion to use a database is a bad idea, but the excessive time is clearly caused by a poor choice of algorithm and data structure.
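A rough sketch of that approach, assuming the first semicolon-separated field is the unique identifier (as in the sample data) and that fields 5 and 8 are the ones compared in the question's _match():
<?php
// Sketch: index both files by the unique ID so lookups are O(1) instead of a nested scan.
$oldLines = file('/opt/lampp/htdocs/lea/products/new/temp/rsr_inventory.txt');
$newLines = file('/opt/lampp/htdocs/lea/products/new/rsr_inventory.txt');

$old = array();
foreach ($oldLines as $line) {
    $f = explode(';', $line);
    $old[$f[0]] = $f;          // ~15K operations to build the map
}

$list = array();
foreach ($newLines as $line) {
    $f  = explode(';', $line);
    $id = $f[0];
    if (!isset($old[$id])) {
        $list[] = array($id, 'create', trim($f[5]), trim($f[8]));
    } elseif (trim($old[$id][5]) != trim($f[5]) || trim($old[$id][8]) != trim($f[8])) {
        $list[] = array($id, 'update', trim($old[$id][5]), trim($old[$id][8]));
    }
    unset($old[$id]);          // whatever remains in $old afterwards was deleted
}
// $list now holds the create/update rows; array_keys($old) holds the deleted IDs.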
This is a job for MySQL. Importing your data will be a substantial investment up front, but it will be worth it in the long run. Databases are designed to update, merge, delete, and insert data efficiently; this sort of job takes seconds in MySQL. You could keep PHP as your scripting language.
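A hedged sketch of that route: load both daily files into staging tables with LOAD DATA and let SQL find the differences. The table and column names (products_old, products_new, sku, price, qty) are made up for illustration and do not come from the question:
<?php
// Sketch: diff today's and yesterday's product files in SQL.
// Requires local_infile to be enabled on both client and server.
$db = new mysqli('localhost', 'user', 'pass', 'shop');

$db->query("TRUNCATE products_new");
$db->query("LOAD DATA LOCAL INFILE '/opt/lampp/htdocs/lea/products/new/rsr_inventory.txt'
            INTO TABLE products_new FIELDS TERMINATED BY ';' LINES TERMINATED BY '\\n'");

// Rows that are new or whose tracked fields changed:
$changed = $db->query(
    "SELECT n.sku, n.price, n.qty
       FROM products_new n
       LEFT JOIN products_old o ON o.sku = n.sku
      WHERE o.sku IS NULL OR o.price <> n.price OR o.qty <> n.qty"
);

// Rows that disappeared since yesterday:
$deleted = $db->query(
    "SELECT o.sku
       FROM products_old o
       LEFT JOIN products_new n ON n.sku = o.sku
      WHERE n.sku IS NULL"
);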
I am trying to create RRD graphs with the help of PHP in order to keep track of the inOctets, outOctets and counter of a server.
So far the script operates as expected, but my problem comes when I try to produce 2 or more separate graphs. I am trying to produce hourly, weekly, etc. graphs. I thought a loop would solve my problem, since I have split the RRAs into hours and days. Unfortunately I end up with 2 graphs that update simultaneously, as expected, but plot the same thing. Has anyone encountered a similar problem? I have implemented the same program in Perl with RRD::Simple, where it is extremely easy and everything is adjusted almost automatically.
I have supplied below a working example of my code with the minimum possible data, because the code is a bit long:
<?php
$file = "snmp-2";
$rrdFile = dirname(__FILE__) . "/snmp-2.rrd";
$in = "ifInOctets";
$out = "ifOutOctets";
$count = "sysUpTime";
$step = 5;
$rounds = 1;
$output = array("Hourly","Daily");
while (1) {
sleep (6);
$options = array(
"--start","now -15s", // Now -10 seconds (default)
"--step", "".$step."",
"DS:".$in.":GAUGE:10:U:U",
"DS:".$out.":GAUGE:10:U:U",
"DS:".$count.":ABSOLUTE:10:0:4294967295",
"RRA:MIN:0.5:12:60",
"RRA:MAX:0.5:12:60",
"RRA:LAST:0.5:12:60",
"RRA:AVERAGE:0.5:12:60",
"RRA:MIN:0.5:300:60",
"RRA:MAX:0.5:300:60",
"RRA:LAST:0.5:300:60",
"RRA:AVERAGE:0.5:300:60",
);
if ( !isset( $create ) ) {
$create = rrd_create(
"".$rrdFile."",
$options
);
if ( $create === FALSE ) {
echo "Creation error: ".rrd_error()."\n";
}
}
$t = time();
$ifInOctets = rand(0, 4294967295);
$ifOutOctets = rand(0, 4294967295);
$sysUpTime = rand(0, 4294967295);
$update = rrd_update(
"".$rrdFile."",
array(
"".$t.":".$ifInOctets.":".$ifOutOctets.":".$sysUpTime.""
)
);
if ($update === FALSE) {
echo "Update error: ".rrd_error()."\n";
}
$start = $t - ($step * $rounds);
foreach ($output as $test) {
$final = array(
"--start","".$start." -15s",
"--end", "".$t."",
"--step","".$step."",
"--title=".$file." RRD::Graph",
"--vertical-label=Byte(s)/sec",
"--right-axis-label=latency(min.)",
"--alt-y-grid", "--rigid",
"--width", "800", "--height", "500",
"--lower-limit=0",
"--alt-autoscale-max",
"--no-gridfit",
"--slope-mode",
"DEF:".$in."_def=".$file.".rrd:".$in.":AVERAGE",
"DEF:".$out."_def=".$file.".rrd:".$out.":AVERAGE",
"DEF:".$count."_def=".$file.".rrd:".$count.":AVERAGE",
"CDEF:inbytes=".$in."_def,8,/",
"CDEF:outbytes=".$out."_def,8,/",
"CDEF:counter=".$count."_def,8,/",
"COMMENT:\\n",
"LINE2:".$in."_def#FF0000:".$in."",
"COMMENT:\\n",
"LINE2:".$out."_def#0000FF:".$out."",
"COMMENT:\\n",
"LINE2:".$count."_def#FFFF00:".$count."",
);
$outputPngFile = rrd_graph(
"".$test.".png",
$final
);
if ($outputPngFile === FALSE) {
echo "<b>Graph error: </b>".rrd_error()."\n";
}
} /* End of foreach function */
$debug = rrd_lastupdate (
"".$rrdFile.""
);
if ($debug === FALSE) {
echo "<b>Graph result error: </b>".rrd_error()."\n";
}
var_dump ($debug);
$rounds++;
} /* End of while loop */
?>
A couple of issues.
Firstly, your definition of the RRD has a step of 5 seconds and RRAs with steps of 12 x 5 s = 1 min and 300 x 5 s = 25 min. They also have a length of only 60 rows, so 1 hour and 25 hours respectively. You'll never get a weekly graph this way! You need to add more rows; also, the step seems rather short, and you might need a smaller-step RRA for hourly graphs and a larger-step one for weekly graphs.
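For example, with the 5-second step from the question, an RRA set along these lines would cover hourly, daily and weekly graphs (the exact steps and row counts are illustrative assumptions, not prescriptive values):
// Sketch: RRAs sized for hourly, daily and weekly graphs with a 5 s base step.
$rras = array(
    "RRA:AVERAGE:0.5:1:720",    // 1 x 5 s   = 5 s rows,    720 rows  = 1 hour
    "RRA:AVERAGE:0.5:12:1440",  // 12 x 5 s  = 1 min rows,  1440 rows = 1 day
    "RRA:AVERAGE:0.5:360:336",  // 360 x 5 s = 30 min rows, 336 rows  = 1 week
);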
Secondly, it is not clear how you're calling the graph function. You seem to be specifying:
"--start","".$start." -15s",
"--end", "".$t."",
"--step","".$step."",
... which would force it to use the 5 s interval (unavailable, so the 1 min one would always get used) and would make the graph cover only the time window from the start to the last update, not the 'hourly' or 'daily' window you were asking for.
Note that the RRAs you have defined do not define the time window of the graph you are asking for. Also, just because you have more than one RRA defined doesn't mean you'll get more than one graph, unless you call the graph function twice with different arguments.
If you want an hourly graph, use
"--start","end - 1 hour",
"--end",$t,
Do not specify a step, as the most appropriate available one will be used anyway. For a daily graph, use
"--start","end - 1 day",
"--end",$t,
Similarly, there is no need to specify a step.
Hopefully this will make it a little clearer. Most of the RRD graph options have sensible defaults, and RRDTool is pretty good at picking the correct RRA to use based on the graph size, time window, and DEF statements.
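Putting that together, the two graphs could be requested roughly like this (file names, colours and the exact option set are arbitrary; only the differing --start values matter):
<?php
// Sketch: one rrd_graph() call per time window, against the RRD from the question.
$rrdFile = dirname(__FILE__) . "/snmp-2.rrd";
$windows = array(
    "Hourly" => "end - 1 hour",
    "Daily"  => "end - 1 day",
);

foreach ($windows as $name => $start) {
    $ok = rrd_graph($name . ".png", array(
        "--start", $start,
        "--end", "now",
        "--title=" . $name . " traffic",
        "--vertical-label=Byte(s)/sec",
        "DEF:in_def=" . $rrdFile . ":ifInOctets:AVERAGE",
        "DEF:out_def=" . $rrdFile . ":ifOutOctets:AVERAGE",
        "LINE2:in_def#FF0000:ifInOctets",
        "LINE2:out_def#0000FF:ifOutOctets",
    ));
    if ($ok === FALSE) {
        echo "Graph error: " . rrd_error() . "\n";
    }
}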
I'd like to create a php script that runs as a daily cron. What I'd like to do is enumerate through all users within an Active Directory, extract certain fields from each entry, and use this information to update fields within a MySQL database.
Basically what I want to to do is sync up certain user information between Active Directory and a MySQL table.
The problem I have is that the size limit on the Active Directory server is often set at 1000 entries per search result. I had hoped that the PHP function ldap_next_entry would get around this by fetching only one entry at a time, but before you can call ldap_next_entry you first have to call ldap_search, which can trigger the SizeLimit exceeded error.
Is there any way besides removing the sizelimit from the server? Can I somehow get "pages" of results?
BTW, I am currently not using any 3rd-party libraries or code, just PHP's LDAP functions. Although I am certainly open to using a library if that will help.
I've been struck by the same problem while developing Zend_Ldap for the Zend Framework. I'll try to explain what the real problem is, but to make it short: until PHP 5.4, it wasn't possible to use paged results from an Active Directory with an unpatched PHP (ext/ldap) version, due to limitations in exactly this extension.
Let's try to unravel the whole thing... Microsoft Active Directory uses a so-called server control to accomplish server-side result paging. This control is described in RFC 2696, "LDAP Control Extension for Simple Paged Results Manipulation".
ext/ldap offers access to LDAP control extensions via its ldap_set_option() and the LDAP_OPT_SERVER_CONTROLS and LDAP_OPT_CLIENT_CONTROLS options respectively. To set the paged control you need the control OID, which is 1.2.840.113556.1.4.319, and you need to know how to encode the control value (this is described in the RFC). The value is an octet string wrapping the BER-encoded version of the following SEQUENCE (copied from the RFC):
realSearchControlValue ::= SEQUENCE {
        size            INTEGER (0..maxInt),
                        -- requested page size from client
                        -- result set size estimate from server
        cookie          OCTET STRING
}
So we can set the appropriate server control prior to executing the LDAP query:
$pageSize = 100;
$pageControl = array(
'oid' => '1.2.840.113556.1.4.319', // the control-oid
'iscritical' => true, // the operation should fail if the server is not able to support this control
'value' => sprintf ("%c%c%c%c%c%c%c", 48, 5, 2, 1, $pageSize, 4, 0) // the required BER-encoded control-value
);
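The sprintf() above is just hand-assembling that BER-encoded SEQUENCE byte by byte. Spelled out (note that this simple form only holds for page sizes up to 127, where the INTEGER fits in a single byte):
// Sketch: what the seven bytes produced by the sprintf() above mean.
$pageSize = 100;          // as above
$value = sprintf("%c%c%c%c%c%c%c",
    48,         // 0x30  SEQUENCE
    5,          //       length of the sequence content (5 bytes follow)
    2,          // 0x02  INTEGER (the page size)
    1,          //       integer length: 1 byte
    $pageSize,  //       the page size itself (works for values up to 127)
    4,          // 0x04  OCTET STRING (the cookie)
    0           //       cookie length: 0 = empty cookie for the first request
);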
This allows us to send a paged query to the LDAP/AD server. But how do we know whether there are more pages to follow, and how do we specify the control value we have to send with our next query?
This is where we're getting stuck... The server responds with a result set that includes the required paging information, but PHP lacks a method to retrieve exactly this information from the result set. PHP provides a wrapper for the LDAP API function ldap_parse_result(), but the required last parameter serverctrlsp is not exposed to the PHP function, so there is no way to retrieve the required information. A bug report has been filed for this issue, but there has been no response since 2005. If the ldap_parse_result() function provided the required parameter, using paged results would work like this:
$l = ldap_connect('somehost.mydomain.com');
$pageSize = 100;
$pageControl = array(
'oid' => '1.2.840.113556.1.4.319',
'iscritical' => true,
'value' => sprintf ("%c%c%c%c%c%c%c", 48, 5, 2, 1, $pageSize, 4, 0)
);
$controls = array($pageControl);
ldap_set_option($l, LDAP_OPT_PROTOCOL_VERSION, 3);
ldap_bind($l, 'CN=bind-user,OU=my-users,DC=mydomain,DC=com', 'bind-user-password');
$continue = true;
while ($continue) {
ldap_set_option($l, LDAP_OPT_SERVER_CONTROLS, $controls);
$sr = ldap_search($l, 'OU=some-ou,DC=mydomain,DC=com', 'cn=*', array('sAMAccountName'), null, null, null, null);
ldap_parse_result ($l, $sr, $errcode, $matcheddn, $errmsg, $referrals, $serverctrls); // (*)
if (isset($serverctrls)) {
foreach ($serverctrls as $i) {
if ($i["oid"] == '1.2.840.113556.1.4.319') {
$i["value"]{8} = chr($pageSize);
$i["iscritical"] = true;
$controls = array($i);
break;
}
}
}
$info = ldap_get_entries($l, $sr);
if ($info["count"] < $pageSize) {
$continue = false;
}
for ($entry = ldap_first_entry($l, $sr); $entry != false; $entry = ldap_next_entry($l, $entry)) {
$dn = ldap_get_dn($l, $entry);
}
}
As you can see, there is a single line of code (*) that renders the whole thing useless. On my way through the sparse information on this subject I found a patch against the PHP 4.3.10 ext/ldap by Iñaki Arenaza, but I neither tried it nor do I know whether the patch can be applied to a PHP 5 ext/ldap. The patch extends ldap_parse_result() to expose the 7th parameter to PHP:
--- ldap.c 2004-06-01 23:05:33.000000000 +0200
+++ /usr/src/php4/php4-4.3.10/ext/ldap/ldap.c 2005-09-03 17:02:03.000000000 +0200
@@ -74,7 +74,7 @@
ZEND_DECLARE_MODULE_GLOBALS(ldap)
static unsigned char third_argument_force_ref[] = { 3, BYREF_NONE, BYREF_NONE, BYREF_FORCE };
-static unsigned char arg3to6of6_force_ref[] = { 6, BYREF_NONE, BYREF_NONE, BYREF_FORCE, BYREF_FORCE, BYREF_FORCE, BYREF_FORCE };
+static unsigned char arg3to7of7_force_ref[] = { 7, BYREF_NONE, BYREF_NONE, BYREF_FORCE, BYREF_FORCE, BYREF_FORCE, BYREF_FORCE, BYREF_FORCE };
static int le_link, le_result, le_result_entry, le_ber_entry;
@@ -124,7 +124,7 @@
#if ( LDAP_API_VERSION > 2000 ) || HAVE_NSLDAP
PHP_FE(ldap_get_option, third_argument_force_ref)
PHP_FE(ldap_set_option, NULL)
- PHP_FE(ldap_parse_result, arg3to6of6_force_ref)
+ PHP_FE(ldap_parse_result, arg3to7of7_force_ref)
PHP_FE(ldap_first_reference, NULL)
PHP_FE(ldap_next_reference, NULL)
#ifdef HAVE_LDAP_PARSE_REFERENCE
@@ -1775,14 +1775,15 @@
Extract information from result */
PHP_FUNCTION(ldap_parse_result)
{
- pval **link, **result, **errcode, **matcheddn, **errmsg, **referrals;
+ pval **link, **result, **errcode, **matcheddn, **errmsg, **referrals, **serverctrls;
ldap_linkdata *ld;
LDAPMessage *ldap_result;
+ LDAPControl **lserverctrls, **ctrlp, *ctrl;
char **lreferrals, **refp;
char *lmatcheddn, *lerrmsg;
int rc, lerrcode, myargcount = ZEND_NUM_ARGS();
- if (myargcount < 3 || myargcount > 6 || zend_get_parameters_ex(myargcount, &link, &result, &errcode, &matcheddn, &errmsg, &referrals) == FAILURE) {
+ if (myargcount < 3 || myargcount > 7 || zend_get_parameters_ex(myargcount, &link, &result, &errcode, &matcheddn, &errmsg, &referrals, &serverctrls) == FAILURE) {
WRONG_PARAM_COUNT;
}
@@ -1793,7 +1794,7 @@
myargcount > 3 ? &lmatcheddn : NULL,
myargcount > 4 ? &lerrmsg : NULL,
myargcount > 5 ? &lreferrals : NULL,
- NULL /* &serverctrls */,
+ myargcount > 6 ? &lserverctrls : NULL,
0 );
if (rc != LDAP_SUCCESS ) {
php_error(E_WARNING, "%s(): Unable to parse result: %s", get_active_function_name(TSRMLS_C), ldap_err2string(rc));
@@ -1805,6 +1806,29 @@
/* Reverse -> fall through */
switch(myargcount) {
+ case 7 :
+ zval_dtor(*serverctrls);
+
+ if (lserverctrls != NULL) {
+ array_init(*serverctrls);
+ ctrlp = lserverctrls;
+
+ while (*ctrlp != NULL) {
+ zval *ctrl_array;
+
+ ctrl = *ctrlp;
+ MAKE_STD_ZVAL(ctrl_array);
+ array_init(ctrl_array);
+
+ add_assoc_string(ctrl_array, "oid", ctrl->ldctl_oid,1);
+ add_assoc_bool(ctrl_array, "iscritical", ctrl->ldctl_iscritical);
+ add_assoc_stringl(ctrl_array, "value", ctrl->ldctl_value.bv_val,
+ ctrl->ldctl_value.bv_len,1);
+ add_next_index_zval (*serverctrls, ctrl_array);
+ ctrlp++;
+ }
+ ldap_controls_free (lserverctrls);
+ }
case 6 :
zval_dtor(*referrals);
if (array_init(*referrals) == FAILURE) {
Actually the only option left would be to change the Active Directory configuration and raise the maximum result limit. The relevant option is called MaxPageSize and can be altered by using ntdsutil.exe - please see "How to view and set LDAP policy in Active Directory by using Ntdsutil.exe".
EDIT (reference to COM):
Or you can go the other way round and use the COM-approach via ADODB as suggested in the link provided by eykanal.
Support for paged results was added in PHP 5.4.
See ldap_control_paged_result for more details.
This isn't a full answer, but this guy was able to do it. I don't understand what he did, though.
By the way, a partial answer is that you CAN get "pages" of results. From the documentation:
resource ldap_search ( resource $link_identifier , string $base_dn ,
string $filter [, array $attributes [, int $attrsonly [, int $sizelimit [,
int $timelimit [, int $deref ]]]]] )
...
sizelimit Enables you to limit the count of entries fetched. Setting this to 0 means no limit.
Note: This parameter can NOT override server-side preset sizelimit.
You can set it lower though. Some directory server hosts will be
configured to return no more than a preset number of entries. If this
occurs, the server will indicate that it has only returned a partial
results set. This also occurs if you use this parameter to limit the
count of fetched entries.
I don't know how to specify that you want to search STARTING from a certain position, though. I.e., after you get your first 1000, I don't know how to specify that now you need the next 1000. Hopefully someone else can help you there :)
Here's an alternative (which works pre PHP 5.4). If you have 10,000 records you need to get but your AD server only returns 5,000 per page:
$ldapSearch = ldap_search($ldapResource, $basedn, $filter, array('member;range=0-4999'));
$ldapResults = ldap_get_entries($ldapResource, $ldapSearch);
$members = $ldapResults[0]['member;range=0-4999'];
$ldapSearch = ldap_search($ldapResource, $basedn, $filter, array('member;range=5000-10000'));
$ldapResults = ldap_get_entries($ldapResource, $ldapSearch);
$members = array_merge($members, $ldapResults[0]['member;range=5000-*']);
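If the total number of values isn't known up front, the same ranged-attribute trick can be looped until the server marks the final chunk with a * in the attribute name. A rough sketch, reusing $ldapResource, $basedn and $filter from above (the 5,000 chunk size is an assumption about the server's limit):
// Sketch: fetch a large multi-valued attribute (e.g. member) in range chunks.
$members = array();
$step    = 5000;   // assumed server-side limit for ranged attribute values
$low     = 0;
$done    = false;

while (!$done) {
    $attr   = 'member;range=' . $low . '-' . ($low + $step - 1);
    $search = ldap_search($ldapResource, $basedn, $filter, array($attr));
    $entry  = ldap_get_entries($ldapResource, $search);

    // The last chunk comes back named "member;range=N-*", so inspect the
    // attribute names actually returned instead of assuming our own.
    $found = false;
    foreach ($entry[0] as $name => $values) {
        if (is_string($name) && stripos($name, 'member;range=') === 0 && is_array($values)) {
            unset($values['count']);
            $members = array_merge($members, $values);
            $done    = (substr($name, -1) === '*');
            $found   = true;
        }
    }
    if (!$found) {
        break;  // nothing more to fetch
    }
    $low += $step;
}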
I was able to get around the size limitation using ldap_control_paged_result.
ldap_control_paged_result is used to enable LDAP pagination by sending the pagination control. The function below worked perfectly in my case.
function retrieves_users($conn)
{
$dn = 'ou=,dc=,dc=';
$filter = "(&(objectClass=user)(objectCategory=person)(sn=*))";
$justthese = array();
$data = array('usersLdap' => array()); // initialise so the function always returns an array
// enable pagination with a page size of 100.
$pageSize = 100;
$cookie = '';
do {
ldap_control_paged_result($conn, $pageSize, true, $cookie);
$result = ldap_search($conn, $dn, $filter, $justthese);
$entries = ldap_get_entries($conn, $result);
if(!empty($entries)){
for ($i = 0; $i < $entries["count"]; $i++) {
$data['usersLdap'][] = array(
'name' => $entries[$i]["cn"][0],
'username' => $entries[$i]["userprincipalname"][0]
);
}
}
ldap_control_paged_result_response($conn, $result, $cookie);
} while($cookie !== null && $cookie != '');
return $data;
}
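A hedged usage sketch (the host, bind DN and password are placeholders; the function above also assumes paged-result support, i.e. PHP 5.4+):
<?php
// Sketch: connect, bind, then call retrieves_users() as defined above.
$conn = ldap_connect('ldap://dc.example.com');   // placeholder host
ldap_set_option($conn, LDAP_OPT_PROTOCOL_VERSION, 3);
ldap_set_option($conn, LDAP_OPT_REFERRALS, 0);   // commonly needed against AD

if (ldap_bind($conn, 'CN=bind-user,OU=users,DC=example,DC=com', 'secret')) {
    $users = retrieves_users($conn);
    echo count($users['usersLdap']) . " users fetched\n";
}
ldap_unbind($conn);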