PHP ldap_search size limit exceeded

I'm quite new to querying Microsoft's Active Directory and encountering some difficulties:
The AD has a size limit of 1,000 entries per request, and I cannot change that limit. PHP does not seem to support paging (I'm using version 5.2 and there is no way to update the production server).
I've so far encountered two possible solutions:
Sort the entries by objectSid and use filters to get all the objects. Sample Code
I don't like that for several reasons:
It seems unpredictable to mess with the objectSid, as you have to take it apart, convert it to decimal, convert it back ...
I don't see how you can compare these id's.
(I've tried: '&((objectClass=user)(objectSid>=0))')
Filter by the first letters of the object names (as suggested here):
That's not an optimal solution as many of the users/groups in our system are prefixed with the same few letters.
So my question:
What approach is best used here?
If it's the first one, how can I be sure to handle the objectSid correctly?
Any other possibilities?
Am I missing something obvious?
Update:
- This related question provides information about why the Simple Paged Results extension does not work.
- The web server is running on a Linux server, so COM objects/adoDB are not an option.

I was able to get around the size limitation using ldap_control_paged_result
ldap_control_paged_result is used to enable LDAP pagination by sending the pagination control. The function below worked perfectly in my case; it requires PHP 5 >= 5.4.0 or PHP 7.
function retrieves_users($conn)
{
    $dn = 'ou=,dc=,dc=';
    $filter = "(&(objectClass=user)(objectCategory=person)(sn=*))";
    $justthese = array();
    // enable pagination with a page size of 100
    $pageSize = 100;
    $cookie = '';
    $data = array('usersLdap' => array());
    do {
        ldap_control_paged_result($conn, $pageSize, true, $cookie);
        $result  = ldap_search($conn, $dn, $filter, $justthese);
        $entries = ldap_get_entries($conn, $result);
        if (!empty($entries)) {
            for ($i = 0; $i < $entries["count"]; $i++) {
                $data['usersLdap'][] = array(
                    'name'     => $entries[$i]["cn"][0],
                    'username' => $entries[$i]["userprincipalname"][0]
                );
            }
        }
        ldap_control_paged_result_response($conn, $result, $cookie);
    } while ($cookie !== null && $cookie != '');
    return $data;
}
If you have successfully updated your server by now, then the function above can get all the entries. I am using this function to get all users in our AD.
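For completeness, here is a minimal usage sketch for the function above; the host, credentials and the referrals option are placeholders and assumptions rather than anything from the original answer:
<?php
$conn = ldap_connect('ldap://your.ad.host');           // hypothetical host
ldap_set_option($conn, LDAP_OPT_PROTOCOL_VERSION, 3);  // AD expects LDAPv3
ldap_set_option($conn, LDAP_OPT_REFERRALS, 0);         // usually required against AD
if (ldap_bind($conn, 'user@example.com', 'secret')) {  // hypothetical credentials
    $data = retrieves_users($conn);
    print_r($data['usersLdap']);
}
ldap_unbind($conn);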

As I've not found any clean solutions I decided to go with the first approach: Filtering By Object-Sids.
This workaround has its limitations:
It only works for objects with an objectSid, i.e. users and groups.
It assumes that all users/groups are created by the same authority.
It assumes that there are no more missing relative SIDs than the size limit.
The idea is to first read all possible objects and pick out the one with the lowest relative SID. The relative SID is the last chunk of the SID:
S-1-5-21-3188256696-111411151-3922474875-1158
Let's assume this is the lowest relative SID in a search that only returned 'Partial Search Results'.
Let's further assume the size limit is 1000.
The program then does the following:
It searches all Objects with the SIDs between
S-1-5-21-3188256696-111411151-3922474875-1158
and
S-1-5-21-3188256696-111411151-3922474875-0159
then all between
S-1-5-21-3188256696-111411151-3922474875-1158
and
S-1-5-21-3188256696-111411151-3922474875-2157
and so on until one of the searches returns zero objects.
There are several problems with this approach, but it's sufficient for my purposes.
The Code:
$filter = '(objectClass=Group)';
$attributes = array('objectsid', 'cn'); // objectsid needs to be set
$result = array();
$maxPageSize = 1000;
$searchStep = $maxPageSize - 1;

// Suppress the warning for the first query (because it exceeds the size limit)
$adResult = @$adConn->search($filter, $attributes);

// Read the smallest RID from the result set
$minGroupRID = '';
for ($i = 0; $i < $adResult['count']; $i++) {
    $groupRID = unpack('V', substr($adResult[$i]['objectsid'][0], 24));
    if ($minGroupRID == '' || $minGroupRID > $groupRID[1]) {
        $minGroupRID = $groupRID[1];
    }
}
// Read the last objectsid and cut off the RID to get the SID prefix
$sidPrefix = substr($adResult[$i - 1]['objectsid'][0], 0, 24);

$nextStepGroupRID = $minGroupRID;
do { // Search for all objects with a lower objectsid than minGroupRID
    $adResult = $adConn->search('(&' . $filter . '(objectsid<=' . preg_replace('/../', '\\\\$0', bin2hex($sidPrefix . pack('V', $nextStepGroupRID))) . ')(objectsid>=' . preg_replace('/../', '\\\\$0', bin2hex($sidPrefix . pack('V', $nextStepGroupRID - $searchStep))) . '))', $attributes);
    for ($i = 0; $i < $adResult['count']; $i++) {
        $RID = unpack('V', substr($adResult[$i]['objectsid'][0], 24)); // Extract the relative SID from the SID
        $RIDs[] = $RID[1];
        $resultSet = array();
        foreach ($attributes as $attribute) {
            $resultSet[$attribute] = $adResult[$i][$attribute][0];
        }
        $result[$RID[1]] = $resultSet;
    }
    $nextStepGroupRID = $nextStepGroupRID - $searchStep;
} while ($adResult['count'] > 1);

$nextStepGroupRID = $minGroupRID;
do { // Search for all objects with a higher objectsid than minGroupRID
    $adResult = $adConn->search('(&' . $filter . '(objectsid>=' . preg_replace('/../', '\\\\$0', bin2hex($sidPrefix . pack('V', $nextStepGroupRID))) . ')(objectsid<=' . preg_replace('/../', '\\\\$0', bin2hex($sidPrefix . pack('V', $nextStepGroupRID + $searchStep))) . '))', $attributes);
    for ($i = 0; $i < $adResult['count']; $i++) {
        $RID = unpack('V', substr($adResult[$i]['objectsid'][0], 24)); // Extract the relative SID from the SID
        $RIDs[] = $RID[1];
        $resultSet = array();
        foreach ($attributes as $attribute) {
            $resultSet[$attribute] = $adResult[$i][$attribute][0];
        }
        $result[$RID[1]] = $resultSet;
    }
    $nextStepGroupRID = $nextStepGroupRID + $searchStep;
} while ($adResult['count'] > 1);

var_dump($result);
The $adConn->search method looks like this:
function search($filter, $attributes = false, $base_dn = null) {
    if (!isset($base_dn)) {
        $base_dn = $this->baseDN;
    }
    $entries = false;
    if (is_string($filter) && $this->bind) {
        if (is_array($attributes)) {
            $search = ldap_search($this->resource, $base_dn, $filter, $attributes);
        } else {
            $search = ldap_search($this->resource, $base_dn, $filter);
        }
        if ($search !== false) {
            $entries = ldap_get_entries($this->resource, $search);
        }
    }
    return $entries;
}
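As a readability note on the answer above, the repeated objectsid-to-filter conversion could be factored into a small helper. This is only a hedged sketch: the function name is invented, and the packing and escaping simply mirror the inline expressions used above.
// Hypothetical helper: re-attach a 32-bit RID to the SID prefix and escape
// each byte as \xx so the binary value can be used inside an LDAP filter.
function sidFilterValue($sidPrefix, $rid)
{
    $binarySid = $sidPrefix . pack('V', $rid);
    return preg_replace('/../', '\\\\$0', bin2hex($binarySid));
}

// e.g. '(&(objectClass=Group)(objectsid>=' . sidFilterValue($sidPrefix, $nextStepGroupRID) . '))'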

Never make assumptions about servers or server configuration, this leads to brittle code and unexpected, sometimes spectacular failures. Just because it is AD today does not mean it will be tomorrow, or that Microsoft will not change the default limit in the server. I recently dealt with a situation where client code was written with the tribal knowledge that the size limit was 2000, and when administrators, for reasons of their own, changed the size limit, the client code failed horribly.
Are you sure that PHP does not support request controls (the simple paged result extension is a request control)? I wrote an article about "LDAP: Simple Paged Results", and though the article sample code is Java, the concepts are important, not the language. See also "LDAP: Programming Practices".
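For readers on newer PHP: since PHP 7.3 the standard LDAP extension does accept request controls, so the simple paged results control can be passed directly to ldap_search(). Below is a minimal sketch, assuming PHP >= 7.3 and placeholder host, credentials and base DN; verify the details against your own PHP and directory versions.
<?php
$conn = ldap_connect('ldap://your.ad.host');            // hypothetical host
ldap_set_option($conn, LDAP_OPT_PROTOCOL_VERSION, 3);
ldap_bind($conn, 'user@example.com', 'secret');          // hypothetical credentials

$cookie = '';
do {
    $result = ldap_search(
        $conn, 'dc=example,dc=com', '(objectClass=user)', array('cn'),
        0, -1, -1, LDAP_DEREF_NEVER,
        array(array('oid' => LDAP_CONTROL_PAGEDRESULTS, 'value' => array('size' => 500, 'cookie' => $cookie)))
    );
    ldap_parse_result($conn, $result, $errcode, $matchedDn, $errMsg, $referrals, $controls);
    $entries = ldap_get_entries($conn, $result);
    for ($i = 0; $i < $entries['count']; $i++) {
        // process $entries[$i] ...
    }
    // the server hands back a non-empty cookie until the last page has been delivered
    $cookie = isset($controls[LDAP_CONTROL_PAGEDRESULTS]['value']['cookie'])
        ? $controls[LDAP_CONTROL_PAGEDRESULTS]['value']['cookie'] : '';
} while ($cookie !== '');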

The previous script may fail when the gap between two adjacent SIDs is greater than 999.
Example:
S-1-5-21-3188256696-111411151-3922474875-1158
S-1-5-21-3188256696-111411151-3922474875-3359
3359-1158 > 999
To avoid this, you need to bound the loop with a fixed number of iterations.
Example:
$tt = 1;
do {
    // ...
    $nextStepGroupRID = $nextStepGroupRID - $searchStep;
    $tt++;
} while ($tt < 30);
In this example, we are forced to check 999 * 30 * 2 = 59940 values.
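Here is a hedged sketch of what that looks like when merged with the downward loop from the accepted answer, using the hypothetical sidFilterValue() helper sketched earlier; $adConn, $filter, $attributes, $sidPrefix, $minGroupRID and $searchStep are as defined in that answer, and $maxWindows is arbitrary.
$maxWindows = 30;
$window = 0;
$nextStepGroupRID = $minGroupRID;
do {
    $rangeFilter = '(&' . $filter
        . '(objectsid<=' . sidFilterValue($sidPrefix, $nextStepGroupRID) . ')'
        . '(objectsid>=' . sidFilterValue($sidPrefix, $nextStepGroupRID - $searchStep) . '))';
    $adResult = $adConn->search($rangeFilter, $attributes);
    // ... collect the entries exactly as in the accepted answer ...
    $nextStepGroupRID -= $searchStep;
    $window++;
} while ($adResult['count'] > 1 && $window < $maxWindows); // stop on an empty window or after the cap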

Related

How to manage PHP memory?

I wrote a one-off script that I use to parse PDFs saved in the database. So far it was working okay, until I ran out of memory after parsing 2,700+ documents.
The basic flow of the script is as follows:
Get a list of all the document IDs to be parsed and save it as an array in the session (~155k documents).
Display a page that has a button to start parsing
Make an AJAX request when that button is clicked that would parse the first 50 documents in the session array
$files = $_SESSION['files'];
$ids = array();
$slice = array_slice($files, 0, 50);
$files = array_slice($files, 50, null); // remove the 50 we are parsing on this request

if (session_status() == PHP_SESSION_NONE) {
    session_start();
}
$_SESSION['files'] = $files;
session_write_close();

for ($i = 0; $i < count($slice); $i++) {
    $ids[] = ":id_{$i}";
}
$ids = implode(", ", $ids);

$sql = "SELECT d.id, d.filename, d.doc_content
          FROM proj_docs d
         WHERE d.id IN ({$ids})";
$stmt = oci_parse($objConn, $sql);
for ($i = 0; $i < count($slice); $i++) {
    oci_bind_by_name($stmt, ":id_{$i}", $slice[$i]);
}
oci_execute($stmt, OCI_DEFAULT);
$cnt = oci_fetch_all($stmt, $data);
oci_free_statement($stmt);

# Do the parsing..
# Output a table row..
The response to the AJAX request typically includes a status indicating whether the script has finished parsing the total ~155k documents; if it's not done, another AJAX request is made to parse the next 50. There's a 5-second delay between each request.
Questions
Why am I running out of memory? I expected peak memory usage to occur at step #1, when the list of all document IDs is loaded (since it holds every possible document), not a few minutes later, when the session array holds 2,700 fewer elements.
I saw a few questions similar to my problem, and they suggested either setting the memory limit to unlimited, which I don't want to do at all, or setting my variables to null when appropriate. I did the latter, but I still ran out of memory after parsing ~2,700 documents. So what other approaches should I try?
# Freeing some memory space
$batch_size = null;
$with_xfa = null;
$non_xfa = null;
$total = null;
$files = null;
$ids = null;
$slice = null;
$sql = null;
$stmt = null;
$objConn = null;
$i = null;
$data = null;
$cnt = null;
$display_class = null;
$display = null;
$even = null;
$tr_class = null;
So I'm not really sure why, but reducing the number of documents I'm parsing per batch from 50 down to 10 seems to fix the issue. I've gone past 5,000 documents now and the script is still running. My only guess is that when I was parsing 50 documents at a time, I must have encountered a lot of large files which used up all of the memory allotted.
Update #1
I got another error about memory running out at 8,500+ documents. I've reduced the batches further down to 5 documents each and will see tomorrow if it goes all the way to parsing everything. If that fails, I'll just increase the memory allocated temporarily.
Update #2
So it turns out that the only reason I'm running out of memory is that we apparently have multiple PDF files of over 300MB uploaded to the database. I increased the memory allotted to PHP to 512MB and this seems to have allowed me to finish parsing everything.
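A minimal sketch of the fix from Update #2, plus a cheap way to see which batch drives memory up; the 512M value and the logging are assumptions, not part of the original script.
ini_set('memory_limit', '512M'); // raise the per-request limit for this parsing script only

// ... parse the current batch of documents ...

error_log(sprintf('Batch done, peak memory: %.1f MB', memory_get_peak_usage(true) / 1048576));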

get all objects from parse class using php

I'm using this PHP code to get objects from a class.
I've got 100,000 objects.
I want to get all the objects in a single query.
I'm using the following code:
$query = new ParseQuery("news_master");
$results = $query->find();
You can remove the limit to get all data; please try the following code:
$query = new ParseQuery("news_master");
$query->equalTo("All", true);
$results = $query->find();
As of 2018, with the Parse Community PHP SDK and server, you can use the each function, which provides a callback and can iterate over all data. Note this cannot be used in conjunction with skip, sort, or limit. See the docs.
An example of a query from their test suite would look like this:
$query = new ParseQuery('Object');
$query->lessThanOrEqualTo('x', $count);
$values = [];
$query->each(
    function ($obj) use (&$values) {
        $values[] = $obj->get('x');
    },
    10
);
$valuesLength = count($values);
The value of 10 is the batch size you want. If your database table is locked down and requires the master key, then you can do the following:
$query = new ParseQuery('Object');
$query->lessThanOrEqualTo('x', $count);
$values = [];
$query->each(
    function ($obj) use (&$values) {
        $values[] = $obj->get('x');
    },
    true, 10 // notice the value of true
);
$valuesLength = count($values);
The reason I'm adding to this old question is that if you search for getting more than 1000 records from Parse, no good link comes up, and this is usually the first one.
Cheers to anyone that stumbles across this!

Codeception multiple tests, 1 script

I think I might be getting the concept wrong or not thinking about something correctly. I'm looking for a way to connect to the db and then run a Selenium test (in PhantomJS) for every row of a table. The test checks for broken images on a bespoke CMS, and could be applied to any CMS.
I basically want to run an acceptance test for every page (of a specific type) by loading their IDs from the db and then running a separate test for each ID.
This is what I have so far:
$I = new WebGuy($scenario);
$results = $I->getArrayFromDB('talkthrough', '`key`', array());
foreach ($results as $r) {
    $I->wantTo('Check helpfile '.$r['key'].' for broken images');
    $I->amOnPage('/talkThrough.php?id='.$r['key']);
    $I->seeAllImages();
}
This works to some extent in that it executes until the first failure (because it is running as 1 test with many assertions).
How do I make this run as individual tests?
I ended up looping through and storing the keys that failed in a comma-delimited string, and setting a bool to say failures were found.
$I = new WebGuy($scenario);
$results = $I->getArrayFromDB('talkthrough', '`key`', array());
$failures = "Broken help files are: ";
$failures_found = false;
foreach ($results as $key => $r) {
    $I->wantTo('Check helpfile '.$r['key'].' for broken images');
    $I->amOnPage('/talkThrough.php?id='.$r['key']);
    $allImagesFine = $I->checkAllImages();
    if ($allImagesFine != '1') {
        $fail = $r['key'].",";
        $failures .= $fail;
        $failures_found = true;
    }
}
$I->seeBrokenImages($failures_found, $failures);
With the following as my WebHelper:
<?php
namespace Codeception\Module;

// here you can define custom functions for WebGuy
class WebHelper extends \Codeception\Module
{
    function checkAllImages()
    {
        $result = $this->getModule('Selenium2')->session->evaluateScript("return (function(){ return Array.prototype.slice.call(document.images).every(function (img) {return img.complete && img.naturalWidth > 0;}); })()");
        return $result;
    }

    function getArrayFromDB($table, $column, $criteria = array())
    {
        $dbh = $this->getModule('Db');
        $query = $dbh->driver->select($column, $table, $criteria);
        $dbh->debugSection('Query', $query, json_encode($criteria));
        $sth = $dbh->driver->getDbh()->prepare($query);
        if (!$sth) \PHPUnit_Framework_Assert::fail("Query '$query' can't be executed.");
        $sth->execute(array_values($criteria));
        return $sth->fetchAll();
    }

    function seeBrokenImages($bool, $failArray)
    {
        $this->assertFalse($bool, $failArray);
    }
}
Thanks for the submitted answers
That's not going to work. Please avoid loops and conditionals in your tests.
You should specify the keys manually rather than getting them from the database, as that introduces additional complexity.
It might not be the best design choice, but if you really want to follow this approach you could use the Specify tool from Codeception, in order to allow your test to continue running even if one assertion fails:
https://github.com/Codeception/Specify
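A rough, hypothetical sketch of what the Specify approach could look like; the class and method names are invented, the trait and the $this->specify() call come from the package linked above, and this style of test lives in a unit suite rather than an acceptance Cept.
<?php
class BrokenImagesTest extends \Codeception\Test\Unit
{
    use \Codeception\Specify;

    public function testHelpFiles()
    {
        $ids = array(1, 2, 3); // would come from the database in practice
        foreach ($ids as $id) {
            // each specify block is isolated, so one broken help file does not
            // stop the remaining blocks from running
            $this->specify("help file {$id} has no broken images", function () use ($id) {
                // load the page and assert on its images here
                $this->assertTrue(true);
            });
        }
    }
}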

server error executing a large file

I have created a script which reads an XML file and adds its contents to the database. I am using XMLReader for this.
The problem is that my XML contains 500,000 products, which causes my page to time out. Is there a way for me to get around this?
My code below:
$z = new XMLReader;
$z->open('files/NAGardnersEBook.xml');
$doc = new DOMDocument;

# move to the first node
while ($z->read() && $z->name !== 'EBook');

# now that we're at the right depth, hop to the next <EBook/> until the end of the tree
while ($z->name === 'EBook') {
    $node = simplexml_import_dom($doc->importNode($z->expand(), true));

    # Get the value of each node
    $title = mysql_real_escape_string($node->Title);
    $Subtitle = mysql_real_escape_string($node->SubTitle);
    $ShortDescription = mysql_real_escape_string($node->ShortDescription);
    $Publisher = mysql_real_escape_string($node->Publisher);
    $Imprint = mysql_real_escape_string($node->Imprint);

    # Get attributes
    $isbn = $z->getAttribute('EAN');
    $contributor = $node->Contributors;
    $author = $contributor[0]->Contributor;
    $author = mysql_real_escape_string($author);
    $BicSubjects = $node->BicSubjects;
    $Bic = $BicSubjects[0]->Bic;
    $bicCode = $Bic[0]['Code'];
    $formats = $node->Formats;
    $type = $formats[0]->Format;
    $price = $type[0]['Price'];
    $ExclusiveRights = $type[0]['ExclusiveRights'];
    $NotForSale = $type[0]['NotForSale'];

    $arr[] = "UPDATE onix_d2c_data SET is_gardner='Yes', TitleText = '".$title."', Subtitle = '".$Subtitle."', PersonName='".$author."', ImprintName = '".$Imprint."', PublisherName = '".$Publisher."', Text = '".$ShortDescription."', BICMainSubject = '".$bicCode."', ExcludedTerritory='".$NotForSale."', RightsCountry='".$ExclusiveRights."', PriceAmount='".$price."', custom_category= 'Uncategorised', drm_type='adobe_drm' WHERE id='".$isbn."' ";

    # go to next <EBook/>
    $z->next('EBook');
    $isbns[] = $isbn;
}
foreach ($isbns as $isbn) {
    $sql = "SELECT * FROM onix_d2c_data WHERE id='".$isbn."'";
    $query = mysql_query($sql);
    $count = mysql_num_rows($query);
    if ($count == 0) {
        $sql = "INSERT INTO onix_d2c_data (id) VALUES ('".$isbn."')";
        $query = mysql_query($sql);
    }
}

foreach ($arr as $sql) {
    mysql_query($sql);
}
Thank you,
Julian
You could use the function set_time_limit to extend the allowed script execution time, or set max_execution_time in your php.ini.
You need to set these variables. Make sure you have permission to change them:
set_time_limit(0);
ini_set('max_execution_time', '6000');
You're executing two queries for each ISBN just to check whether the ISBN already exists. Instead, set the ISBN column to unique (if it isn't already, it should be), then just go ahead and insert without checking. MySQL will return an error if it detects a duplicate, which you can handle. This will reduce the number of queries and improve performance.
You're inserting each title with a separate call to the database. Instead, use the extended INSERT syntax to batch up many inserts in one query; see the MySQL manual for the full syntax. Batching, say, 250 inserts will save a lot of time.
If you're not happy with batching inserts, use mysqli prepared statements, which will reduce parsing and transmission time, and so should improve your overall performance.
You can probably trust Gardners' list; consider dropping some of the escaping you're doing. I wouldn't recommend this for user input normally, but this is a special case.
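Here is a hedged sketch of the batching idea above: with a unique or primary key on id, the ISBNs can go in as one multi-row INSERT IGNORE instead of a SELECT plus INSERT per ISBN. The table and column names follow the question; the chunk size of 250 is arbitrary.
foreach (array_chunk($isbns, 250) as $chunk) {
    $values = array();
    foreach ($chunk as $isbn) {
        $values[] = "('" . mysql_real_escape_string($isbn) . "')";
    }
    // one round trip inserts up to 250 ids; duplicates are silently skipped
    mysql_query("INSERT IGNORE INTO onix_d2c_data (id) VALUES " . implode(',', $values));
}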
Have you tried adding set_time_limit(0); at the top of your PHP file?
EDIT :
ini_set('memory_limit','16M');
Specify your limit there.
If you don't want to change the max_execution_time as proposed by others, you could also split the work into several smaller tasks and let the server run a cron job at regular intervals.
E.g. 10,000 products each minute.
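A rough sketch of that idea, assuming a CLI script run from cron; the state-file path and batch size are made up.
<?php
// e.g. crontab entry: * * * * * php /path/to/import_chunk.php
$offsetFile = '/tmp/gardners_import.offset';                 // hypothetical state file
$offset = file_exists($offsetFile) ? (int) file_get_contents($offsetFile) : 0;
$batchSize = 10000;

// ... open the XML with XMLReader, skip the first $offset <EBook> nodes,
// then process the next $batchSize nodes exactly as in the question ...

file_put_contents($offsetFile, $offset + $batchSize);        // remember where this run stopped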
Thank you all for such fast feedback. I managed to get the problem sorted by using array_chunk. Example below:
$thumbListLocal = array_chunk($isbns, 4, true); // true = preserve keys
$thumbListLocalCount = count($thumbListLocal);
$i = 0;
while ($i < $thumbListLocalCount):
    $sqlConstruct = array(); // reset for each chunk
    foreach ($thumbListLocal[$i] as $index => $thumbName):
        $sqlConstruct[] = "INSERT IGNORE INTO onix_d2c_data (id) VALUES ('".$thumbName."')";
    endforeach;
    foreach ($sqlConstruct as $processSql) {
        mysql_query($processSql);
    }
    unset($thumbListLocal[$i]);
    $i++;
endwhile;
I hope this helps someone.
Julian

Efficient way to look up value based on a key in php [duplicate]

With a list of around 100,000 key/value pairs (both string, mostly around 5-20 characters each) I am looking for a way to efficiently find the value for a given key.
This needs to be done in a PHP website. I am familiar with hash tables in Java (which is probably what I would use if working in Java) but am new to PHP.
I am looking for tips on how I should store this list (in a text file or in a database?) and search this list.
The list would have to be updated occasionally but I am mostly interested in look up time.
You could do it as a straight PHP array, but Sqlite is going to be your best bet for speed and convenience if it is available.
PHP array
Just store everything in a php file like this:
<?php
return array(
    'key1' => 'value1',
    'key2' => 'value2',
    // snip
    'key100000' => 'value100000',
);
Then you can access it like this:
<?php
$s = microtime(true); // gets the start time for benchmarking
$data = require('data.php');
echo $data['key2'];
var_dump(microtime(true)-$s); // dumps the execution time
Not the most efficient thing in the world, but it's going to work. It takes 0.1 seconds on my machine.
Sqlite
PHP should come with sqlite enabled, which will work great for this kind of thing.
This script will create a database for you from start to finish with similar characteristics to the dataset you describe in the question:
<?php
// this will *create* data.sqlite if it does not exist. Make sure "/data"
// is writable and *not* publicly accessible.
// the ATTR_ERRMODE bit at the end is useful as it forces PDO to throw an
// exception when you make a mistake, rather than internally storing an
// error code and waiting for you to retrieve it.
$pdo = new PDO('sqlite:'.dirname(__FILE__).'/data/data.sqlite', null, null, array(PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION));

// create the table if you need to
$pdo->exec("CREATE TABLE stuff(id TEXT PRIMARY KEY, value TEXT)");

// insert the data
$stmt = $pdo->prepare('INSERT INTO stuff(id, value) VALUES(:id, :value)');
$id = null;
$value = null;

// this binds the variables by reference so you can re-use the prepared statement
$stmt->bindParam(':id', $id);
$stmt->bindParam(':value', $value);

// insert some data (in this case it's just dummy data)
for ($i = 0; $i < 100000; $i++) {
    $id = $i;
    $value = 'value'.$i;
    $stmt->execute();
}
And then to use the values:
<?php
$s = microtime(true);
$pdo = new PDO('sqlite:'.dirname(__FILE__).'/data/data.sqlite', null, null, array(PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION));
$stmt = $pdo->prepare("SELECT * FROM stuff WHERE id=:id");
$stmt->bindValue(':id', 5);
$stmt->execute();
$value = $stmt->fetchColumn(1);
var_dump($value);
// the number of seconds it took to do the lookup
var_dump(microtime(true)-$s);
This one is waaaay faster. 0.0009 seconds on my machine.
MySQL
You could also use MySQL for this instead of Sqlite, but if it's just one table with the characteristics you describe, it's probably going to be overkill. The above Sqlite example will work fine using MySQL if you have a MySQL server available to you. Just change the line that instantiates PDO to this:
$pdo = new PDO('mysql:host=your.host;dbname=your_db', 'user', 'password', array(PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION));
The queries in the sqlite example should all work fine with MySQL, but please note that I haven't tested this.
Let's get a bit crazy: Filesystem madness
Not that the Sqlite solution is slow (0.0009 seconds!), but this is about four times faster on my machine. Also, Sqlite may not be available, setting up MySQL might be out of the question, etc.
In this case, you can also use the file system:
<?php
$s = microtime(true); // more hack benchmarking

class FileCache
{
    protected $basePath;

    public function __construct($basePath)
    {
        $this->basePath = $basePath;
    }

    public function add($key, $value)
    {
        $path = $this->getPath($key);
        file_put_contents($path, $value);
    }

    public function get($key)
    {
        $path = $this->getPath($key);
        return file_get_contents($path);
    }

    public function getPath($key)
    {
        $split = 3;
        $key = md5($key);
        if (!is_writable($this->basePath)) {
            throw new Exception("Base path '{$this->basePath}' was not writable");
        }
        $path = array();
        for ($i = 0; $i < $split; $i++) {
            $path[] = $key[$i];
        }
        $dir = $this->basePath.'/'.implode('/', $path);
        if (!file_exists($dir)) {
            mkdir($dir, 0777, true);
        }
        return $dir.'/'.substr($key, $split);
    }
}

$fc = new FileCache('/tmp/foo');

/*
// use this crap for generating a test example. it's slow to create though.
for ($i = 0; $i < 100000; $i++) {
    $fc->add('key'.$i, 'value'.$i);
}
//*/

echo $fc->get('key1');
var_dump(microtime(true) - $s);
This one takes 0.0002 seconds for a lookup on my machine. This also has the benefit of being reasonably constant regardless of the cache size.
It depends on how frequently you would access your array; think of it in terms of how many users can access it at the same time. There are many advantages to storing it in a database, and here you have two options: MySQL and SQLite.
SQLite works more like a text file with SQL support; you can save a few milliseconds during queries, as it is located within reach of your application. Its main disadvantage is that it can only add one record at a time (same as a text file).
I would recommend SQLite for arrays with static content like GEO IP data, translations, etc.
MySQL is a more powerful solution, but it requires authentication and is located on a separate machine.
PHP arrays will do everything you need. But shouldn't that much data be stored in a database?
http://php.net/array
