What is wrong with this benchmarking script? - php

I've written a small php script to benchmark the performance of our various LAMP servers. I'm doing the test on speed of various factors such as disk I/O, database I/O, etc.
In the script, I'm first creating a random string of 100KB called $payload.
For Disk I/O check, I'm writing $payload to disk for 1000 times using file_put_contents() which completes in a few milliseconds.
Secondly, by using same logic for sqlite check, I'm inserting 1000 records of $payload string in an sqlite table. Shouldn't they take the same amount of time? But this sqlite inserts goes on for everrr. Any idea why?
for($i=0;$i<100000;$i++) //generate a big string
{
$n=rand(0,57)+65;
$payload = $payload.chr($n);
}
//write test:
$start = microtime(true);
if ($type=='disk') // Disk I/O -> This takes only a few msecs.
{
for($i=0;$i<1000;$i++) file_put_contents($fname,$payload);
}
else if ($type=='sqlite') //sqlite test -> This keeps running for everrrrrr.....
{
$db = new SQLite3("benchmark.db");
$db->exec('create table temp(t text)');
for($i=0;$i<1000;$i++) {
$db->exec("insert into temp values('{$payload}')");
};
}
$wtime=round((microtime(true) - $start)*1000);

When you are not using explicit transactions, SQLite will wrap an automatic transaction around every statement.
To guarantee that a transaction is durable, the database has to flush the data to disk at the end of each transaction.
This implies that it waits for the disk write to complete before continuing.
To make the database check similar to the disk check, execute $db->exec("pragma synchronous = off") after creating the DB.
However, you wouldn't want to use this setting in a real database where you'd care about data loss.
Wrap $db->exec("begin") and $db->exec("commit") around the loop to use a single transaction for all writes.

Related

PHP Cache Mechanism

I was working on a program which needs a cache system.
so the description is I got a mysql db which has 4 columns, 'mac','src','username','main' Which mac,src,username are key/values and foreign key in main table. it will insert to those 3 first and put their ID in main.
The data I got is about 18m for main table, and for those 3 about 2m each.
I dont want to use a select everytime it needs to insert in main, so I used an array to cache them.
$hash= ['mac'=>[],'src'=>[],'username'=>[]];
and store 'n fetch data like this : $hash['mac']['54:52:00:27:e4:91'];
This approach got bad performance when the hash data's goo beyond 500k ;
So is there any better way to do this ?
PS: I got same thing with nodeJS which i used a npm module named hashtable And Performance was about 10k inserts each 4m. I've read about php arrays and found out they are Hashtables , but now it do the same job with far wayslower, for only 1k it takes atleast 5minutes;
Assuming you're on a Linux server. See: Creating a RAM disk. Once you have a RAM disk, cache each ID as a file, using a sha1() hash of the mac address. The RAM disk file is, well, RAM; i.e., a persistent cache in memory.
<?php
$mac = '54:52:00:27:e4:91';
$cache = '/path/to/ramdisk/'.sha1($mac);
if (is_file($cache)) { // Cached already?
$ID = file_get_contents($cache); // From the cache.
} else {
// Run SQL query here and get the $ID.
// Now cache the $ID.
file_put_contents($cache, $ID); // Cache it.
}
// Now do your insert here.
To clarify: A RAM disk allows you to use filesystem wrappers in PHP, such as file_get_contents() and file_put_contents() to read/write to RAM.
Other more robust alternatives to consider:
Redis: https://www.tutorialspoint.com/redis/redis_php.htm
Memcached: http://php.net/manual/en/book.memcached.php
You can use PHP Super Cache which is very simple which is faster than Reddis, Memcache etc
require __DIR__.'/vendor/autoload.php';
use SuperCache\SuperCache as sCache;
//Saving cache value with a key
// sCache::cache('<key>')->set('<value>');
sCache::cache('myKey')->set('Key_value');
//Retrieving cache value with a key
echo sCache::cache('myKey')->get();
https://packagist.org/packages/smart-php/super-cache

PHP Predis: How to convert `redis-cli script load $LUA_SCRIPT` into Predis methods?

How to convert redis-cli script load $LUA_SCRIPT into Predis methods?
Follow is the lua script:
local lock_key = 'icicle-generator-lock'
local sequence_key = 'icicle-generator-sequence'
local logical_shard_id_key = 'icicle-generator-logical-shard-id'
local max_sequence = tonumber(KEYS[1])
local min_logical_shard_id = tonumber(KEYS[2])
local max_logical_shard_id = tonumber(KEYS[3])
local num_ids = tonumber(KEYS[4])
if redis.call('EXISTS', lock_key) == 1 then
redis.log(redis.LOG_INFO, 'Icicle: Cannot generate ID, waiting for lock to expire.')
return redis.error_reply('Icicle: Cannot generate ID, waiting for lock to expire.')
end
--[[
Increment by a set number, this can
--]]
local end_sequence = redis.call('INCRBY', sequence_key, num_ids)
local start_sequence = end_sequence - num_ids + 1
local logical_shard_id = tonumber(redis.call('GET', logical_shard_id_key)) or -1
if end_sequence >= max_sequence then
--[[
As the sequence is about to roll around, we can't generate another ID until we're sure we're not in the same
millisecond since we last rolled. This is because we may have already generated an ID with the same time and
sequence, and we cannot allow even the smallest possibility of duplicates. It's also because if we roll the sequence
around, we will start generating IDs with smaller values than the ones previously in this millisecond - that would
break our k-ordering guarantees!
The only way we can handle this is to block for a millisecond, as we can't store the time due the purity constraints
of Redis Lua scripts.
In addition to a neat side-effect of handling leap seconds (where milliseconds will last a little bit longer to bring
time back to where it should be) because Redis uses system time internally to expire keys, this prevents any duplicate
IDs from being generated if the rate of generation is greater than the maximum sequence per millisecond.
Note that it only blocks even it rolled around *not* in the same millisecond; this is because unless we do this, the
IDs won't remain ordered.
--]]
redis.log(redis.LOG_INFO, 'Icicle: Rolling sequence back to the start, locking for 1ms.')
redis.call('SET', sequence_key, '-1')
redis.call('PSETEX', lock_key, 1, 'lock')
end_sequence = max_sequence
end
--[[
The TIME command MUST be called after anything that mutates state, or the Redis server will error the script out.
This is to ensure the script is "pure" in the sense that randomness or time based input will not change the
outcome of the writes.
See the "Scripts as pure functions" section at http://redis.io/commands/eval for more information.
--]]
local time = redis.call('TIME')
return {
start_sequence,
end_sequence, -- Doesn't need conversion, the result of INCR or the variable set is always a number.
logical_shard_id,
tonumber(time[1]),
tonumber(time[2])
}
redis-cli script load $LUA_SCRIPT
I tried
$predis->eval(file_get_contents($luaPath), 0);
or
class ListPushRandomValue extends \Predis\Command\ScriptCommand {
public function getKeysCount() {
return 0;
}
public function getScript() {
$luaPath = Aa::$vendorRoot .'/icicle/id-generation.lua';
return file_get_contents($luaPath);
}
}
$predis->getProfile()->defineCommand('t', '\Controller\ListPushRandomValue');
$response = $predis->t();
But both of above showed error below.
ERR Error running script (call to f_5849d008682280eed4ec67b97ba50ae546fc5e8d): #user_script:19: user_script:19: attempt to perform arithmetic on local 'end_sequence' (a table value)
First, let me say that I am not an expert on LUA but I have just dealt with Redis and LUA script to implement simple locking and noticed a few errors in the question.
There is a conversion process between Redis and LUA that should be reviewed: Conversion. I think this will help with the error given.
On the this call:
$predis->eval(file_get_contents($luaPath), 0);
You pass the contents of the script and access keys but don't pass in any keys to evaluate. This call is more correct:
$predis->eval(file_get_contents($luaPath), 4, oneKey, twoKey, threeKey, fourKey);
This might actually be the reason for the above error. Hope this helps someone in the future.

Long PHP script runs multiple times

I have a products database that synchronizes with product data ever morning.
The process is very clear:
Get all products from database by query
Loop through all products, and get and xml from the other server by product_id
Update data from xml
Log the changes to file.
If I query a low amount of items, but limiting it to 500 random products for example, everything goes fine. But when I query all products, my script SOMETIMES goes on the fritz and starts looping multiple times. Hours later I still see my log file growing and products being added.
I checked everything I could think of, for example:
Are variables not used twice without overwriting each other
Does the function call itself
Does it happen with a low amount of products too: no.
The script is called using a cronjob, are the settings ok. (Yes)
The reason that makes it especially weird is that it sometimes goes right, and sometimes it doesnt. Could this be some memory problem?
EDIT
wget -q -O /dev/null http://example.eu/xxxxx/cron.php?operation=sync its in webmin called on a specific hour and minute
Code is hundreds of lines long...
Thanks
You have:
max_execution_time disabled. Your script won't end until the process is complete for as long as it needed.
memory_limit disabled. There is no limit to how much data stored in memory.
500 records were completed without issues. This indicates that the scripts completes its process before the next cronjob iteration. For example, if your cron runs every hour, then the 500 records are processed in less than an hour.
If you have a cronjob that is going to process large amount of records, then consider adding lock mechanism to the process. Only allow the script to run once, and start again when the previous process is complete.
You can create script lock as part of a shell script before executing your php script. Or, if you don't have an access to your server you can use database lock within the php script, something like this.
class ProductCronJob
{
protected $lockValue;
public function run()
{
// Obtain a lock
if ($this->obtainLock()) {
// Run your script if you have valid lock
$this->syncProducts();
// Release the lock on complete
$this->releaseLock();
}
}
protected function syncProducts()
{
// your long running script
}
protected function obtainLock()
{
$time = new \DateTime;
$timestamp = $time->getTimestamp();
$this->lockValue = $timestamp . '_syncProducts';
$db = JFactory::getDbo();
$lock = [
'lock' => $this->lockValue,
'timemodified' => $timestamp
];
// lock = '0' indicate that the cronjob is not active.
// Update #__cronlock set lock = '', timemodified = '' where name = 'syncProducts' and lock = '0'
// $result = $db->updateObject('#__cronlock', $lock, 'id');
// $lock = SELECT * FROM #__cronlock where name = 'syncProducts';
if ($lock !== false && (string)$lock !== (string)$this->lockValue) {
// Currently there is an active process - can't start a new one
return false;
// You can return false as above or add extra logic as below
// Check the current lock age - how long its been running for
// $diff = $timestamp - $lock['timemodified'];
// if ($diff >= 25200) {
// // The current script is active for 7 hours.
// // You can change 25200 to any number of seconds you want.
// // Here you can send notification email to site administrator.
// // ...
// }
}
return true;
}
protected function releaseLock()
{
// Update #__cronlock set lock = '0' where name = 'syncProducts'
}
}
Your script is running for quite some time (~45m) and wget think it's "timing out" since you don't return any data. By default wget will have a 900s timeout value and a retry count of 20. So first you should probably change your wget command to prevent this:
wget --tries=0 --timeout=0 -q -O /dev/null http://example.eu/xxxxx/cron.php?operation=sync
Now removing the timeout could lead to other issue, so instead you could send (and flush to force webserver to send it) data from your script to make sure wget doesn't think the script "timed out", something every 1000 loops or something like that. Think of this as a progress bar...
Just keep in mind that you will hit an issue when the run time will get close to your period as 2 crons will run in parallel. You should optimize your process and/or have a lock mechanism maybe?
I see two possibilities:
- chron calls the script much more often
- script takes too long somehow.
you can try estimate the time a single iteration of the loop takes.
this can be done with time(). perhaps the result is suprising, perhaps not. you can probably get the number of results too. multiply the two, that way you will have an estimate of how long the process should take.
$productsToSync = $db->loadObjectList();
and
foreach ($productsToSync AS $product) {
it seems you load every result into an array. this wont work for huge databases because obviously a million rows wont fit in memory. you should just get one result at a time. with mysql there are methods that just fetch one thing at a time from the resource, i hope yours allows the same.
I also see you execute another query each iteration of the loop. this is something I try to avoid. perhaps you can move this to after the first query has ended and do all of those in one big query? otoh this may bite my first suggestion.
also if something goes wrong, try to be paranoid when debugging. measure as much as you can. time as much as you can when its a performance issue. put the timings in you log file. usually you will find the bottleneck.
I solved the problem myself. Thanks for all the replies!
My MySQL timed out, that was the problem. As soon as I added:
ini_set('mysql.connect_timeout', 14400);
ini_set('default_socket_timeout', 14400);
to my script the problem stopped. I really hope this helps someone. Ill upvote all the locking answers, because those were very helpful!

Storing data in /tmp from a forked process in php

For awhile now, I've been storing serialized objects from forked processes in /tmp with file_put_contents.
Once all child processes wrap up, I'm simply using file_get_contents and unserializing the data to rebuild my object for processing.
so my question is, is there a better way of storing my data without writing to /tmp?
Outside of storing the data in a file, the only other native solutions that come to mind is shm http://www.php.net/manual/en/function.shm-attach.php or socket stream pairs http://www.php.net/manual/en/function.stream-socket-pair.php
Either of these should be doable if the data collected is unimportant after the script is run. The idea behind both of them is to just open a communication channel between your parent and child processes. I will say that my personal opinion is that unless there is some sort of issue using the file system is causing that it is by far the least complicated solution.
SHM
The idea with shm is that instead of storing the serialized objects in a file, you would store them in an shm segment protected for concurrency by a semaphore. Forgive the code, it is rough but should be enough to give you the general idea.
/*** Configurations ***/
$blockSize = 1024; // Size of block in bytes
$shmVarKey = 1; //An integer specifying the var key in the shm segment
/*** In the children processes ***/
//First you need to get a semaphore, this is important to help make sure you don't
//have multiple child processes accessing the shm segment at the same time.
$sem = sem_get(ftok(tempnam('/tmp', 'SEM'), 'a'));
//Then you need your shm segment
$shm = shm_attach(ftok(tempnam('/tmp', 'SHM'), 'a'), $blockSize);
if (!$sem || !$shm) {
//error handling goes here
}
//if multiple forks hit this line at roughly the first time, the first one gets the lock
//everyone else waits until the lock is released before trying again.
sem_acquire($sem);
$data = shm_has_var($shm, $shmVarKey) ? shm_get_var($shm, $shmVarKey) : shm_get_var($shm, $shmVarKey);
//Here you could key the data array by probably whatever you are currently using to determine file names.
$data['child specific id'] = 'my data'; // can be an object, array, anything that is php serializable, though resources are wonky
shm_put_var($shm, $shmVarKey, $data); // important to note that php handles the serialization for you
sem_release($sem);
/*** In the parent process ***/
$shm = shm_attach(ftok(tempnam('/tmp', 'SHM'), 'a'), $blockSize);
$data = shm_get_var($shm, $shmVarKey);
foreach ($data as $key => $value)
{
//process your data
}
Stream Socket Pair
I personally love using these for inter process communication. The idea is that prior to forking, you create a stream socket pair. This results in two read write sockets being created that are connected to each other. One of them should be used by the parent, one of them should be used by the child. You would have to create a separate pair for each child and it will change your parent's model a little bit in that it will need to manage the communication a bit more real time.
Fortunately the PHP docs for this function has a great example: http://us2.php.net/manual/en/function.stream-socket-pair.php
You could use a shared memory cache such as memcached which would be faster, but depending on what you're doing and how sensitive/important the data is, a file-based solution may be your best option.

Multiple "agents" handling a single array

Apologies if this has been covered before - I did my searching but possibly may not know the correct terms to have used.
This process is handled with PHP.
Here's the situation:
I have a large array of file names. The script I have opens these files and enters their content into a database. Processing these files one at a time takes over 24 hours, and these files are updated on a daily basis.
Breaking the single large array into four smaller arrays and running concurrent processes finishes the job before the 24 hour window elapses, but sometimes one or two processes will finish hours before the others because file sizes vary on a daily basis.
Much like people who stock retail shelves (who else has worked that nightmare before?) pitch in to help out with what's left after finishing their own tasks, I'd like to have a script in place where these "agents" do the same.
Here's some basics of what I have figured out - it could be wrong, and I'm not too proud to protest if I am :-)
$files = array('file1','file2','file3','file4','file5');
//etc... on to over 4k elements
while($file = array_pop($files)){
//Something in here... I have no idea what.
}
Ideas? Something like four function calls or four loops within that overarching 'while' has crossed my mind, but I'm pretty sure it's going to wait on executing subsequent calls until the previous one(s) finish.
Any help is appreciated. I'm seriously stuck on this one!
Thanks!
A database-backed message queue seems the obvious solution but I think that's overkill in this case. I would simply put the files to be processed into a single dedicated queue directory, then use the DirectoryIterator class to scan it. Something like this:
while (true) {
look in the queue directory for a file
if you don't fine one, exit the script, all processing is done
if you find one, rename it or move it to a work directory
if the rename/move command succeeded, process the file
if the rename/move command failed, one of the other threads got it first
}
Edit:
Regarding launching the workers, you could use a simple shell script to spawn the PHP processes in the background:
NUM_WORKERS=5
for WORKER in $(seq 1 ${NUM_WORKERS})
do
echo "starting worker ${WORKER}"
php -f /path/to/my/process.php &
done
Then, create a cron entry to run this launcher, for example, at midnight:
0 0 * * * /path/to/launcher.sh
You want what's called a "message queue". Something like beanstalkd
You'll basically create a list of messages that include your individual filenames. You'll then create a set of processors to process them. Each processor will handle one file then go back to the queue to see if there are more messages/files waiting to be processed.
EDIT:
Here's an analogy to help explain message queues. Your first idea is like a human manager taking a stack of files, dividing them into four piles and then handing each of his four employees a pile to process. A message queue is more like this: the manager puts all the files on a table and tells each employee to take a single file from the table and process it. He tells them when they're done with the first file to keep taking files until there are no more files on the table. When all the files are done, the employees can go home.
One employee might end up with really large files and only handle a few, while another employee might get smaller files and handle many. It doesn't matter how many each employee handles, they'll all keep working until the table is empty.
I would have a socket server master script that hands out file paths to x number of slave scripts, until there are no files left to process. This way, all the slave scripts will keep running, and you can hand out file paths dynamically as they are requested.
Something like this:
master.php
<?php
// load the array of files to process (however you do this)
$fileList = file('filelist.txt');
// Create a listening socket on localhost
$serverSocket = stream_socket_server('tcp://127.0.0.1:7878');
$sockets = array($serverSocket);
$clients = array();
// Loop while there are still files to process
while (count($fileList)) {
// Run a select() call on the existing sockets' read buffers
// Skip to next iteration if no sockets are waiting for handling
if (stream_select($read = $sockets, $write = NULL, $except = NULL, 1) < 1) {
continue;
}
// Loop sockets with data to read
foreach ($read as $socket) {
if ($socket == $serverSocket) {
// Accept new clients
$sockets[] = $clients[] = stream_socket_accept($serverSocket);
} else if (trim(fgets($socket)) == 'next') {
// Hand out a new file path to the client
fwrite($socket, array_shift($fileList)."\n");
if (!count($fileList)) {
break 2;
}
}
}
}
// When we're done, disconnect the clients
foreach ($clients as $socket) {
#fclose($socket);
}
// ...and close the listen socket
#fclose($serverSocket);
slave.php
<?php
$socket = fsockopen('127.0.0.1', 7878);
while (!feof($socket)) {
// Get a new file path from the master
fwrite($socket,"next\n");
$path = trim(fgets($socket));
if (is_file($path)) {
// Process the file at $path here
}
}
You then just need to start master.php, then when it is running, you can start however many instances of slave.php as you want, and they will all keep running until there are no more files to process.
Obviously, this has no error handling, but it should provide a basic framework to get you started. This relies on blocking function calls (stream_select() and fgets()) to avoid a race condition - this may or may not be sufficient for your purposes.

Categories