We are planning to build a real-time bidding system, and we are evaluating the performance of PHP compared to Java in terms of throughput, response times, etc.
(The Java part is being handled by another member of the team.)
Initial start:
I have a test script which makes 50 HTTP connections to different servers.
1st approach
- I am using the curl_multi_init function and I get the responses in under 7 seconds.
2nd approach
- I am using the PHP pthreads API and trying to make the calls in parallel, expecting the same response time or less. But the total time is around 25 seconds on average.
Here is the code
<?php
$g_request_arr = array(
'0' => array(
'request_url' => 'https://www.google.co.uk/?#q=56%2B12'
),
..
..
..
'49'=>array(
'request_url' => 'https://www.google.co.uk/?#q=256%2B132'
)
);
class ChildThread extends Thread {
public function __construct($urls) {
$this->data = $urls;
}
public function run(){
foreach($this->data as $url_info ){
$url = $url_info['request_url'];
file_get_contents($url);
}
$this->synchronized(function($thread){
$thread->notify();
}, $this);
}
}
$thread = new ChildThread($g_request_arr);
$thread->start();
$thread->synchronized(function($thread){
}, $thread);
?>
I want to know what is missing in the above code, and whether it is possible to bring the response time under 7 seconds.
You are requesting all the data in one thread; here's a better approach:
<?php
class WebRequest extends Stackable {
public $request_url;
public $response_body;
public function __construct($request_url) {
$this->request_url = $request_url;
}
public function run(){
$this->response_body = file_get_contents(
$this->request_url);
}
}
class WebWorker extends Worker {
public function run(){}
}
$list = array(
new WebRequest("http://google.com"),
new WebRequest("http://www.php.net")
);
$max = 8;
$threads = array();
$start = microtime(true);
/* start some workers */
while (@$thread++<$max) {
$threads[$thread] = new WebWorker();
$threads[$thread]->start();
}
/* stack the jobs onto workers */
foreach ($list as $job) {
$threads[array_rand($threads)]->stack(
$job);
}
/* wait for completion */
foreach ($threads as $thread) {
$thread->shutdown();
}
$time = microtime(true) - $start;
/* tell you all about it */
printf("Fetched %d responses in %.3f seconds\n", count($list), $time);
$length = 0;
foreach ($list as $listed) {
$length += strlen($listed["response_body"]);
}
printf("Total of %d bytes\n", $length);
?>
This uses multiple workers; you can adjust how many by changing $max. There's not much point in creating 1000 threads if you have 1000 requests to process.
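As a sketch (assuming the $g_request_arr from your question), the job list can be built from your 50 URLs instead of the two hard-coded ones; the rest of the code stays the same:
$list = array();
foreach ($g_request_arr as $url_info) {
    // one WebRequest per URL from the question's array
    $list[] = new WebRequest($url_info['request_url']);
}
/* starting the workers, stacking the jobs and shutting down is unchanged */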
Related
I use a variety of 3rd party web APIs, and many of them enforce rate limiting. It would be very useful to have a fairly generic PHP library that I could rate limit my calls with. I can think of a few ways to do it, perhaps by putting calls into a queue with a timestamp of when the call can be made, but I was hoping to avoid reinventing the wheel if someone else has already done this well.
You can do rate limiting with the token bucket algorithm. I implemented that for you in PHP: bandwidth-throttle/token-bucket. For example:
use bandwidthThrottle\tokenBucket\Rate;
use bandwidthThrottle\tokenBucket\TokenBucket;
use bandwidthThrottle\tokenBucket\storage\FileStorage;
$storage = new FileStorage(__DIR__ . "/api.bucket");
$rate = new Rate(10, Rate::SECOND);
$bucket = new TokenBucket(10, $rate, $storage);
$bucket->bootstrap(10);
if (!$bucket->consume(1, $seconds)) {
http_response_code(429);
header(sprintf("Retry-After: %d", floor($seconds)));
exit();
}
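The same bucket can also throttle outgoing calls on the client side. Here is a rough sketch using only the consume() call shown above ($apiCalls and doCall() are placeholders, not part of the package):
$storage = new FileStorage(__DIR__ . "/client.bucket");
$rate    = new Rate(10, Rate::SECOND);
$bucket  = new TokenBucket(10, $rate, $storage);
$bucket->bootstrap(10);

foreach ($apiCalls as $call) {
    // block until a token is available instead of rejecting the call
    while (!$bucket->consume(1, $seconds)) {
        usleep((int) ceil($seconds * 1000000));
    }
    doCall($call);
}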
I realize this is an old thread but thought I'd post my solution since it was based on something else I found on SE. I looked for a while for an answer myself but had trouble finding something good. It's based on the Python solution discussed here, but I've added support for variable-sized requests and turned it into a function generator using PHP closures.
function ratelimiter($rate = 5, $per = 8) {
$last_check = microtime(True);
$allowance = $rate;
return function ($consumed = 1) use (
&$last_check,
&$allowance,
$rate,
$per
) {
$current = microtime(True);
$time_passed = $current - $last_check;
$last_check = $current;
$allowance += $time_passed * ($rate / $per);
if ($allowance > $rate)
$allowance = $rate;
if ($allowance < $consumed) {
$duration = ($consumed - $allowance) * ($per / $rate);
$last_check += $duration;
usleep($duration * 1000000);
$allowance = 0;
}
else
$allowance -= $consumed;
return;
};
}
It can be used to limit just about anything. Here's a stupid example that limits a simple statement at the default five "requests" per eight seconds:
$ratelimit = ratelimiter();
while (True) {
$ratelimit();
echo "foo".PHP_EOL;
}
Here's how I'm using it to limit batched requests against the Facebook Graph API at 600 requests per 600 seconds based on the size of the batch:
$ratelimit = ratelimiter(600, 600);
while (..) {
..
$ratelimit(count($requests));
$response = (new FacebookRequest(
$session, 'POST', '/', ['batch' => json_encode($requests)]
))->execute();
foreach ($response->..) {
..
}
}
Hope this helps someone!
This is essentially the same as @Jeff's answer, but I have tidied the code up a lot and added PHP 7.4 type/return hinting.
I have also published this as a composer package: https://github.com/MacroMan/rate-limiter
composer require macroman/rate-limiter
/**
* Class RateLimiter
*
* @package App\Components
*/
class Limiter
{
/**
* Limit to this many requests
*
* @var int
*/
private int $frequency = 0;
/**
* Limit for this duration
*
* @var int
*/
private int $duration = 0;
/**
* Current instances
*
* @var array
*/
private array $instances = [];
/**
* RateLimiter constructor.
*
* @param int $frequency
* @param int $duration
*/
public function __construct(int $frequency, int $duration)
{
$this->frequency = $frequency;
$this->duration = $duration;
}
/**
* Sleep if the bucket is full
*/
public function await(): void
{
$this->purge();
$this->instances[] = microtime(true);
if (!$this->is_free()) {
$wait_duration = $this->duration_until_free();
usleep($wait_duration);
}
}
/**
* Remove expired instances
*/
private function purge(): void
{
$cutoff = microtime(true) - $this->duration;
// re-index so $this->instances[0] is always the oldest remaining timestamp
$this->instances = array_values(array_filter($this->instances, function ($a) use ($cutoff) {
return $a >= $cutoff;
}));
}
/**
* Can we run now?
*
* @return bool
*/
private function is_free(): bool
{
return count($this->instances) < $this->frequency;
}
/**
* Get the number of microseconds until we can run the next instance
*
* @return float
*/
private function duration_until_free(): float
{
// instances hold microtime(true) values in seconds, so compute the remaining
// wait in seconds and convert it to microseconds for usleep()
$oldest = $this->instances[0];
$free_at = $oldest + $this->duration;
$now = microtime(true);
return ($free_at < $now) ? 0 : ($free_at - $now) * 1000000;
}
}
Usage is the same
use RateLimiter\Limiter;
// Limit to 6 iterations per second
$limiter = new Limiter(6, 1);
for ($i = 0; $i < 50; $i++) {
$limiter->await();
echo "Iteration $i" . PHP_EOL;
}
As an alternative, I've (in the past) created a "cache" folder that stored API responses, so if I try to make the same call again within a specific time range, it is served from the cache first (more seamless) until it's okay to make a new call. You may end up with slightly stale information in the short term, but it saves you from the API blocking you in the long term.
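A bare-bones sketch of that idea (file layout and TTL are assumptions):
// Reuse a stored response while it is younger than $ttl seconds,
// otherwise call the API again and refresh the cache file.
function cached_api_call($url, $ttl = 300, $dir = __DIR__ . '/cache')
{
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }
    $file = $dir . '/' . md5($url) . '.json';
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return file_get_contents($file); // fresh enough, serve from cache
    }
    $response = file_get_contents($url); // stale or missing, hit the API
    if ($response !== false) {
        file_put_contents($file, $response);
    }
    return $response;
}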
I liked mwp's answer and I wanted to convert it to OO to make me feel warm and fuzzy. I ended up drastically rewriting it to the point that it is totally unrecognizable from his version. So, here is my mwp-inspired OO version.
Basic explanation: Every time await is called, it saves the current timestamp in an array and throws out all old timestamps that aren't relevant anymore (older than the duration of the interval). If the rate limit is exceeded, it calculates the time until the limit frees up again and sleeps until then.
Usage:
$limiter = new RateLimiter(4, 1); // can be called 4 times per 1 second
for($i = 0; $i < 10; $i++) {
$limiter->await();
echo microtime(true) . "\n";
}
I also added a little syntactic sugar for a run method.
$limiter = new RateLimiter(4, 1);
for($i = 0; $i < 10; $i++) {
$limiter->run(function() { echo microtime(true) . "\n"; });
}
<?php
class RateLimiter {
private $frequency;
private $duration;
private $instances;
public function __construct($frequency, $duration) {
$this->frequency = $frequency;
$this->duration = $duration;
$this->instances = [];
}
public function await() {
$this->purge();
$this->instances[] = microtime(true);
if($this->is_free()) {
return;
}
else {
$wait_duration = $this->duration_until_free();
usleep(floor($wait_duration));
return;
}
}
public function run($callback) {
if(!is_callable($callback)) {
return false;
}
$this->await();
$callback();
return true;
}
public function purge() {
$this->instances = RateLimiter::purge_old_instances($this->instances, $this->duration);
}
public function duration_until_free() {
return RateLimiter::get_duration_until_free($this->instances, $this->duration);
}
public function is_free() {
return count($this->instances) < $this->frequency;
}
public static function get_duration_until_free($instances, $duration) {
// timestamps come from microtime(true) in seconds; return the remaining
// wait in microseconds so it can be passed to usleep()
$oldest = $instances[0];
$free_at = $oldest + $duration;
$now = microtime(true);
if($free_at < $now) {
return 0;
}
else {
return ($free_at - $now) * 1000000;
}
}
public static function purge_old_instances($instances, $duration) {
$now = microtime(true);
$cutoff = $now - $duration;
// re-index so $instances[0] stays the oldest remaining timestamp
return array_values(array_filter($instances, function($a) use ($cutoff) {
return $a >= $cutoff;
}));
}
}
Here is PHP source code to limit access to your API, allowing one request every 5 seconds per user, using Redix.
Install the Redis/Redix client:
composer require predis/predis
Download Redix (https://github.com/alash3al/redix/releases) for your operating system, then start the service:
./redix_linux_amd64
The following output indicates that Redix is listening on port 6380 for the RESP protocol and port 7090 for the HTTP protocol.
redix resp server available at : localhost:6380
redix http server available at : localhost:7090
In your API, add the following code at the top:
<?php
require_once 'class.ratelimit.redix.php';
$rl = new RateLimit();
$waitfor = $rl->getSleepTime($_SERVER['REMOTE_ADDR']);
if ($waitfor>0) {
echo 'Rate limit exceeded, please try again in '.$waitfor.'s';
exit;
}
// Your API response
echo 'API response';
The source code for the script class.ratelimit.redix.php is :
<?php
require_once __DIR__.'/vendor/autoload.php';
Predis\Autoloader::register();
class RateLimit {
private $redis;
const RATE_LIMIT_SECS = 5; // allow 1 request every x seconds
public function __construct() {
$this->redis = new Predis\Client([
'scheme' => 'tcp',
'host' => 'localhost', // or the server IP on which Redix is running
'port' => 6380
]);
}
/**
* Returns the number of seconds to wait until the next time the IP is allowed
* @param string $ip
*/
public function getSleepTime($ip) {
$value = $this->redis->get($ip);
if(empty($value)) {
// if the key doesn't exist, insert it with the current timestamp and an expiration (the TTL passed to Redix is in milliseconds)
$this->redis->set($ip, time(), self::RATE_LIMIT_SECS*1000);
return 0;
}
return self::RATE_LIMIT_SECS - (time() - intval(strval($value)));
} // getSleepTime
} // class RateLimit
I am trying to learn multithreading with PHP. I've installed PHP 7.2.14 with ZTS support and looked over a lot of examples on the net, and afterwards tried to create a simple script to see if I understand the things I've learned. The problem is that, it seems, I don't. :)
Here's the script I've made:
class Task extends Threaded
{
private $workToBeDone;
public $DataHolder;
public function __construct($i, $z, $DataHolder)
{
$this->workToBeDone = array($i, $z);
$this->DataHolder = $DataHolder;
}
public function run()
{
$results = 0;
for ($i=$this->workToBeDone[0]; $i<=$this->workToBeDone[1]; $i++) {
$results++;
}
$this->synchronized(function ($DataHolder) use($results) {
echo $results . "\n";
$DataHolder->counter+=$results;
}, $this->DataHolder);
}
}
class MyDataHolder {
public $counter;
}
$DataHolder = new MyDataHolder;
$pool = new Pool(4);
$tasks = array();
for ($i = 0; $i < 15; ++$i) {
$Task = new Task(1,100, $DataHolder);
$pool->submit($Task);
}
while ($pool->collect());
$pool->shutdown();
echo "Total: " . $DataHolder->counter;
This script should create 15 separate tasks, and each task iterates 100 times. After each set of 100 iterations is complete, I would like to store the number of iterations in the MyDataHolder object, to be able to access it later.
The expected behaviour is that when I run this script, I see 100 printed out 15 times on the screen, and at the end I see Total: 1500 printed out.
Instead of this, 100 is printed out 15 times, but the total value remains empty at the end.
What am I doing wrong? How can I collect the data from each of my threads, to use it later on in the program?
Let me give you a similar example:
<?php
/*
Threaded objects (which subsequently includes Volatile objects) are tied to the
context in which they are created. They can be used to fetch data from a thread,
but must be created in the outer most thread in which they are used.
*/
// create Threaded object in the main thread
class DataHolder extends Threaded{
public $counter;
public function __construct(){
$this->counter = 0;
}
}
class threading extends Thread {
public $store;
public $workdone;
public function __construct(Threaded $store, $i, $z)
{
$this->store = $store;
$this->workdone = array($i, $z);
}
public function run()
{
/*
The following array cast is necessary to prevent implicit coercion to a
Volatile object. Without it, accessing $store in the main thread after
this thread has been destroyed would lead to a RuntimeException of:
"pthreads detected an attempt to connect to an object which has already
been destroyed in %s:%d"
See this StackOverflow post for additional information:
https://stackoverflow.com/a/44852650/4530326
*/
$rescount = 0;
for($i = $this->workdone[0]; $i <= $this->workdone[1]; $i++){
$rescount++;
}
$this->store['counter'] += $rescount;
}
}
$store = new DataHolder();
$pool = new Pool(3);
for($i = 0; $i < 15; $i++){
$task = new threading($store, 1, 100);
$pool->submit($task);
}
$pool->shutdown();
print_r($store);
?>
I tried to modify this example from examples on GitHub.
Sorry, I don't fully understand the flow of Threaded objects and the Thread class; work out the flow from your own perspective.
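Applied to the script in the question, the minimal change (a sketch, assuming the Task class from the question is unchanged) is to make the shared holder a Threaded object created in the main thread:
// Make the holder Threaded so the counter written inside the tasks
// is still visible in the main thread after the pool shuts down.
class MyDataHolder extends Threaded {
    public $counter = 0;
}

$DataHolder = new MyDataHolder();
$pool = new Pool(4);
for ($i = 0; $i < 15; ++$i) {
    $pool->submit(new Task(1, 100, $DataHolder)); // Task as defined in the question
}
while ($pool->collect());
$pool->shutdown();
echo "Total: " . $DataHolder->counter; // should now report the accumulated total (1500 here)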
The objective is to continually collect current temperature data. But a separate process should analyse that output, because I have to tweak the algorithm a lot and I want to avoid downtime, so stopping the collector process is a no-go.
The problem is that when I separate these processes, process 2 would have to continually poll the database or read from a local file to do something with the output generated by process 1, but I want to act on it immediately, and that polling is expensive in terms of resources.
Would it be possible to reload the class into memory when the file changes, for example by writing a function that keeps calculating the MD5 of the file and reloads the class if it changes? This separate class would then act as a plugin. Is there any way to make that work?
Here is a possible solution. Use Beanstalk (https://github.com/kr/beanstalkd).
PHP class to talk to Beanstalk (https://github.com/pda/pheanstalk)
Run beanstalk.
Create a process that goes into an infinite loop that reads from a Beanstalk queue. (Beanstalk queues are called "Tubes".) PHP processes are not meant to run for a very long time, mainly because of memory. The easiest way to handle this is to restart the process every once in a while, or when memory reaches a certain threshold.
NOTE: What I do is to have the process exit after some fixed time or if it uses a certain amount of memory. Then, I use Supervisor to restart it.
You can put data into Beanstalk as JSON and decode it on the receiving end. The sending and receiving processes need to agree on that format. You could store your work payload in a database and just send the primary key in the queue.
Here is some code you can use:
class BeanstalkClient extends AbstractBaseQueue{
public $queue;
public $host;
public $port;
public $timeout;
function __construct($timeout=null) {
$this->loadClasses();
$this->host = '127.0.0.1';
$this->port = BEANSTALK_PORT;
// honour the caller's timeout (e.g. the 900 passed in below), defaulting to 30 seconds
$this->timeout = $timeout !== null ? $timeout : 30;
$this->connect();
}
public function connect(){
$this->queue = new \Pheanstalk\Pheanstalk($this->host, $this->port);
}
public function publish($tube, $data, $delay){
$payload = $this->encodeData($data);
$this->queue->useTube($tube)->put($payload,
\Pheanstalk\PheanstalkInterface::DEFAULT_PRIORITY, $delay);
}
public function waitForMessages($tube, $callback=null){
if ( $this->timeout ) {
return $this->queue->watchOnly($tube)->reserve($this->timeout);
}
return $this->queue->watchOnly($tube)->reserve();
}
public function delete($message){
$this->queue->delete($message);
}
public function encodeData($data){
$payload = json_encode($data);
return $payload;
}
public function decodeData($encodedData) {
return json_decode($encodedData, true);
}
public function getData($message){
if ( is_string($message) ) {
throw new Exception('message is a string');
}
return json_decode($message->getData(), true);
}
}
abstract class BaseQueueProcess {
protected $channelName = ''; // child class should set this
// The queue object
public $queue = null;
public $processId = null; // this is the system process id
public $name = null;
public $status = null;
public function initialize() {
$this->processId = getmypid();
$this->name = get_called_class();
$this->endTime = time() + (2 * 60 * 60); // restart every 2 hours
// seconds to timeout when waiting for a message
// if the process isn't doing anything, timeout so they have a chance to do housekeeping.
$queueTimeout = 900;
if ( empty($this->queue) ) {
$this->queue = new BeanstalkClient($queueTimeout);
}
}
public function receiveMessage($queueMessage) {
$taskData = $this->queue->getData($queueMessage);
// debuglog(' Task Data = ' . print_r($taskData, true));
if ( $this->validateTaskData($taskData) ) {
// process the message
$good = $this->didReceiveMessage($taskData);
if ( $good !== false ) {
// debuglog("Completing task {$this->taskId}");
$this->completeTask($queueMessage);
}
else {
$this->failTask($queueMessage);
}
}
else {
// Handle bad message
$this->queue->delete($queueMessage);
}
}
public function run() {
$this->processName = $this->channelName;
// debuglog('Start ' . $this->processName);
// debuglog(print_r($this->params, true));
while(1) {
$queueMessage = $this->queue->waitForMessages($this->channelName);
if ( ! empty($queueMessage) ) {
$this->receiveMessage($queueMessage);
}
else {
// empty message
// a timeout
// // debuglog("empty message " . get_called_class());
}
$memory = memory_get_usage();
if( $memory > 20000000 ) {
// debuglog('Exit '.get_called_class().' due to memory. Memory:'. ($memory/1024/1024).' MB');
// Supervisor will restart process.
exit;
}
elseif ( time() > $this->endTime ) {
// debuglog('Exit '.get_called_class().' due to time.');
// Supervisor will restart process.
exit;
}
usleep(10);
}
}
public function completeTask($queueMessage) {
//
$this->queue->delete($queueMessage);
}
public function failTask($queueMessage) {
//
$this->queue->delete($queueMessage);
}
}
class MyProcess extends BaseQueueProcess {
public function initialize() {
$this->channelName = 'Temperature';
parent::initialize();
}
public function didReceiveMessage($taskData) {
// debuglog(print_r($taskData, true));
// process data here
// return false if something went wrong
return true;
}
}
//Sender
class WorkSender {
const TubeName = 'Temperature';
const TubeDelay = 0; // Set delay to 0, i.e. don't use a delay.
function send($data) {
$c = new BeanstalkClient();
$c->publish(self::TubeName, $data, self::TubeDelay);
}
}
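A rough usage sketch (names beyond the classes above are assumptions; MyProcess would also need the validateTaskData() helper that receiveMessage() calls):
// Producer side: the temperature collector publishes each reading. Instead of
// the full payload you could store the row in your database and send just its
// primary key, as suggested above.
$sender = new WorkSender();
$sender->send(array(
    'sensor_id'   => 42,
    'temperature' => 21.7,
    'recorded_at' => time(),
));

// Consumer side: run this in its own long-lived PHP process under Supervisor;
// it exits on the memory/time limits above and Supervisor restarts it.
$process = new MyProcess();
$process->initialize();
$process->run();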
All,
Background of how problem was detected
My question concerns the performance of a web app, mainly the index page. I noticed the problem when I was giving a demonstration at a local branch of my company that has slow internet (I don't know the exact speeds or ping rate, but Google took about 10 seconds to load). My index page took roughly 10-20 times longer to load. I was under the assumption that my app did most of the work on the server side (as PHP is making all of the database queries). But this led me to look at Chrome's network tool and see the latency times of the 4 divs being loaded by AJAX (I'll elaborate in a bit). Interestingly, the scripts being called appear to run sequentially, but not necessarily in the order I invoked the AJAX calls (sometimes they do, other times they don't).
What are these divs / ajax requests?
Here is a code snippet of a request:
Yii::app()->clientScript->registerScript('leftDiv', '
$( "#left_dash" ).load(
"'.$this->createUrl("/site/page?view=leftDashLoad") .'",
function(){
$("#left_dash p a").click(function() {
$(this).parent().parent().find("div.scroll100").slideUp();
$(this).parent().next().stop(false, false).slideDown();
});
$("p:first-child").next().slideDown();
}
);
' );
Here is the page requested:
$this->widget('widgets.ScrollList',array(
'condition'=>
function($muddJob,$scrollList)
{
$job = $muddJob->Job; //returns a job or empty array
if(!empty($job) )
{
if( $muddJob->uploadArtwork == null && $muddJob->uploadData == null ) {
array_push($scrollList->_models,$job);
$scrollList->columnValues = array($muddJob->jobDescription,$muddJob->dropDate1);
return true;
}
}
return false;
},
'columns' => array('col1'=>"MuddJob#",'col2'=>"Desc",'col3'=>"Dealer Name"),
'name'=> "Print New Ticket",
'muddJobs' => $currentExchanges->getCurrentMuddExchanges(),
)
);
Imagine that page (the page that AJAX has called) having 6 similar declarations that create widgets. The goal is to return HTML to put back in place of a loading GIF on the index page.
Here is the scroll widget:
<?php
Yii::import('widgets.ScrollListBase');
include_once Yii::app()->extensionPath . "/BusinessDay.php";
class ScrollList extends ScrollListBase
{
private $_content;
public $columns = array();
public $columnValues;
private $_listInfo;
public $name;
public $_models = array();
public $condition;
public $muddJobs; //object to pass
public $jobsMailingTodayArray = array();
public function init()
{
//$this->init();
$this->_listInfo = $this->generateListInfo($this->columns);
//$muddJobs = $this->getCurrentMuddExchanges();
$listInfo = $this->newScrollList($this->muddJobs);
$contents = $this->createContent($listInfo,$this->name);
$this->_content = $contents[0];
// $this->_fullTableContent = $contents[1];
//$this->_listInfo = $contents[2];
}
public function run()
{
//if($this->data['isVisible'])
echo $this->_content;
Yii::app()->session["exploded_content_{$this->name}"] = $this->_models;
}
private function newScrollList($muddJobs)
{
$listInfo = $this->_listInfo;
$tempCount = 0;
foreach($muddJobs as $muddJob)
{
$condition = $this->condition;
if($condition($muddJob,$this) && empty($this->jobsMailingTodayArray) ) //if no job exists for the muddExchange...
{
$tempArray = $this->createWidgetLinks($tempCount,$listInfo,$muddJob,$this->columnValues);
$listInfo = $tempArray[0];
$tempCount = $tempArray[1];
}
elseif ( !empty($this->jobsMailingTodayArray ) )
{
foreach ($this->jobsMailingTodayArray as $jobMailingToday) //change to for loop over the length of the jobsMailingToday
{
$tempArray = $this->createWidgetLinks($tempCount,$listInfo,$muddJob,$this->columnValues);
$listInfo = $tempArray[0];
$tempCount = $tempArray[1];
}
$this->jobsMailingTodayArray = array();
}
}
return array($listInfo,$tempCount);
}
}
?>
Here is its parent:
<?php
class ScrollListBase extends CWidget
{
private $content = "<p>";
private $divDeclaration = "<div class='scroll100'>\n<table class='quickInfoTable'>\n<thead>\n";
private $headTag = "<th>";
private $headTagClose = "</th>\n";
private $theadTagClose = "</thead>\n";
private $bodyTag = "<tbody>\n";
private $listInfo = "<div class='scroll100'>\n<table class='quickInfoTable'>\n<thead>\n<th>Job#</th>\n<th>Package#</th>\n<th>Entry Date</th>\n</thead>\n<tbody>\n";
/**
* Initializes the widget.
*/
public function createContent($listInfo,$name)
{
$largeHref = Yii::app()->request->baseUrl . '/index.php/site/fullTableView';
$this->content .= "<span class='badge' >{$listInfo[1]} </span> <a href='#'>{$name} </a> <a href='$largeHref/Name/{$name}'> <small>(view larger)</small> </a> </p>";
if( $listInfo[1] > 0 )
{
// $this->fullTable .= substr($listInfo[0],22);
// $this->fullTableContent= $this->fullContent .= $this->fullTable . "</tbody>\n</table>\n</div>";
$this->content .= $listInfo[0] . "</tbody>\n</table>\n</div>";
}
return array($this->content);
}
//Helper Methods
/**
*
* @param array $attributeArray an associative array
* @return mixed either a job or an empty array
*/
protected function getJobByAttributes($attributeArray)
{
return Jobs::model()->with('MuddExchange')->findByAttributes($attributeArray);
}
protected function createWidgetLinks($tempCount,$listInfo,$muddJob,$columnValues,$url="/MuddExchange/")
{
$tempCount++;
$viewIndex = $muddJob->exchange_id;
$model = $muddJob;
$job = $muddJob->Job;
if ( isset($job ))
{
$model = $job;
$url = "/Jobs/";
$viewIndex = $model->job_id;
}
$link = CHtml::link("$model->jobNumber",array("{$url}{$viewIndex}"));
$listInfo .= "<tr>\n<td>$link</td>\n";
foreach ($columnValues as $columnValue)
{
$listInfo .= "<td>{$columnValue}</td>\n";
}
$listInfo .= "</tr>";
return array($listInfo,$tempCount);
}
protected function getListInfo()
{
return $this->listInfo;
}
/**
* Takes an array of strings to generate the column names for a particular list.
* #param array $heads
* #return string
*
*/
protected function generateListInfo($heads)
{
//<th>Job#</th>\n<th>Package#</th>\n<th>Entry Date</th>\n</thead>\n<tbody>\n";
$htmlScrollStart = $this->divDeclaration;
foreach ($heads as $tableColumn => $name)
{
$htmlScrollStart .= $this->headTag . $name . $this->headTagClose;
}
$htmlScrollStart .= $this->theadTagClose . $this->bodyTag;
return $htmlScrollStart;
}
public function calculateDueDate($jobsMailDate,$job)
{
$package = PackageSchedule::model()->findByAttributes(array('package_id'=>$job->packageID));
$projectedDays = $package->projected_days_before_mail_date;
$dropDate1 = $jobsMailDate->projected_mail_date;
$dropDate = wrapBusinessDay($dropDate1); //use this for actual command...
$toSec = 24*60*60;
$dayInt =0;
$secDropDate = strtotime($dropDate1);
do{
$dayInt +=1;
$daysInSec = ($dayInt) * $toSec ;
$secGuessDueDate = $secDropDate - $daysInSec;
$dueDate = date('Y-m-d',$secGuessDueDate);
$difference = $dropDate->difference($dueDate);
}while( $difference != $projectedDays);
return $dueDate;
}
}
?>
Why I think this behavior is odd
The whole slow-internet thing is a beast in and of itself, but I don't think that is within the scope of Stack Overflow. I'm more concerned about the loading of these divs. The div that loads last, which takes 1.5 to 2 seconds on average, is an AJAX request to a page that creates a single widget. The logic behind it is here:
<?php
include_once Yii::app()->extensionPath . "/CurrentExchanges.php";
$currentExchanges = Yii::app()->session['currentExchanges'];
$this->layout = 'barebones';
$this->widget('widgets.ScrollList',array(
'condition'=>
function($muddJob,$scrollList)
{
if ($muddJob->dropDate1 != null && $muddJob->dropDate1 != '0000-00-00')
{
$job = $muddJob->Job;
if(!empty($job) && $job->packageID != null) //if job exists for the muddExchange and has a package
{
if($job->uploadArtwork == null )
{
$jobsMailDate = JobsMailDate::model()->findByAttributes(array("job_id"=>$job->job_id,'sequence_num'=>1));
if(!empty($jobsMailDate))
{
$calculatedDueDate = $scrollList->calculateDueDate($jobsMailDate,$job);
if (strtotime($calculatedDueDate) <= strtotime(date("Y-m-d")) )
{
array_push($scrollList->_models , $job);
$scrollList->columnValues = array($muddJob->jobDescription,$muddJob->dropDate1,$jobsMailDate->projected_mail_date);
return true;
}
}
}
}
}
return false;
},
'columns' => array('col1'=>"MuddJob#",'col2'=>"Desc",'col3'=>"Drop Date", 'col4' =>'Projected Drop Date'),
'name'=> "Artwork Due Today",
'muddJobs' => $currentExchanges->getCurrentMuddExchanges(),
)
);
?>
The calculateDueDate method makes 2 additional calls to the server.
What I'm failing to understand is why the left div (with the most processing to do) is usually the first to return and the artLoad is usually the last to load (by a substantial difference). Here are some times returned by Chrome's network tool:
leftDashLoad: 475ms
rightDashLoad: 593ms
dataLoad: 825ms
artLoad: 1.41s
dataLoad: 453ms
rightDashLoad: 660ms
leftDashLoad: 919ms
artLoad: 1.51s
rightDashLoad: 559ms
leftDashLoad: 1.17s
dataLoad: 1.65s
artLoad: 2.01s
I just can't fathom why the left/right dashLoads return so much faster than the artLoad. The code for artLoad and dataLoad is nearly identical save for the actual comparison (the one if statement). If this were truly asynchronous, I'd expect the order to be art/dataLoad, rightDashLoad, and leftDashLoad, based purely on the amount of computation done in each page. Perhaps the server isn't multithreading, or there is some weird configuration, but if that were the case, I don't see why the loading would be hit so hard by slow internet.
If I have overlooked something obvious or failed to use google appropriately, I do apologize. Thanks for any help you can offer!
Language/other tech info
The app was developed using the Yii framework, PHP 5.3, and an Apache server, with InnoDB tables. The server is hosted at DreamHost.
Update
I've changed the view page so that the AJAX calls now call a controller action. It seems to have made the loading times more similar (asynchronous?) on my local dev environment, but it really slowed things down on the QA environment (hosted on DreamHost...). Here is a screenshot of the local network tool's info:
dev environment
and the QA site (note that the databases have about the same amount of data...)
qa environment
Thoughts? It seems to me that my original problem may be solved, as the return times (on my local dev) look more like I expect them to.
Also, my own ignorance of how NetBeans debugging works was playing a part in this synchronous loading, as Xdebug uses a session. I believe this was forcing the AJAX calls to wait their turn.
Thanks to @Rowan for helping me diagnose this strange behavior. PHP was trying to request a session before the session was closed, to prevent a race hazard. There were session requests in my code, and there was a session started by my IDE (NetBeans). Removing all references to sessions in the AJAX-called pages, and having AJAX call actions that use renderPartial(), produced the expected behavior (and much better code IMO). Let me know how to properly thank users on Stack Overflow (can I upvote comments, or what is available?). Thank you all!
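For reference, a minimal sketch of the fix (action, view and variable names are illustrative, not the actual app code):
// Yii 1.x controller action called by AJAX: read what you need from the
// session, release the session lock, then render the partial. With the lock
// released, the other AJAX requests are no longer served one at a time.
public function actionArtLoad()
{
    $currentExchanges = Yii::app()->session['currentExchanges'];
    Yii::app()->session->close(); // same effect as session_write_close()

    $this->renderPartial('_artLoad', array(
        'currentExchanges' => $currentExchanges,
    ), false, true);
}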
I'm dealing with GoDaddy auction domains; they provide a way to download the domain listings. I have a cron job that downloads and dumps (inserts) the listings into my database table. This process takes a few seconds for downloading and inserting. The total number of domains (records) in this case is 34000.
Second, I need to update the page rank for each individual domain in the database, for all 34000 records. I have a PHP API for fetching the page rank live. The GoDaddy downloads don't provide page rank details, so I have to fetch and update them separately.
Now, the problem is that fetching the page rank live and then updating it in the database takes too much time for all 34000 domains.
I recently did an experiment via a cron job to update the page rank for the domains in the database; it took 4 hours to update the page rank for just 13383 of the 34000 domains, since it has to fetch first and then update the database. This was all running on a dedicated server.
Is there any way to speed up this process for a large number of domains? The only way I can think of is to accomplish this via multitasking.
Would it be possible to have 100 tasks fetching page ranks and updating the database simultaneously?
In case you need the code:
$sql = "SELECT domain from auctions";
$mozi_get=runQuery($sql);
while($results = mysql_fetch_array($mozi_get)){
/* PAGERANK API*/
if($results['domain']!='Featured Listings'){
//echo $results['domain']."<br />";
try
{
$url = new SEOstats("http://www.".trim($results['domain']));
$rank=$url->Google_Page_Rank();
if(!is_integer($rank)){
//$rank='0';
}
}
catch (SEOstatsException $e)
{
$rank='0';
}
try
{
$url = new SEOstats(trim("http://".$results['domain']));
$rank_non=$url->Google_Page_Rank();
if(!is_integer($rank_non)){
//$rank_non='0';
}
}
catch (SEOstatsException $e)
{
$rank_non='0';
}
$sql = "UPDATE auctions set rank='".$rank."', rank_non='".$rank_non."' WHERE domain='".$results['domain']."'";
runQuery($sql);
echo $sql."<br />";
}
}
Here is my updated code for pthreads:
<?php
set_time_limit(0);
require_once("database.php");
include 'src/class.seostats.php';
function get_page_rank($domain) {
try {
$url = new SEOstats("http://www." . trim($domain));
$rank = $url->Google_Page_Rank();
if(!is_integer($rank)){
$rank = '0';
}
} catch (SEOstatsException $e) {
$rank = '0';
}
return $rank;
}
class Ranking extends Worker {
public function run(){}
}
class Domain extends Stackable {
public $name;
public $ranking;
public function __construct($name) {
$this->name = $name;
}
public function run() {
$this->ranking = get_page_rank($this->name);
/* now write the Domain to database or whatever */
$sql = "UPDATE auctions set rank = '" . $this->ranking . "' WHERE domain = '" . $this->name . "'";
runQuery($sql);
}
}
/* start some workers */
$workers = array();
while (@$worker++ < 8) {
$workers[$worker] = new Ranking();
$workers[$worker]->start();
}
/* select auctions and start processing */
$domains = array();
$sql = "SELECT domain from auctions"; // RETURNS 55369 RECORDS
$domain_result = runQuery($sql);
while($results = mysql_fetch_array($domain_result)) {
$domains[$results['domain']] = new Domain($results['domain']);
$workers[array_rand($workers)]->stack($domains[$results['domain']]);
}
/* shutdown all workers (forcing all processing to finish) */
foreach ($workers as $worker)
$worker->shutdown();
/* we now have ranked domains in memory and database */
var_dump($domains);
var_dump(count($domains));
?>
Any help will be highly appreciated. Thanks
Well, here's a pthreads example that will allow you to multi-thread your operations ... I have chosen the worker model and am using 8 workers; how many workers you use depends on your hardware and the service receiving the requests ... I've never used SEOstats or GoDaddy domain auctions, so I'm not sure of the CSV fields and will leave the fetching of page ranks to you ...
<?php
define ("CSV", "https://auctions.godaddy.com/trpSearchResults.aspx?t=12&action=export");
/* I have no idea how to get the actual page rank */
function get_page_rank($domain) {
return rand(1,10);
}
class Ranking extends Worker {
public function run(){}
}
class Domain extends Stackable {
public $auction;
public $name;
public $bids;
public $traffic;
public $valuation;
public $price;
public $ending;
public $type;
public $ranking;
public function __construct($csv) {
$this->auction = $csv[0];
$this->name = $csv[1];
$this->traffic = $csv[2];
$this->bids = $csv[3];
$this->price = $csv[5];
$this->valuation = $csv[4];
$this->ending = $csv[6];
$this->type = $csv[7];
}
public function run() {
/* we convert the time to a stamp here to keep the main thread moving */
$this->ending = strtotime(
$this->ending);
$this->ranking = get_page_rank($this->name);
/* now write the Domain to database or whatever */
}
}
/* start some workers */
$workers = array();
while (@$worker++ < 8) {
$workers[$worker] = new Ranking();
$workers[$worker]->start();
}
/* open the CSV and start processing */
$handle = fopen(CSV, "r");
$domains = array();
while (($line = fgetcsv($handle))) {
$domains[$line[0]] = new Domain($line);
$workers[array_rand($workers)]->stack(
$domains[$line[0]]);
}
/* cleanup handle to csv */
fclose($handle);
/* shutdown all workers (forcing all processing to finish) */
foreach ($workers as $worker)
$worker->shutdown();
/* we now have ranked domains in memory and database */
var_dump($domains);
var_dump(count($domains));
?>
Questions:
Right, 8 workers
Workers execute Stackable objects in the order they were stack()'d, this line chooses a random worker to execute the Stackable
You can traverse the list of $domains in the main process during execution, checking the status of each Stackable as you are executing
All of each worker's stack will be executed before the shutdown takes place; the shutdown therefore ensures that all work is done by that point in the execution of the script.
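For example, a rough sketch of such a check from the main thread (my own illustration, using only the $domains array built above; not part of the original script):
/* Poll the Stackables from the main thread while the workers run.
   Domain::run() sets $ranking last, so a non-null ranking means that job is done. */
$total = count($domains);
do {
    $done = 0;
    foreach ($domains as $domain) {
        if ($domain->ranking !== null) {
            $done++;
        }
    }
    printf("ranked %d/%d domains\n", $done, $total);
    usleep(250000); // re-check four times per second
} while ($done < $total);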