This may seem a silly question, but having pondered it for a few days, I've only come up with one answer, so I thought I'd throw it open and see what other people think.
I have a PHP script which runs as a CRON job and checks that our CDN is up and running. If it is, it needs to set a flag to 'true'; if not, to 'false'. The main PHP pages then need to check this flag every time a user visits the site and set the CDN URL variable appropriately.
The question is where to store this flag. We could put it in a database, but then the database would get hit on every page load. Also, as we're using Memcache, there could be a delay before the flag is updated (although there are ways around that, obviously).
My next thought was to save it into a simple text file on the server. But a) we have load-balanced servers, and although we could run the same CRON job on each server, that seems inefficient, and b) opening and reading a text file could, I thought, slow the page load down, since it involves disk activity every time - and that isn't good when the system is delivering 3 million web pages a week!
The ideal would be a PHP system variable that retains its value between pages - but of course, that doesn't exist! It's not the kind of information that should be stored as a cookie on the user's machine either.
Does anyone have any thoughts? All comments welcome!
If you are already using Memcache, then why do you need the cron job at all?
Put the check for the CDN into your regular application code. Cache the result in Memcache with a lifetime equal to your current cron interval. If the cache is stale, do the full check on the CDN; otherwise, return the cached result.
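A minimal sketch of that approach, assuming the classic Memcache extension and a hypothetical isCdnUp() helper that performs the actual availability check:

// Sketch only: cache the CDN status so the expensive check runs at
// most once per interval instead of on every page load.
$memcache = new Memcache();
$memcache->connect('127.0.0.1', 11211);

$status = $memcache->get('cdn_up');
if ($status === false) {
    // Cache miss or expired entry: do the real check once.
    $status = isCdnUp() ? 'up' : 'down';        // hypothetical helper
    $memcache->set('cdn_up', $status, 0, 300);  // expire after the old cron interval
}
$cdnUrl = ($status === 'up') ? 'http://cdn.example.com/' : '/assets/';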
If you want to keep the cron job and are concerned about disk I/O, consider using a RAM disk, semaphores, or shared memory to store the results. But it seems kind of pointless when you are already using Memcache.
Do you already use a database connection?
If so, use that; you can fit it into your existing replication mechanism pretty easily, I'm sure.
If not, then whatever you do here is going to add overhead, be it reading a text file on disk or reading from a database where you weren't before.
As all my requests go through an index script, I tried to time the response time of all my requests.
It's simply the difference between the start time (start of the script) and the end time (end of the script).
I cache my data in memcached, and users are all served from memcached.
I mostly get response times of less than a second, but at times there are weird spikes of more than a second; the worst case can go up to 200+ seconds.
I was wondering: if mobile users have a slow connection, does that reflect in my response time?
I am serving primarily mobile users.
Thanks!
No, it's the runtime of your script. It does not include the latency to the user; that's something the underlying web server worries about. Something in your script just takes very long. I recommend you profile your script to find out what that is. Xdebug is a good way to do so.
If you're measuring in PHP (which it sounds like you are), that's the time it takes for the page to be generated on the server side, not the time it takes to be downloaded.
Drop timers in throughout the page and try to narrow it down to the section that is causing the huge delay of 200+ seconds.
You could even add a small script that emails you details of how long each section took, if the problem doesn't happen often enough to catch it yourself.
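A minimal sketch of such section timers (the section names, helpers, and the alert threshold are placeholders):

// Sketch only: wall-clock timers around each section of the page.
$timings = array();

$t = microtime(true);
load_from_memcached();                        // hypothetical section
$timings['cache'] = microtime(true) - $t;

$t = microtime(true);
render_page();                                // hypothetical section
$timings['render'] = microtime(true) - $t;

// Mail yourself the breakdown when a request is suspiciously slow.
if (array_sum($timings) > 5) {
    mail('you@example.com', 'Slow request', print_r($timings, true));
}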
It could be that the script cannot finish because a client downloads the results very slowly. If you don't use a front-end server like nginx, that is the first thing to try.
Someone already mentioned Xdebug, but normally you would not want to run Xdebug in production. I would suggest using XHProf to profile pages on development/staging/production. You can turn XHProf on conditionally, which makes it really easy to run in production.
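For illustration, a sketch of conditional profiling with the xhprof extension, assuming it is installed; the sampling rate and output path are arbitrary choices:

// Sketch only: profile ~1% of production requests with xhprof.
$profiling = (mt_rand(1, 100) === 1);
if ($profiling) {
    xhprof_enable(XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY);
}

handle_request();                              // hypothetical entry point

if ($profiling) {
    $data = xhprof_disable();                  // returns the raw profile
    file_put_contents('/tmp/xhprof_' . uniqid() . '.dat', serialize($data));
}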
I'm currently running a Linux-based VPS with 768MB of RAM.
I have an application which collects details of domains and then connects to a service via cURL to retrieve their PageRank.
When I run a check on about 50 domains, it takes the remote page about 3 minutes to load with all the results before the script can parse the details and return them to my script. This causes a problem, as nothing else seems to function until the script has finished executing, so users on the site just get a spinner/'ball of death' while waiting for pages to load.
(The remote page retrieves the domain details and updates the page by AJAX, but the cURL request, rightly, doesn't return the page until loading is complete.)
Can anyone tell me if I'm doing anything obviously wrong, or if there is a better way of doing it? (There can be anything between 10 and 10,000 domains queued, so I need a process that can run in the background without affecting the rest of the site.)
Thanks
A more sensible approach would be to "batch process" the domain data via a cron-triggered PHP CLI script.
As such, once you'd inserted the relevant domains into a database table with a "processed" flag set to false, the background script would then:
Scan the database for domains that aren't marked as processed.
Carry out the cURL lookup, etc.
Update the database record accordingly and mark it as processed.
...
To ensure no overlap with an already-executing batch processing script, you should only invoke the PHP script every five minutes from cron and (within the PHP script itself) check, at the start of the "scan" stage, how long the script has been running, and exit if it's been running for four minutes or longer. (You might want to adjust these figures, but hopefully you can see where I'm going with this.)
By using this approach, you'll be able to leave the background script running indefinitely (as it's invoked via cron, it'll automatically start after reboots, etc.) and simply add domains to the database/review the results of processing, etc. via a separate web front end.
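A rough sketch of such a background script, assuming a hypothetical `domains` table with `id`, `name`, `rank`, and `processed` columns and a hypothetical fetch_pagerank() helper:

// Sketch only: cron invokes this CLI script every five minutes; it
// exits after four so runs never overlap.
$start = time();
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$rows = $db->query('SELECT id, name FROM domains WHERE processed = 0');
foreach ($rows as $row) {
    if (time() - $start >= 240) {
        exit;                                  // let the next cron run continue
    }
    $rank = fetch_pagerank($row['name']);      // the cURL lookup
    $update = $db->prepare('UPDATE domains SET rank = ?, processed = 1 WHERE id = ?');
    $update->execute(array($rank, $row['id']));
}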
This isn't the ideal solution, but if you need to trigger this process from a user request, you can add the following at the end of your script:
set_time_limit(0); // remove the execution time limit
flush();           // push the output generated so far to the client
This will allow the PHP script to continue running while still returning output to the user. But seriously, you should use batch processing; it will give you much more control over what's going on.
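For reference, a fuller version of that trick; this is a sketch, and its behaviour depends on the server/SAPI in use:

// Sketch only: send a complete response, then keep working.
ignore_user_abort(true);                       // keep running if the user leaves
set_time_limit(0);                             // remove the execution time limit

ob_start();
echo 'Processing started';
header('Connection: close');
header('Content-Length: ' . ob_get_length());
ob_end_flush();
flush();                                       // the browser now has its response

process_domains();                             // hypothetical long-running work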
Firstly, I'm sorry, but I'm an idiot! :)
I've loaded the site in another browser (Firefox) and it loads fine.
It seems Chrome puts some sort of lock on a domain while it's waiting for a server response, and I was testing the script manually through a browser.
Thanks for all your help and sorry for wasting your time.
CJ
While I agree with others that you should consider processing these tasks outside of your webserver, in a more controlled manner, I'll offer an explanation for the "server standstill".
If you're using native PHP sessions, PHP uses an exclusive locking scheme, so only a single PHP process can deal with a given session ID at a time. Having a long-running PHP script which uses sessions can certainly cause this.
You can search for combinations of terms like:
php session concurrency lock session_write_close()
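For illustration, a minimal sketch of the usual workaround: release the session lock before the slow part begins.

session_start();                       // acquires the exclusive session lock
$userId = $_SESSION['user_id'];        // read what you need while locked

session_write_close();                 // release the lock so other requests
                                       // for this session can proceed

do_long_running_work($userId);         // hypothetical slow task; writes to
                                       // $_SESSION after this point are lost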
I'm sure it's been discussed many times here. I'm too lazy to search for you. Maybe someone else will come along and write an answer with bulleted lists and pretty hyperlinks in exchange for Stack Overflow reputation :) But not me :)
Good luck.
I'm not sure how your code is structured, but you could try using sleep(). That's what I use when batch processing.
I have a module called "online events" on my website. This module runs every 8 seconds and fetches recent activity on my site. So if more than 100 users are using my website at a time, it consumes a lot of memory and CPU. What should I do to reduce CPU and memory usage, provided it does not affect the online events module? Would it help to do something with files here? Please help me.
If I understand correctly, the module needs to run every 8 seconds?
Why not store the time when the module last ran? Then, on every access from those thousands of users, you simply check whether the time since the module last ran is greater than 8 seconds. If it is, you run the module and store the new time; if it isn't, you do nothing and wait for another user to access the site and generate new online events.
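A minimal sketch of that check, using a plain file as the shared timestamp (the path and the helper are placeholders):

// Sketch only: regenerate the events at most once every 8 seconds.
$stamp = '/tmp/online_events.last';
$last  = file_exists($stamp) ? (int) file_get_contents($stamp) : 0;

if (time() - $last >= 8) {
    file_put_contents($stamp, time());         // claim the slot first
    run_online_events_module();                // hypothetical expensive fetch
}

There is an obvious race if two requests arrive at once, but for this use case an occasional duplicate run is usually harmless.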
Caching
http://www.mnot.net/cache_docs/
http://memcached.org/
http://php.net/manual/en/book.apc.php
Database optimization
http://dev.mysql.com/doc/refman/5.0/en/optimization.html
http://20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/
http://forge.mysql.com/wiki/Top10SQLPerformanceTips
Apache optimization
http://www.prelovac.com/vladimir/wordpress-optimization-guide (About Wordpress, but has good tips for all applications)
Load balancing
http://httpd.apache.org/docs/2.1/mod/mod_proxy_balancer.html
http://www.howtoforge.com/high_availability_loadbalanced_apache_cluster
http://wiki.nginx.org/NginxHttpUpstreamModule
First, increase the interval. Then cache the module's output on the server side, e.g. make a static HTML file that gets loaded through a PHP file and that only gets updated if something has changed in the DB; that way you only need to query the DB and render the data when something actually happened. A simple way to ensure the cache is updated would be to remove the static file every time something changes in the DB. This only makes sense if there are not too many DB changes.
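A minimal sketch of that static-file approach (the path and helpers are placeholders):

// Sketch only: serve a prebuilt HTML fragment; rebuild it only when a
// DB write has invalidated (deleted) it.
$cacheFile = '/var/cache/app/online_events.html';

if (!file_exists($cacheFile)) {
    $html = render_online_events();            // hypothetical DB query + render
    file_put_contents($cacheFile, $html);
}
readfile($cacheFile);

// ...and in every code path that modifies the relevant tables:
// unlink('/var/cache/app/online_events.html');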
I have a scenario where one PHP process is writing a file about 3 times a second, and then several PHP processes are reading this file.
This file is essentially a cache. Our website has very aggressive polling for data that changes constantly, and we don't want every visitor to hit the DB every time they poll, so we have a cron process that reads the DB 3 times per second, processes the data, and dumps it to a file that the polling clients can then read.
The problem I'm having is that, sometimes, opening the file for writing takes a long time, sometimes even up to 2-3 seconds. I'm assuming this happens because the file is being locked by reads (or by something), but I don't have any conclusive way of proving it; plus, according to what I understand from the documentation, PHP shouldn't be locking anything.
This happens every 2-5 minutes, so it's pretty common.
In the code, I'm not doing any kind of locking, and I pretty much don't care if that file's information gets corrupted, if a read fails, or if data changes in the middle of a read.
I do care, however, if writing to it takes 2 seconds, essentially because the process that has to happen three times a second has now skipped several beats.
I'm writing the file with this code:
$handle = fopen(DIR_PUBLIC . 'filename.txt', "w"); // "w" truncates the file on open
fwrite($handle, $data);
fclose($handle);
And I'm reading it directly with:
file_get_contents('filename.txt')
(it's not getting served directly to the clients as a static file, I'm getting a regular PHP request that reads the file and does some basic stuff with it)
The file is about 11KB, so it doesn't take a lot of time to read/write - well under 1ms.
This is a typical log entry when the problem happens:
Open File: 2657.27 ms
Write: 0.05984 ms
Close: 0.03886 ms
Not sure if it's relevant, but the reads happen in regular web requests through Apache, while the write is a regular "command line" PHP execution made by Linux's cron; it's not going through Apache.
Any ideas of what could be causing this big delay in opening the file?
Any pointers on where I could look to help me pinpoint the actual cause?
Alternatively, can you think of something I could do to avoid this? For example, I'd love to be able to set a 50ms timeout on fopen, and if it didn't open the file in time, it would just skip ahead and let the next run of the cron take care of it.
Again, my priority is to keep the cron beating three times a second; all else is secondary, so any ideas or suggestions are extremely welcome.
Thank you!
Daniel
I can think of 3 possible problems:
the file gets locked when reading/writing to it by lower-level PHP system calls without you knowing it. This should block the file for 1/3 s at most; you are seeing longer periods than that.
the filesystem cache starts an fsync() and the whole system blocks reads/writes to the disk until that is done. As a fix, you may try installing more RAM, upgrading the kernel, or using a faster hard disk.
your "caching" solution is not distributed, and it hits the worst-performing piece of hardware in the whole system many times a second... meaning that you cannot scale it further by simply adding more machines, only by increasing the disk speed. You should take a look at memcache or APC, or maybe even shared memory: http://www.php.net/manual/en/function.shm-put-var.php
Solutions I can think of:
put that file on a RAM disk: http://www.cyberciti.biz/faq/howto-create-linux-ram-disk-filesystem/ . This should be the simplest way to avoid hitting the disk so often, without other major changes.
use memcache. It's very fast when used locally, it scales well, and it's used by "big" players (see the sketch after this list). http://www.php.net/manual/en/book.memcache.php
use shared memory. It was designed for what you are trying to do here... having a "shared memory zone"...
change the cron scheduler to update less often, or implement some kind of event system so the cache is only updated when necessary, rather than on a timer
make the "writing" script write to 3 different files, and make the "readers" read from one of them at random. This may distribute the locking across more files, and it may reduce the chance that a given file is locked when writing to it... but I doubt it would bring any noticeable benefit.
You should use a really fast solution if you want to guarantee consistently low open times. Maybe your OS is doing disk syncs, database file commits, or other things that you cannot work around.
I suggest using memcached, Redis, or even MongoDB for such tasks. You might even write your own caching daemon, even in PHP (however, this is totally unnecessary and can be tricky).
If you are absolutely, positively sure that you can only solve this task with this file cache, and you are on Linux, try using a different disk I/O scheduler, such as deadline, or cfq combined with decreasing your PHP process priority to -3/-4.
That question may appear strange.
But every time I've made PHP projects in the past, I've encountered this sort of bad experience:
Scripts stop running after 10 seconds. This results in very bad database inconsistencies (a bad example of a deletion loop: a user is about to delete a photo album; the album object gets deleted from the database, and then, halfway through deleting the photos, the script gets killed right where it is, and 10,000 photos are left with no reference).
It's not transaction-safe. I've never found a way to do something securely, to ensure it gets done. If the script gets killed, it gets killed, right in the middle of a loop. That never happened on Tomcat with Java. Java runs and runs and runs, if it takes long.
Lots of newsletter scripts try to work around that problem by splitting the job into many packages, i.e. sending 100 at a time, then reloading the page (oh man, really stupid), doing the next batch, and so on. Most often something hangs, or the script takes longer than 10 seconds, and your platform is crippled.
But then, I hear that very big projects use PHP, like StudiVZ (the German Facebook clone, actually the biggest German website). So there is a tiny light of hope that this bad behaviour just comes from unprofessional hosting companies that kill PHP scripts because their servers are so bad. What's the truth about this? Can it be configured in such a way that scripts never get killed because they take a little longer?
Is PHP suitable for very large projects?
Whenever I see a question like that, I get a bit uneasy. What does very large mean? What may be large to you, may be small to me or vice versa. And that is even assuming that we use the same metric. Are you measuring time to build the project, complete life-cycle of the project, money that are involved, number of people using it, number of developers to build/maintain it, etc. etc.
That said, the problems you're describing sound like you don't know your technology well enough. That would be a problem for you regardless of which technology you picked. For example, use database transactions to ensure atomicity, and use asynchronous offline jobs to process long-running tasks (such as dispatching a mailing list).
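For the photo-album example from the question, a minimal transaction sketch using PDO (the connection details and table names are placeholders):

// Sketch only: either the album and all its photos are deleted, or
// nothing is; a killed script leaves the data consistent.
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->beginTransaction();
try {
    // $albumId would come from the request
    $db->prepare('DELETE FROM photos WHERE album_id = ?')->execute(array($albumId));
    $db->prepare('DELETE FROM albums WHERE id = ?')->execute(array($albumId));
    $db->commit();
} catch (Exception $e) {
    $db->rollBack();                           // undo the partial delete
    throw $e;
}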
A lot of the bad behaviour is covered by good frameworks like the Zend Framework.
Anything that takes longer than 10 seconds is really messed up, but you can always raise the execution time with http://de3.php.net/set_time_limit
A lot of big sites are written in PHP: Facebook, Wikipedia, StudiVZ, Digg.com, etc. A lot of the things you are talking about are just configuration matters; maybe you should look into that?
Are you looking for set_time_limit() and ignore_user_abort()?
Performance is not a feature you can just throw in after most of the site is done.
You have to design the site for heavy load.
If a database task normally involves 10K rows, you should be prepared not just for execution time issues, but for other maintenance questions.
Worst case: make a consistency tool to check and fix those errors.
Better: instead of physically deleting the images, just flag them and let background services take care of the expensive manoeuvres.
Best: you can use a job queue service and add this job to the queue.
If you do need to do transactions in PHP, you can just do:
mysql_query("BEGIN");
/// do your queries here
mysql_query("COMMIT");
The commit command will just complete the transaction.
If any errors occur, you can just roll back with:
mysql_query("ROLLBACK");
Edit: Note this will only work if you are using a database engine that supports transactions, such as InnoDB.
You can configure how much time is allowed for executing a script, either in the php.ini settings or via ini_set()/set_time_limit().
Instead of studivz (the German Facebook clone), you could look at the actual Facebook which is entirely PHP. Or Digg. Or many Yahoo sites. Or many, many others.
ignore_user_abort() is probably what you're looking for, but you could also add another layer in the form of scheduled maintenance jobs. They run at a specified interval and do various things to make sure your data/filesystem are in the state you want... deleting old/unlinked files is just one of many things you can do.
For these large loops, like deleting photo albums or sending thousands of emails, you're looking for ignore_user_abort() and set_time_limit().
Something like this:
ignore_user_abort(true); // user leaving the page will not kill the script
set_time_limit(0);       // script can take as long as it wants

for ($i = 0; $i < 10000; $i++) {
    costly_very_important_operation();
}
Be careful, however: this could potentially run the script forever:
ignore_user_abort(true); // user leaving the page will not kill the script
set_time_limit(0);       // script can take as long as it wants

while (true) {
    do_something();
}
That script will never die, unless you restart your server.
Therefore it is best never to set the time limit to 0.
Technically, no programming language is transaction-safe; it's the database that needs to be transaction-safe. So if the script/code dies or disconnects, for whatever reason, the transaction will be rolled back.
Putting queries in a loop is a very bad idea unless the loop is specifically designed to run in batches, breaking a much larger set into smaller pieces. Adjusting PHP timers and limits is generally a stop-gap solution; you are still dependent on the client browser if you use the web to kick off a script.
If I have a long process that needs to be kicked off by a browser, I "disconnect" the process from the browser and web server so control is returned to the user while the script runs. PHP scripts run from the command line can run for hours if you want. You can then use AJAX, or reload the page, to check on the progress of the long-running script.
There are security concerns with this code, but to "disconnect" a process from PHP running under something like Apache:
exec("nohup /usr/bin/php -f /path/to/script.php > /dev/null 2>&1 &"); // nohup + & detach it; output is discarded
But that really has nothing to do with PHP being suitable for large projects or being transaction-safe. PHP can be used for large projects, but since by default no code stays "resident" between hits, it can get slow if not designed right. Also, since there is no namespace support, you'll want to plan ahead if you have a large development team.
It's fine for a Java based system to take a few minutes to startup, initialize and load all the default objects. But this is unacceptable with PHP. PHP will take more planning for larger systems. The question is, when does the time saved in using PHP get wasted by the additional planning time required for a large system?
The reason you most likely experienced bad database inconsistencies in the past is that you were using the MyISAM engine for MySQL (which does NOT support transactions). Use InnoDB instead; it supports transactions and performs row-level locking.
Or use PostgreSQL.
Many, many sites are made in PHP. However, you will not hear about the millions of web pages made in PHP that do not exist anymore because they were abandoned. Those pages may have burned all the company's money dealing with the PHP mess, or maybe the company went bankrupt because its software was so crappy that customers did not want it... PHP seems good at the start, but it does not scale very well. Yes, there are many huge websites made in PHP, but they are the exception rather than the norm.