I'm trying to set up a web service for mobile that allows users to send data to it and receive data from it.
Everything is PHP/MySQL.
On that same server, I've set up Jenkins to run a few PHP scripts from time to time that calculate various things. These calculations are quite intensive and may take up to 20 minutes to finish (they have to connect to another website and check with an API).
Is there a way to limit the memory and CPU consumption (maybe even disk I/O, because of MySQL) of a PHP script so that users don't experience slowdowns?
Or perhaps the system shouldn't be set up like this at all, but rather with a master-master MySQL setup, with the scripts running on a second machine that users never touch? (In that case, wouldn't the second machine still be slowed down, so the master-master replication would suffer?)
Is there any other way to set this up?
A few other things to take into consideration:
- The data the scripts need is sent by users. (It's read once at the beginning of the script; data that arrives after that is used on the next run.)
- The scripts run every hour, every 4 hours, or once a day (there are multiple scripts).
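For context, here's roughly the kind of per-script limiting I have in mind; the values are placeholders and I don't know whether this would actually be enough:

```php
<?php
// Sketch only: soften the impact of a heavy CLI script on the rest of the box.
// The concrete values below are guesses, not tested recommendations.

// Cap how much memory this one script may allocate.
ini_set('memory_limit', '256M');

// Lower the process priority so the web workers win any CPU contention.
// proc_nice() is only available where the system supports nice().
if (function_exists('proc_nice')) {
    proc_nice(10);
}

// ... the actual intensive calculation would go here ...
```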
Related
I have about 35 cron jobs right now. Most of them are PHP scripts that either scrape or do some calculations. The scripts also loop over 10-20 different servers to do those scrapes. (They are in different countries, so they have to be separate calls.)
So we have 30 scripts, each of which loops over 20 servers and therefore takes about 5-15 minutes to run. I have each script spaced out right now.
But is it better to have 80 individual scripts run instead of 35 scripts that loop and take a while? Each script would take maybe 1-2 minutes instead of 10-15min.
That would of course spawn a ton more PHP processes. Is there any issue or limit with 10-15 or more PHP processes running at once?
I'm running a Performance cloud server on Rackspace.
Personally, if the jobs need to complete in a certain order, I would keep things as linear as possible. It might take longer, but I always err on the side of data accuracy.
It depends.
If you create more processes that run at the same time, you are going to increase your overall memory footprint. Each process carries its own memory overhead for the process itself and for any libraries it needs to load (aside from whatever it needs for its actual work). You will also have more than twice as many scripts to monitor to make sure they are all running successfully.
However, by creating more processes you will be able to speed things up, since you are essentially introducing concurrency: one process can continue while another is blocked waiting on I/O.
If each script doesn't have a dependency on another, breaking them into smaller scripts should be fine. If you can handle monitoring more scripts, and the server can handle it, then I would do it.
If the scripts do have dependencies, or if you would have to run so many at the same time that your server usage maxes out, keep them together.
That being said, I would also try to optimize the scripts themselves; make sure there isn't something you can do to make them faster without creating more processes.
Depending on how you have the servers set up, I would run them all at once. I would also run them at night, during off hours when the web servers aren't in use, rather than during business operations, unless your web app depends on it. If you're on a Cloud Server at Rackspace I wouldn't worry about bandwidth, although needing more RAM could become an issue further down the road.
Spawning a ton more PHP processes shouldn't be a worry if you have a sufficient amount of RAM; there is no hard limit on the Linux side.
a) Figure out which crons need to run in which order
b) Schedule the crons to run at night, around midnight
c) Run and fire off the 80 scripts at once
It would also be a good idea to send yourself an email with the cron results, or a report that everything went through successfully, per batch rather than per individual cron.
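As a rough illustration (the script paths, log file, and email address are all hypothetical), a small PHP runner along those lines could fire off the whole batch and send a single summary email:

```php
<?php
// Hypothetical batch runner: launched once a night by a single cron entry.
// Paths and the email address are placeholders.
$jobs   = glob('/var/www/cron/*.php');   // the individual job scripts
$report = array();

foreach ($jobs as $job) {
    // Fire each job in the background; redirecting output lets exec() return immediately.
    exec('php ' . escapeshellarg($job) . ' >> /var/log/cron-batch.log 2>&1 &');
    $report[] = 'started ' . basename($job);
}

// One summary email for the whole batch, not one per cron.
mail('ops@example.com', 'Nightly batch kicked off', implode("\n", $report));
```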
I have a situation where I need rapid and very frequent updates from a website's API. (I've asked them about how fast I can hammer them and they've said as fast as you like.)
So my design architecture is to create several small fast running PHP scripts that do a very specific action, save the result to memcache, and repeat. So the first script grabs a single piece of data via their API and stores it in memcache and then asks again. A second script processes the data the first script stored in memcache and requests another piece of data from the API based on the results of that processing. The third uses the result from the second, does something with that data, asks for more data via the API, on up the chain until a decision is made to execute via their API.
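To make the shape of one stage concrete, here is a rough sketch of a single worker in that chain; the key names, the callApi() helper, and the use of the Memcached extension are assumptions for illustration only:

```php
<?php
// Sketch of one stage: read the previous stage's result from memcache,
// call the API based on it, store the new result, repeat.
// 'stage1_result', 'stage2_result' and callApi() are hypothetical names.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

while (true) {
    $raw = $mc->get('stage1_result');
    if ($raw !== false) {
        $decoded = json_decode($raw, true);
        if (is_array($decoded)) {
            $result = callApi($decoded);                     // placeholder for the real API request
            $mc->set('stage2_result', json_encode($result)); // hand off to the next stage
        }
    }
    usleep(100000); // 100 ms between polls; the real interval is a guess
}
```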
I am running these scripts in parallel on a machine with 24 GB RAM and 8 cores. I am also using supervisor in order to manage them.
When I run each PHP script manually via CLI or browser, they work fine. They don't die except where I've told them to (in the browser case, so I can get some feedback). The logic is fine, the scripts run fine, etc.
However, when I leave them running indefinitely via supervisor, the logs fill up with "Maximum execution time reached" errors, and the line they point to is a line in one of my classes that gets the data from memcache. Sometimes it bombs on a check to see whether the data is JSON (which it should always be), sometimes it bombs elsewhere in the same function/method. The timeout set for the supervisor-managed scripts is 5 sec, because the data is stale by then.
I have considered upping the execution time, but:
- the data will be stale by then,
- memcache typically returns in less than 1 msec, so 5 sec is an eternity,
- none of the scripts have ever failed due to a timeout when run manually (CLI or browser).
Environment:
Ubuntu 12.04 Server
PHP 5.3.10-1ubuntu3.9 with Suhosin-Patch
Memcached 1.4.13
Supervisor ??
Memcache Stats (from phpMemcachedAdmin):
Size: 1 GB
Uptime: 17 hrs, 38 min
Hit Rate: 76.5%
Used: 18.9 MB
Wasted: 18.9 MB
Bytes Written: 307.8 GB
Bytes Read: 7.2 GB
--------------- Additional Thoughts/Questions ----------------
I don't think it was clear in my original post that in order to get rapid updates I am running multiple copies in parallel of the scripts that grab API data. So if one script is grabbing basic account data looking for a change to trigger another event, then I actually have at least 2 instances running concurrently. This is because my biggest risk factor is stale data causing a delayed decision combined with a 1+ sec response time from the API.
So it occurred to me that the issue may stem from write conflicts where 2 instances of the same script are attempting to write to the same cache key. My initial Googling didn't lead to any good material on possible write conflicts/collisions in memcache. However, a little deeper dive provided a page where a user with 2 bookmarking sites powered by Elgg off of 1 memcache instance ran into what he described as collisions.
My initial assumption when deciding to kick multiple instances off in parallel was that Supervisor would start them sequentially, and therefore slightly staggered (maybe a bad assumption; I'm new to using Supervisor). Additionally, the API would respond at different rates to each call. So, with write times in the sub-millisecond range and an update from each instance only every 1-2 seconds, the chances of write conflicts/collisions seemed pretty low.
I'm considering using some form of prefix/postfix on the keys. Each instance already has its own instance ID created from an md5 hash, so I could prefix or postfix the keys and have each instance write to its own key. But then I need another key that holds all of those prefixed/postfixed keys. So now I'm doing multiple cache fetches, a loop through all the stored data, and a discard of all but one of those results. I bet there's a better/faster architecture out there...
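For what it's worth, the per-instance-key idea would look roughly like this; the key names are invented, I'm using the Memcached extension purely for illustration, and I'm not convinced it's the right design:

```php
<?php
// Sketch: each instance writes to its own key, and a shared index key lists
// the per-instance keys so a reader can fetch and compare them.
// The 'result:' prefix and 'result:index' key are hypothetical.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$instanceId = md5(uniqid('', true));   // each instance already has an ID like this
$myKey      = 'result:' . $instanceId;

// Writer side: store this instance's result, then record its key in the index.
$data = array(/* ... whatever this instance just fetched from the API ... */);
$mc->set($myKey, json_encode($data), 10);

$index = $mc->get('result:index');
$index = is_array($index) ? $index : array();
$index[$instanceId] = $myKey;
// Note: two instances updating the index at the same time can still clobber
// each other, which is exactly the collision concern described above.
$mc->set('result:index', $index, 10);

// Reader side: fetch every per-instance key, then discard all but one result.
$keys    = array_values((array) $mc->get('result:index'));
$results = $keys ? $mc->getMulti($keys) : array();
```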
I am now adding the code to do the timing Aziz asked for. It will take some time to add the code and gather the data.
Recommendations welcome
I have a server with 2 quad-core processors (2.4 GHz, 16GB RAM). I have some PHP scripts that run under very heavy load. Most of these scripts do a few things:
Fetch Data from database (just a single row, from a small table)
Fetch Data from other server (mainly Facebook)
Upload a small photo
Update a database table (this table is very heavily used, and the number of rows grows very quickly, almost 2 rows per second)
The problem is that the scripts are taking too much time to execute. I previously had a server with a lower configuration (one quad-core processor, 6GB RAM), and the scripts took 4-5 sec to complete. But now the execution time is 30-40 sec, or even more.
HOW I MEASURE EXECUTION TIME: I take microtime() at the start of the script and at the end, and subtract the two. I just needed a rough estimate.
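In other words, just something along these lines:

```php
<?php
// Rough total-time measurement, as described above.
$start = microtime(true);

// ... rest of the script ...

$elapsed = microtime(true) - $start;
error_log(sprintf('Execution time: %.2f sec', $elapsed));
```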
SERVER CONFIGURATION: Here are some parameters set in apache config:
server_limit = 350
max_child = 350
keep_alive = off
Other Characteristics:
1. When the server is not under heavy load, the execution time is very small
2. The previous server took much less time to execute, even under heavy load
I don't know what other details I should include. Please ask, and I will post them here.
What should I do to improve this?
Update:
I have figured out that the problem is with the ImageMagick library. I googled and tried a few solutions, like disabling OpenMP, but it hasn't helped much.
I suggest profiling with Xdebug and then analyzing the output with a tool like KCachegrind. Then you will know what's taking the time.
This could have many causes:
Are your queries "slow"?
Is the server configuration right?
Is the bandwidth slow?
Is the MySQL server configuration right?
What is the format (storage engine) of the table you insert into?
Is something else (a cron job, for example) killing the database?
I would post this as a comment, but unfortunately I can't. Please clear up those questions and tell us what you find out ;)
I would start by decoupling the problem. Test each action (fetch from DB, fetch from FB, upload, etc.) separately.
At the same time, check whether all the components of your new server environment (packages, versions, config, etc.) are the same as before.
I have a daemon that does the following:
- retrieves site members from a MySQL database (I used LIMIT 1000 to retrieve 1000 rows at a time)
- sends information about these members to a third-party server
- flags each member as having been processed
- sleeps for 2 seconds
- retrieves the next batch of 1000 "unprocessed" members and sends them to the third-party server
and so on.
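For context, the loop looks roughly like this; the table and column names, the database credentials, and sendToThirdParty() are simplified placeholders:

```php
<?php
// Simplified sketch of the daemon loop described above.
// Table/column names, credentials and sendToThirdParty() are placeholders.
$db = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

while (true) {
    $stmt  = $db->query("SELECT id, email, name FROM members WHERE processed = 0 LIMIT 1000");
    $batch = $stmt->fetchAll(PDO::FETCH_ASSOC);

    if (!$batch) {
        sleep(60);   // nothing pending; the wait time here is a guess
        continue;
    }

    sendToThirdParty($batch);   // placeholder for the real HTTP request

    $ids = array();
    foreach ($batch as $row) {
        $ids[] = (int) $row['id'];
    }
    $db->exec("UPDATE members SET processed = 1 WHERE id IN (" . implode(',', $ids) . ")");

    sleep(2);
}
```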
I am wondering whether a PHP daemon (I am using the System Daemon library) is the best way to accomplish the task outlined above.
I am worried about wasting too much memory (as PHP is known for that).
I am also worried about sending multiple requests to the third-party server, because on a high-traffic day there can be a lot of non-receipts.
Is there a tool other than a daemon I could use to accomplish this task? What methods can I implement to make this efficient, considering there is a possibility of having to process over 100K rows in the MySQL table, and the task is time-sensitive? Also, at what point should I consider adding more servers?
Thanks!
A cron job should be a very good option for doing a sync job with a third-party server.
Consider the following 'improvements':
1) Use a lock file to prevent multiple jobs from starting in parallel and taking resources away from other processes you have running, and also to avoid duplicate processing of data (a minimal lock-file sketch follows this list).
2) If you haven't already, implement an 'information updated' / 'last sync time' check on your side. For example, if user A hasn't changed since he was last synced, you don't sync him again.
3) Consider how often the data needs to be synced, and if it doesn't have to be real time, factor that into the selection query. Combined with user/time distribution and other factors, you might end up with periods when your script doesn't need to sync that many accounts.
4) Do your own memory cleanup: unset variables, unlink files, and even reuse the same variables so you don't have single-use garbage variables hanging around inside the scripts. Be careful with this, as it can obfuscate the code.
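A minimal sketch of the lock file from point 1 (the lock path is arbitrary):

```php
<?php
// Sketch for point 1: skip this run if a previous run is still going.
// The lock file path is arbitrary.
$lock = fopen('/tmp/member-sync.lock', 'c');
if (!$lock || !flock($lock, LOCK_EX | LOCK_NB)) {
    exit(0); // another instance already holds the lock
}

// ... do the sync work here ...

flock($lock, LOCK_UN);
fclose($lock);
```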
Also consider using smaller datasets when you send them to PHP for processing. Databases love big datasets; PHP doesn't.
I would suggest using Perl, as it is more memory- and performance-efficient, and it has more features for integrating with the system and running as a daemon.
And now, about when it's time to add more servers: I am assuming the third-party server has enough resources to process that many records. So if you are running out of resources on your side, I would suggest using MySQL replication to replicate your DBs to other server(s) and running the above-mentioned daemon there.
I am trying to write a client-server app.
Basically, there is a Master program that needs to maintain a MySQL database keeping track of the processing done on the server side,
and a Slave program that queries the database to see what to do to keep in sync with the Master. There can be many slaves at the same time.
All the programs must be able to run from anywhere in the world.
For now, I have tried hosting the MySQL database on a shared hosting server
and wrote C++ programs for the master and slave that use the cURL library to make requests to a PHP file (e.g. www.myserver.com/check.php) located on my hosting server.
The master program calls the URL every second, and some PHP code is executed to keep the database up to date. I did a test with a single slave program that also calls the URL every second and executes PHP code that queries the database.
With that setup, however, my web host suspended my account and told me that I was 'using too much CPU resources' and that, based on their analysis of the CPU resources needed, I would have to use a dedicated server ($200 per month rather than $10). And that was with one Master and only one Slave, so no more than 5-6 MySQL queries per second. What would it be with 10 slaves, then..?
Am I missing something?
Would there be a better setup than what I was planning to use in order to achieve the syncing mechanism I need between two or more far-apart programs?
I would use Google App Engine for storing the data. You can read about free quotas and pricing here.
I think the syncing approach you are taking is probably fine.
The more significant question you need to ask yourself is: what is the maximum acceptable time between syncs? If you truly need virtually realtime syncing between two databases on opposite sides of the world, then you will be using significant bandwidth, and you will unfortunately have to pay for it, as your host pointed out.
Figure out what is acceptable to you in terms of time. Is it okay for the databases to only sync once a minute? Once every 5 minutes?
Also, when running syncs like this in rapid succession, it is important to make sure they do not overlap: before a sync starts, test whether a sync is already in progress and has not finished yet. If a sync is still happening, don't start another; if not, go ahead. This will prevent a lot of unnecessary overhead and syncs happening on top of each other.
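One simple way to implement that check (the lock name and credentials are invented) is a MySQL advisory lock, since both ends already talk to the database:

```php
<?php
// Sketch: a MySQL advisory lock so a new sync refuses to start while a
// previous one is still running. Lock name and credentials are invented.
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$got = $db->query("SELECT GET_LOCK('sync_job', 0)")->fetchColumn();
if (!$got) {
    exit(0); // a sync is already in progress
}

// ... run the sync here ...

$db->query("SELECT RELEASE_LOCK('sync_job')");
```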
Are you using a shared web host? What you are doing sounds like excessive use for a shared (cPanel-type) host; use a VPS instead. You can get an unmanaged VPS with 512MB of RAM for 10-20 USD per month, depending on spec.
Edit: if your bottleneck is CPU rather than bandwidth, have you tried bundling updates inside a transaction? Let's say you are getting 10 updates per second, and you decide you are happy with a propagation delay of 2 seconds. Rather than opening a connection and a transaction for each of those 20 statements, bundle them together into a single transaction that executes every two seconds. That would substantially reduce your CPU usage.
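A rough sketch of that batching idea with PDO; the table, columns, credentials, and collectUpdates() helper are invented for illustration:

```php
<?php
// Sketch: collect incoming updates and flush them in one transaction
// every two seconds instead of one connection/transaction per statement.
// Table/column names, credentials and collectUpdates() are invented.
$db      = new PDO('mysql:host=localhost;dbname=sync', 'user', 'pass');
$pending = array();            // updates collected since the last flush
$last    = microtime(true);

while (true) {
    $pending = array_merge($pending, collectUpdates());   // placeholder: wherever updates come from

    if ($pending && microtime(true) - $last >= 2) {
        $db->beginTransaction();
        $stmt = $db->prepare("UPDATE jobs SET status = ? WHERE id = ?");
        foreach ($pending as $u) {
            $stmt->execute(array($u['status'], $u['id']));
        }
        $db->commit();

        $pending = array();
        $last    = microtime(true);
    }

    usleep(100000); // 100 ms poll; the exact interval is a guess
}
```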