I have a web app at mmatakedown.net. It's basically a social game, fantasy sports-esque, where you predict fight outcomes and score points and perks for getting them correct, with a leaderboard, etc.
Right now, I have a page I wrote to administer "results". What I'd like to do is use a cron job (or a better option?) to check Twitter periodically (on the day of a fight card) and look for tweets that note the results of the fights (from more trusted accounts to start, perhaps). For example, it might know that a card locked today and that the deadline has passed, so it's time to start searching for results. If two fighters were named Brooks and Jones, for example, it might look for something like "Jones def." or "Brooks def." to find out who the victor was, then search for something like "sub", "ko", or "dec" to find out the method of victory.
Once it tallies a certain # of tweets to confirm the result, it would update the DB with the result, then set off a series of updates and notifications based on the result, who picked what, what scores were updated, any milestones reached, etc.
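Something like the following could be the core of that parsing and tallying step. This is only a sketch: getRecentTweets() is a hypothetical wrapper around the Twitter search API, the fighter names would come from your card data, and the threshold of three confirming tweets is arbitrary.
<?php
// Sketch only: getRecentTweets() is a hypothetical Twitter-search wrapper.
function parseResult($tweet, array $fighters)
{
    foreach ($fighters as $fighter) {
        // e.g. "Jones def. Brooks via sub (RNC) R2"
        if (stripos($tweet, $fighter . ' def') !== false) {
            $method = 'unknown';
            if (preg_match('/\b(sub|ko|tko|dec)\b/i', $tweet, $m)) {
                $method = strtolower($m[1]);
            }
            return array('winner' => $fighter, 'method' => $method);
        }
    }
    return null; // tweet doesn't look like a result
}

$tally = array();
foreach (getRecentTweets('Brooks Jones') as $tweet) {
    if ($result = parseResult($tweet, array('Brooks', 'Jones'))) {
        $key = $result['winner'] . '|' . $result['method'];
        $tally[$key] = isset($tally[$key]) ? $tally[$key] + 1 : 1;
    }
}

arsort($tally);
if ($tally && reset($tally) >= 3) {
    $confirmed = key($tally); // e.g. "Jones|sub"
    // confident enough: update the DB and kick off scoring/notifications
}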
What would be the best approach for this? I currently write the site with PHP/MySQL/Ajax/jQuery.
I'm not sure about the best approach, but this functionality ought to be independent of your web application (i.e. the PHP/ajax/jquery code base).
It's a program that can run via cron, check the database for things that need to be updated, and then update the database appropriately.
You could run a cron job every hour or whatever you think is necessary in order to keep the database relatively up to date. Or you could get fancier I suppose.
I think cron is entirely appropriate though.
I have written Python programs that were invoked via cron, did some Twitter API calls, and updated a database. The application was quite different, but Python worked well for the task. It has built-in libraries for everything you'll need (e.g. JSON, http) except the MySQL access, but that's a quick package install.
One complication: you may start searching for tweets and find a few indicating the fight is over, but not enough to change your database. You will probably want a way to ensure that a subsequent invocation of the program has access to this information.
I think generally that when you write these offline batch updating scripts that you should have a table in your database for them to log to. For example, when did they start, how many tweets did they inspect, when did they exit, did they exit successfully, etc.
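For example, a run log can be as simple as this; the batch_run_log table and its columns are assumptions, not a required schema.
<?php
// Minimal run log for the batch script (assumed table: batch_run_log with
// started_at, finished_at, tweets_inspected, status columns).
$pdo = new PDO('mysql:host=localhost;dbname=mma', 'user', 'pass');

$pdo->exec("INSERT INTO batch_run_log (started_at, status) VALUES (NOW(), 'running')");
$runId = $pdo->lastInsertId();

$tweetsInspected = 0;
// ... search Twitter, increment $tweetsInspected, record any partial results ...

$stmt = $pdo->prepare(
    "UPDATE batch_run_log
     SET finished_at = NOW(), tweets_inspected = ?, status = 'success'
     WHERE id = ?"
);
$stmt->execute(array($tweetsInspected, $runId));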
I am putting together an interface for our employees to upload a list of products for which they need industry stats (currently they're doing them manually, one at a time).
Each product will then be served up to our stats engine via a web service API.
I will be on the replying end; the stats engine will be requesting the "next victim" from my API.
Each list the users upload will have between 50 and 1000 products, and will be its own queue.
For now, queues/lists will likely be added (and removed upon completion) approximately 10-20 times per day.
If successful, traffic will probably ramp up after a few months to something like 700-900 lists per day.
We're just planning to go with a simple round-robin approach to direct the traffic evenly across queues.
The multiplexer would grab the top item off of List A, then List B, then List C and so on until looping back around to List A again ... keeping in mind that lists/queues can be added/removed at any time.
The issue I'm facing is just conceptualizing the management of this.
I thought about storing each queue as a flat file and managing the rotation via a relational DB (MySQL). I thought about doing it the reverse way. I thought about going either completely flat-file or completely relational DB ... bottom line, I'm flexible.
Regardless, my brain is just vapor locking when I try to statelessly meld a variable list of participants with a circular rotation (I just got back from a quick holiday, and I don't think my brain's made it home yet ;)
Has anyone done something like this?
How did you handle it?
What would you improve if you had to do it again?
Any & all tips/suggestions/advice are welcome.
NOTE: Since each request from our stats engine/tool will be separated by many seconds, if not a couple of minutes, I need to keep this stateless.
List data should be stored in a database, for sure. Your PHP side should have a view giving the status of the system, and the form to add lists.
Since each request becomes its own queue, and all the request-queues are considered equal in priority, the ideal number of tables is probably three: one to list the requests, their priority relative to one another (to determine who goes next in the round-robin), and their processing status; another to list the as-yet-unprocessed contents (list items) of each request; and a third to list the processed items from each queue.
You will also need a script that does the actual processing, that is not driven by a user request, but instead by a system-scheduled job that executes periodically (throttled to whatever you desire). This can of course also be in PHP. This is where you would set up your 10-at-a-time list checks and updates.
The processing would be something like:
Select the next set of at most 10 items from the highest-priority queue.
Process them, updating their DB status as they complete.
Update the priority of the above queue so that it is now the lowest priority.
And if new queues are added, they would be added with lowest priority.
Priority could be represented with an integer.
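A sketch of one processing pass, run from cron, might look like the following; the queues, queue_items, and processed_items tables and their columns are assumptions matching the three-table idea above, not a required schema.
<?php
// One cron-driven processing pass (assumed tables: queues, queue_items, processed_items).
$pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');

// 1. Pick the queue that is next in the rotation (smallest priority number goes first).
$queue = $pdo->query(
    "SELECT id FROM queues WHERE status = 'active' ORDER BY priority ASC LIMIT 1"
)->fetch(PDO::FETCH_ASSOC);

if ($queue) {
    // 2. Grab up to 10 unprocessed items from that queue.
    $stmt = $pdo->prepare("SELECT id, product FROM queue_items WHERE queue_id = ? LIMIT 10");
    $stmt->execute(array($queue['id']));

    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $item) {
        // ... call the stats engine for $item['product'] ...
        $pdo->prepare("INSERT INTO processed_items (queue_id, product) VALUES (?, ?)")
            ->execute(array($queue['id'], $item['product']));
        $pdo->prepare("DELETE FROM queue_items WHERE id = ?")->execute(array($item['id']));
    }

    // 3. Send this queue to the back of the rotation (largest number = lowest priority).
    $max = $pdo->query("SELECT MAX(priority) FROM queues")->fetchColumn();
    $pdo->prepare("UPDATE queues SET priority = ? WHERE id = ?")
        ->execute(array($max + 1, $queue['id']));
}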
Your users would need to wait patiently for their list to be processed and then view or download the result. You might set up an auto-refresh script for this on your view page.
It sounds like you're trying to implement something that Gearman already does very well. For each upload / request, you can simply send off a job to the Gearman server to be queued.
Gearman can be configured to be persistent (just in case things go to hell), which should eliminate the need for you logging requests in a relational database.
Then, you can start as many workers as you'd like. I know you suggest running all jobs serially, which you can still do, but you can also parallelize the work, so that your user isn't sitting around quite as long as they would've been if all jobs had been processed in a serial fashion.
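A minimal sketch with PHP's PECL gearman extension (assumed installed); the "process_list" job name and the JSON payload are arbitrary choices, not anything Gearman defines.
<?php
// Submitting a job when a list is uploaded:
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('process_list', json_encode(array('list_id' => 42)));

// A worker (run as many copies of this CLI script as you want in parallel):
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('process_list', function (GearmanJob $job) {
    $data = json_decode($job->workload(), true);
    // ... fetch the list items for $data['list_id'] and feed the stats engine ...
});
while ($worker->work()) {
    // handle one job per iteration, forever
}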
After a good night's sleep, I now have my wits about me (I hope :).
A simple solution is a flat file for the priorities.
Have a text file simply with one List/Queue ID on each line.
Feed from one end of the list, and add to the other ... simple.
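A sketch of that rotation, using flock() so two simultaneous requests can't corrupt the file; queues.txt (one queue ID per line) is an assumed file name.
<?php
// Rotate the queue list: take the ID at the top, serve it, append it to the bottom.
$fh = fopen('queues.txt', 'c+');
flock($fh, LOCK_EX);                      // only one request rotates at a time

$ids = array_filter(array_map('trim', file('queues.txt')));
$next = array_shift($ids);                // the queue to serve on this request
if ($next !== null) {
    $ids[] = $next;                       // back of the line
    file_put_contents('queues.txt', implode("\n", $ids) . "\n");
}

flock($fh, LOCK_UN);
fclose($fh);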
Criticisms are welcome ;o)
Thanks #Trylobot and #Chris_Henry for the feedback.
I've searched the web, and apparently there is no way to launch a PHP script without user interaction.
A few people have recommended cron, but I am not sure it is the right way to go.
I am building a website where auctions are possible, just like eBay. After a certain amount of time the objects are no longer available and the auction is considered finished.
I would like to know a way to interact with the database automatically.
When do you need to know if an object is available? -> Only if someone asks.
And then you have the user interaction you are searching for.
It's something different if you want to, let's say, send an email to the winner of an auction. In this case you'd need some timer set to the ending time of the auction. The easiest way to do this would be a cron job...
There are several ways to do this. Cron is a valid one of them and the one I would recommend if its available.
Another is to check before handling each request related to an object whether it is still valid. If it is not, you can delete it from the database on-the-fly (or do whatever you need to) and display a different page.
Also, you could store the time at which your time-based script last ran in the database and compare that with the current time. If the delay is large enough, you can run your time-based code. However, this is prone to race conditions if multiple users hit the page at the same time, so the script may run multiple times (though this can probably be avoided with locks or something similar).
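As a sketch of the check-on-request idea; the auctions table with its status and ends_at columns is an assumption about your schema, not something PHP provides.
<?php
// Close overdue auctions lazily, whenever any page that cares about them is viewed.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

$pdo->exec("UPDATE auctions SET status = 'finished'
            WHERE status = 'open' AND ends_at <= NOW()");

// Then only ever display auctions that are still open.
$open = $pdo->query("SELECT * FROM auctions WHERE status = 'open'")
            ->fetchAll(PDO::FETCH_ASSOC);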
To edit cronjobs from the shell: crontab -e
A job to run every 10 minutes: */10 * * * * curl "http://example.com/finished.php"
TheGeekStuff.com cron Examples
Use a heartbeat/bot implementation.
A cron job that runs fairly frequently, or a program that starts on boot and runs continuously (perhaps sleeping periodically), is the way to go. With a cron job you'll need to make sure you don't have two instances running at any given time (see the lock-file sketch below), or write it so that it doesn't matter if more than one is working at once. With a "resident" program you'll need to figure out how to handle the case where it crashes unexpectedly.
I wouldn't rely on this mechanism to actually close the auction, though. That should be handled in your database/web site. That is, the auction has a close time and either the database constraints or your code makes it impossible to bid on a closed auction. Notifying the winner and seller, setting up the payment process, etc. are things your service/scheduled task could do.
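One common way to make sure two cron invocations never overlap is an flock()-based lock file at the top of the script; this is only a sketch, and the lock file path is arbitrary.
<?php
// Guard so a second cron invocation exits immediately if the previous one is still running.
$lock = fopen('/tmp/close-auctions.lock', 'c');
if (!flock($lock, LOCK_EX | LOCK_NB)) {
    exit(0); // another run is in progress; let it finish
}

// ... notify winners, start the payment process, etc. ...

flock($lock, LOCK_UN);
fclose($lock);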
What I think I want:
A language that would run persistently on a server to run recurring events and that would be able to respond to HTTP requests and give them tailored data. The scheduled events would be dynamically timed and occur frequently. The events would almost always be changes to database information, and the results would be viewed through a web application.
Why I think PHP isn't going to work for me:
Since I can only get my PHP to run when a user requests a page, or on a fixed, rigid schedule, I'm having to look at the request time versus the last time it ran and fill in the gaps that should have happened. For example, if I had something that needed to occur every 15 minutes and no PHP scripts ran for several hours, it would have to check how many runs were missed and catch up to simulate what I believe should have happened in that time.
Real world examples and specifics:
I'm going to point to a really stupid example in this section because it's actually pretty close, in real-world concept, to what I'm trying to explain. The online game Neopets has shops that restock every so often with randomly generated goods. At each ~15-minute restock it restocks more if the store is emptier, but the old items remain.
This isn't what I want exactly, but I recognized that at its core the concept is extremely similar. I'll be extrapolating some odd additions to this system, as I think it makes more sense than what I'm actually doing.
What should I do to recreate this shop but with the following requirements:
Restock times are dynamically derived from other events (a set amount of time since another item sells, perhaps), so the schedule isn't a rigid "every quarter hour" that I can run through scheduled PHP scripts. Instead it could be exactly 5:46:23 PM.
The "restocking" is derived from other time-specific information. I am unable to just simulate all the restocks that should have occurred once a new customer checks the page.
What direction should I go?
I have no idea what I'm talking about so assume the following is shit:
I suspect I want a persistently running program on the server that performs these tasks and puts the results into a database, which is then accessed and delivered through PHP to the end user. I'm really just getting started, though, and I know I'm very likely making a very stupid assumption. Is there something that is both the persistent program and PHP together that I should learn?
I'm doing this as a hobby and just want to learn more so I can make more fun toys for myself.
I think you are mixing up two concerns:
A language that would run persistently on a server to run recurring events and that would be able to respond to HTTP requests and give them tailored data
You are not looking for a language to run persistently, but for a system, e.g. a job queue (something like Gearman). This job queue will do your main scheduled work. You can use PHP to get messages from that queue, trigger new jobs, inspect running jobs, etc., in addition to building the website people will interact with in PHP.
You could always use cron and call a PHP file to do the dirty work for the logic side.
(And by that I mean, use PHP's system() function, documented in the PHP manual, to issue the rescheduling commands to cron itself.)
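As a rough sketch of that combination: cron fires a worker every minute, and the worker processes whatever the database says is due. The scheduled_events table and its columns are assumptions; an event scheduled for exactly 5:46:23 PM just becomes a row with that run_at, and accuracy is bounded by how often cron fires.
<?php
// Worker run by cron every minute (assumed table: scheduled_events with run_at, handler, done).
$pdo = new PDO('mysql:host=localhost;dbname=game', 'user', 'pass');

$due = $pdo->query(
    "SELECT id, handler FROM scheduled_events
     WHERE done = 0 AND run_at <= NOW() ORDER BY run_at"
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($due as $event) {
    // ... run the restock (or whatever $event['handler'] names), which may
    //     schedule further events by inserting new rows ...
    $pdo->prepare("UPDATE scheduled_events SET done = 1 WHERE id = ?")
        ->execute(array($event['id']));
}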
I am trying to create a PBBG (persistent browser based game) like that of OGame, Space4k, and others.
My problem is with the always-updating resource collection and with the building times: a time is set when the building, ship, research, etc. completes, and the user's profile is updated even if the user is offline. What and/or where should I learn to make this? Should it be a constantly running script in the background?
Note that I wish to only use PHP, HTML, CSS, JavaScript, and Mysql but will learn something new if needed.
Cron jobs (or the Windows equivalent) seem to be the way, but it doesn't seem right or best to me.
Do you have to query your DB for many users' properties, like "show me all users who already have a ship of the galaxy class"?
If you do not need this, you could just check the build queue when someone requests the profile.
If this is not an option, you could add a "finished_at" column to your database and include "WHERE finished_at >= SYSDATE()" in your query. In that case all resources (future and present) are in the same table.
Always keep in mind: what use is there to having "live" data if no one is requesting it?
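A sketch of resolving the build queue lazily when a profile is requested; the build_queue table and its columns are assumptions about the schema.
<?php
// Apply any builds whose finish time has passed, even if the player was offline the whole time.
$pdo = new PDO('mysql:host=localhost;dbname=pbbg', 'user', 'pass');
$userId = 123; // whichever user's profile is being viewed

$stmt = $pdo->prepare(
    "SELECT id, item FROM build_queue
     WHERE user_id = ? AND finished_at <= NOW() AND applied = 0"
);
$stmt->execute(array($userId));

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $build) {
    // ... add the ship/building/research to the user's profile ...
    $pdo->prepare("UPDATE build_queue SET applied = 1 WHERE id = ?")
        ->execute(array($build['id']));
}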
"My problem is with the always-updating resource collection and with the building times: a time is set when the building, ship, research, etc. completes, and the user's profile is updated even if the user is offline."
I think the best way to do this is to install a message queue (you need to be able to install/compile it yourself) like beanstalkd to do offline processing. Let's say it takes 30 seconds to build a ship. With a pheanstalk client (I like pheanstalk) you first put a message on the queue using:
$pheanstalk->put($data, $pri, $delay, $ttr);
See the beanstalkd protocol documentation for the meaning of all the arguments.
Here you would pass $delay = 30. When a worker process does a reserve(), it can only get the message after 30 seconds have passed.
$job = $pheanstalk->reserve();
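Roughly, the two sides might look like the following. The class name and constructor differ between pheanstalk versions, and the "builds" tube name is arbitrary.
<?php
require 'vendor/autoload.php';

// Constructor/class name vary by pheanstalk version; check the version you install.
$pheanstalk = new Pheanstalk\Pheanstalk('127.0.0.1');

// Producer side: the ship finishes building 30 seconds from now.
$pheanstalk->useTube('builds')->put(json_encode(array('ship_id' => 7)), 1024, 30, 60);

// Worker side (a long-running CLI script):
$pheanstalk->watch('builds');
while (true) {
    $job = $pheanstalk->reserve();              // blocks until a delayed job becomes ready
    $data = json_decode($job->getData(), true);
    // ... mark ship $data['ship_id'] as built in MySQL, credit resources, etc. ...
    $pheanstalk->delete($job);
}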
Streaming data to user in real-time
Also you could look into XMPP over BOSH to stream the new data to all users in real-time.
http://www.ibm.com/developerworks/xml/tutorials/x-realtimeXMPPtut/index.html
http://abhinavsingh.com/blog/2010/08/php-code-setup-and-demo-of-jaxl-boshchat-application/
I have a PHP script that grabs data from an external service and saves data to my database. I need this script to run once every minute for every user in the system (of which I expect to be thousands). My question is, what's the most efficient way to run this per user, per minute? At first I thought I would have a function that grabs all the user IDs from my database, iterates over the IDs, and performs the task for each one, but I think that as the number of users grows, this will take longer and no longer fall within 1-minute intervals. Perhaps I should queue the user IDs and perform the task individually for each one? In which case, I'm actually unsure of how to proceed.
Thanks in advance for any advice.
Edit
To answer Oddthinking's question:
I would like to start the processes for each user at the same time. When the process for each user completes, I want to wait 1 minute, then begin the process again. So I suppose each process for each user should be asynchronous - the process for user 1 shouldn't care about the process for user 2.
To answer sims' question:
I have no control over the external service, and the users of the external service are not the same as the users in my database. I'm afraid I don't know any other scripting languages, so I need to use PHP to do this.
Am I summarising correctly?
You want to do thousands of tasks per minute, but you are not sure if you can finish them all in time?
You need to decide what to do when you start running over your schedule.
Do you keep going until you finish, and then immediately start over?
Do you keep going until you finish, then wait one minute, and then start over?
Do you abort the process, wherever it got to, and then start over?
Do you slow down the frequency (e.g. from now on, just every 2 minutes)?
Do you have two processes running at the same time, and hope that the next run will be faster? (This might work if you are clearing up a backlog the first time, so the second run will run quickly.)
The answers to these questions depend on the application. Cron might not be the right tool for you depending on the answer. You might be better having a process permanently running and scheduling itself.
So, let me get this straight: You are querying an external service (what? SOAP? MYSQL?) every minute for every user in the database and storing the results in the same database. Is that correct?
It seems like a design problem.
If the users on the external service are the same as the users in your database, perhaps the two should be more closely integrated. I don't know if PHP is the way to go for syncing this data. If you give more detail, we could think about another solution. If you are in control of the external service, you may want to have that service dump its data or even write directly to the database. Some other syncing mechanism might be better.
EDIT
It seems that you are making an application that stores data for a user that can then be viewed chronologically. Otherwise you may as well just fetch the data when the user requests it.
Fetch all the user IDs in one go.
Iterate over them one by one (assuming that the data being fetched is unique to each user) and spawn a process for each request (you'll have to be creative here, as PHP threads do not exist AFAIK), since you want them all to be executed at the same time and not delayed if one user does not return data.
Each process should insert the data returned into the DB as soon as it is returned.
As for cron being right for the job: As long as you have a powerful enough server that can handle thousands of the above cron jobs running simultaneously, you should be fine.
You could get creative with several PHP scripts. I'm not sure, but if every CLI call to PHP starts a new PHP process, then you could do it like that.
foreach ($users as $user) {
    // Launch a separate PHP process per user so the loop doesn't wait on each fetch;
    // escapeshellarg() keeps the user ID from breaking the command line.
    shell_exec('php fetchdata.php ' . escapeshellarg($user) . ' > /dev/null 2>&1 &');
}
This is all very heavy and you should not expect to get it done snappy with PHP. Do some tests. Don't take my word for it.
Databases are made to process BULKS of records at once. If you're processing them one-by-one, you're looking for trouble. You need to find a way to batch up your "every minute" task, so that by executing a SINGLE (complicated) query, all of the affected users' info is retrieved; then, you would do the PHP processing on the result; then, in another single query, you'd PUSH the results back into the DB.
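A hedged sketch of that batched flavor: one query out, the per-user work in PHP, and one multi-row insert back. The user_stats table and fetch_external_value() are placeholders, not real APIs.
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// 1. Pull everything needed for every user in a single query.
$users = $pdo->query("SELECT id, api_token FROM users")->fetchAll(PDO::FETCH_ASSOC);

// 2. Do the per-user work in PHP, collecting results in memory.
$rows = array();
foreach ($users as $u) {
    $value = fetch_external_value($u['api_token']); // hypothetical call to the external service
    $rows[] = array($u['id'], $value);
}

// 3. Push everything back in one multi-row INSERT.
if ($rows) {
    $placeholders = implode(',', array_fill(0, count($rows), '(?, ?, NOW())'));
    $params = array();
    foreach ($rows as $r) {
        $params[] = $r[0];
        $params[] = $r[1];
    }
    $pdo->prepare("INSERT INTO user_stats (user_id, value, fetched_at) VALUES $placeholders")
        ->execute($params);
}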
Based on your big-picture description it sounds like you have a dead-end design. If you are able to get it working right now, it'll most likely be very fragile and it won't scale at all.
I'm guessing that if you have no control over the external service, then that external service might not be happy about getting hammered by your script like this. Have you approached them with your general plan?
Do you really need to do all users every time? Is there any sort of timestamp you can use to be more selective about which users need "updates"? Perhaps if you could describe the goal a little better we might be able to give more specific advice.
Given your clarification of wanting to run the processing of users simultaneously...
The simplest solution that jumps to mind is to have one thread per user. On Windows, threads are significantly cheaper than processes.
However, whether you use threads or processes, having thousands running at the same time is almost certainly unworkable.
Instead, have a pool of threads. The size of the pool is determined by how many threads your machine can comfortably handle at a time. I would expect numbers like 30-150 to be about as far as you might want to go, but it depends very much on the hardware's capacity, and I might be out by an order of magnitude.
Each thread would grab the next user due to be processed from a shared queue, process it, and put it back at the end of the queue, perhaps with a date before which it shouldn't be processed.
(Depending on the amount and type of processing, this might be done on a separate box to the database, to ensure the database isn't overloaded by non-database-related processing.)
This solution ensures that you are always processing as many users as you can, without overloading the machine. As the number of users increases, they are processed less frequently, but always as quickly as the hardware will allow.
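PHP has no cheap threads without extensions, so a rough equivalent of the pool idea is a fixed number of long-running worker processes (started by something like supervisord) that each claim the next due user from a shared table. This is only a sketch; the user_queue table and its columns are assumptions, and it ignores crash recovery of stale claims.
<?php
// One worker; run N copies of this script to form the pool.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$workerId = getmypid();

while (true) {
    // Atomically claim the next user who is due for processing.
    $claim = $pdo->prepare(
        "UPDATE user_queue SET claimed_by = ?
         WHERE claimed_by IS NULL AND next_run_at <= NOW()
         ORDER BY next_run_at LIMIT 1"
    );
    $claim->execute(array($workerId));

    if ($claim->rowCount() === 0) {
        sleep(1);            // nothing due yet
        continue;
    }

    $stmt = $pdo->prepare("SELECT user_id FROM user_queue WHERE claimed_by = ? LIMIT 1");
    $stmt->execute(array($workerId));
    $userId = $stmt->fetchColumn();

    // ... fetch from the external service for $userId and store the result ...

    // Release the user, due again one minute after this pass finished.
    $pdo->prepare(
        "UPDATE user_queue
         SET claimed_by = NULL, next_run_at = NOW() + INTERVAL 1 MINUTE
         WHERE user_id = ?"
    )->execute(array($userId));
}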