Is it possible to have one PHP file send a "message" to specific users on another PHP file?
Note that the "message" needs to be able to be received by a PHP file that is already running, so simply calling the other file won't work. The image below demonstrates what I want to do:
In this example, user 1 calls "send.php", which subsequently sends a message to the "receive.php" instances of users 1, 2, and 4. Is this possible to accomplish?
Additional Information
I cannot log the messages in a central location like a file or database because I would end up querying for messages about once every 100 ms, which would probably overload the database/filesystem. I need it to be instantaneous.
Additionally, I cannot use sessions or cookies because, as mentioned, the message needs to be sent to several users. Finally, the receiving PHP file doesn't terminate until the user leaves the page (it's really an HTML5 EventSource file).
You can persist data across pages by using sessions or a database.
Additional Information: I cannot log the messages in a central location like a file or database because I would end up querying for messages about once every 100 ms, which would probably overload the database/filesystem. I need it to be instantaneous.
This can be done, but it's not best practice. You could go about creating a separate file for each of the messages, but that is a very bad idea.
As for your concern about a centralized database: you never know, it may well cope with your requests. Just try it and find out.
No programmer gets everything right on the first attempt; everyone tries something new and makes mistakes while finding the best approach to an application.
Okay, there may be ways that are a lot more lightweight, but try using a message broker with pub-sub capabilities (called topics in the JMS world). They're made especially to transfer high volumes of messages from servers to clients quickly and reliably.
An easy way to create a test system for such a setup is to use a STOMP-capable message broker (see here for a list of available solutions) and combine it with a PHP STOMP client (a list is available on the same page).
Be sure to read the remark concerning durable subscriptions: as your PHP scripts will most likely not be running at the exact moment a message is sent, you'll need the feature that emulates a durable subscription. I don't know all the brokers described on the page mentioned above, but ActiveMQ supports this, has been around for quite some time, and has a very good reputation.
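For illustration, here is a minimal sketch of that approach using the PECL stomp extension, assuming a STOMP broker such as ActiveMQ listening on localhost:61613; the topic name, the message format and $currentUserId are placeholders, not anything prescribed by the broker:

<?php
// send.php -- publishes one message to a pub-sub topic on the broker.
// Assumes the PECL stomp extension and a broker (e.g. ActiveMQ) on localhost:61613.
$stomp = new Stomp('tcp://localhost:61613');
$stomp->send('/topic/user-messages', json_encode(array(
    'recipients' => array(1, 2, 4),
    'body'       => 'Hello from user 1',
)));

// receive.php -- long-running subscriber behind the EventSource page.
// $currentUserId stands in for however you identify the connected user.
header('Content-Type: text/event-stream');
$stomp = new Stomp('tcp://localhost:61613');
$stomp->subscribe('/topic/user-messages'); // broker-specific headers go here for durable subscriptions
while (true) {
    if ($stomp->hasFrame()) {
        $frame = $stomp->readFrame();
        $msg   = json_decode($frame->body, true);
        if (in_array($currentUserId, $msg['recipients'])) {
            echo 'data: ' . $msg['body'] . "\n\n";  // forward only messages addressed to this user
            flush();
        }
        $stomp->ack($frame);
    }
}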
PHP doesn't work that way. It runs the script and terminates. You need to look up sessions, or use a database. If you want to get fancy, use jQuery and Ajax.
Related
I am a programmer at an internet marketing company that primarily makes tools. These tools have certain requirements:
They run in a browser and must work in all of them.
The user either uploads something (.csv) to process or they provide a URL and API calls are made to retrieve information about it.
They are moving around THOUSANDS of lines of data (think large databases). These tools literally run for hours, usually overnight.
The user must be able to watch live as their information is processed and is presented to them.
Currently we are writing in PHP, MySQL and Ajax.
My question is: how do I process LARGE quantities of data and provide a good user experience while the tool is running? Currently I use a custom queue system that sends Ajax calls and inserts rows into tables or data into divs.
This method is a huge pain in the ass and can't possibly be the correct approach. Should I be using a templating system, or is there a better way to refresh chunks of the page with A LOT of data? And I really mean a lot of data, because we come close to maxing out PHP memory, which is something we are always watching out for.
Also I would love to make it so these tools could run on the server by themselves. I mean upload a .csv and close the browser window and then have an email sent to the user when the tool is done.
Does anyone have any methods (programming standards) for me that are better than using .ajax calls? Thank you.
I wanted to update with some notes in case anyone has the same question. I am looking into the following to see which is the best solution:
SlickGrid / DataTables
GearMan
Web Socket
Ratchet
Node.js
These are in no particular order and the one I choose will be based on what works for my issue and what can be used by the rest of my department. I will update when I pick the golden framework.
First of all, you cannot handle big data via Ajax. To let users watch the processing live, you can use WebSockets. Since you are experienced with PHP, I can suggest Ratchet, which is quite new.
On the other hand, to run the calculations and store big data I would use NoSQL instead of MySQL.
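As a rough sketch of what the Ratchet side could look like (the class name, port and broadcast logic are purely illustrative, and it assumes cboden/ratchet installed via Composer):

<?php
// progress-server.php -- pushes processing progress to connected browsers.
require 'vendor/autoload.php';

use Ratchet\MessageComponentInterface;
use Ratchet\ConnectionInterface;
use Ratchet\Server\IoServer;
use Ratchet\Http\HttpServer;
use Ratchet\WebSocket\WsServer;

class ProgressPusher implements MessageComponentInterface {
    protected $clients;

    public function __construct() {
        $this->clients = new \SplObjectStorage;
    }
    public function onOpen(ConnectionInterface $conn) {
        $this->clients->attach($conn);
    }
    public function onMessage(ConnectionInterface $from, $msg) {
        foreach ($this->clients as $client) {  // relay each progress message to every browser
            $client->send($msg);
        }
    }
    public function onClose(ConnectionInterface $conn) {
        $this->clients->detach($conn);
    }
    public function onError(ConnectionInterface $conn, \Exception $e) {
        $conn->close();
    }
}

IoServer::factory(new HttpServer(new WsServer(new ProgressPusher())), 8080)->run();

The long-running import job can then connect as a plain WebSocket client and send progress messages, which get relayed to every open browser instead of being polled for via Ajax.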
Since you're already kind of pinched for time, migrating to Node.js may not be feasible right now. It would also help with notifying users when the results are ready, as it can push notifications to the browser without polling. And since it uses JavaScript, you might find some of your client-side code is reusable.
I think you can run what you need in the background with some kind of Queue manager. I use something similar with CakePHP and it lets me run time intensive processes in the background asynchronously, so the browser does not need to be open.
Another plus side for this is that it's scalable, as it's easy to increase the number of queue workers running.
Basically, with PHP you just need a cron job that runs every once in a while and starts a worker that checks a queue database for pending tasks. If none are found, it keeps polling in a loop until one shows up.
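A stripped-down sketch of such a worker (the jobs table, its columns and process_job() are made-up names; this assumes PDO with MySQL):

<?php
// worker.php -- started by cron; keeps polling a queue table for pending jobs.
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

while (true) {
    $job = $db->query("SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1")
              ->fetch(PDO::FETCH_ASSOC);

    if ($job) {
        $db->prepare("UPDATE jobs SET status = 'running' WHERE id = ?")->execute(array($job['id']));
        process_job($job['payload']);   // the long-running work happens here
        $db->prepare("UPDATE jobs SET status = 'done' WHERE id = ?")->execute(array($job['id']));
    } else {
        sleep(5);                       // nothing pending; wait and check again
    }
}

The cron entry can be as simple as * * * * * php /path/to/worker.php, ideally with a lock file or a similar check so you don't end up with two workers running at once.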
I have a WordPress plugin, which checks for an updated version of itself every hour with my website. On my website, I have a script running which listens for such update requests and responds with data.
What I want to implement is some basic analytics for this script, which can give me information like the number of requests per day, the number of unique requests per day/week/month, etc.
What is the best way to go about this?
Use some existing analytics script which can do the job for me
Log this information in a file on the server and process that file on my computer to get the information out
Log this information in a database on the server and use queries to fetch the information
Also there will be about 4000 to 5000 requests every hour, so whatever approach I take should not be too heavy on the server.
I know this is a very open ended question, but I couldn't find anything useful that can get me started in a particular direction.
Wow. I'm surprised this doesn't have any answers yet. Anyways, here goes:
1. Using an existing script / framework
Obviously, Google Analytics won't work for you since it is JavaScript based. I'm sure there exist PHP analytics frameworks out there. Whether you use them or not is really a matter of personal choice. Do these existing frameworks record everything you need? If not, do they lend themselves to being easily modified? You could use a good existing framework and choose not to reinvent the wheel. Personally, I would write my own, just for the learning experience.
I don't know any such frameworks off the top of my head because I've never needed one. I could do a Google search and paste the first few results here, but then so could you.
2. Log in a file or MySQL
There is absolutely NO GOOD REASON to log to a file. You'd first log it to a file, then write a script to parse this file. Tomorrow you decide you want to capture some additional information; now you need to modify your parsing script. This will get messy. What I'm getting at is: you do not need to use a file as an intermediate store before the database. 4-5k write requests an hour (I don't think there will be a lot of read requests apart from when you query the DB) is a breeze for MySQL. Furthermore, since this DB won't be used to serve up data to users, you don't care if it is slightly un-optimized. As I see it, you're the only one who'll be querying the database.
EDIT:
When you talked about using a file, I assumed you meant to use it as a temporary store only until you process the file and transfer the contents to a DB. If you did not mean that, and instead meant to store the information permanently in files, that would be a nightmare. Imagine trying to query for certain information that is scattered across files. Not only would you have to write a script that can parse the files, you'd have to write a non-trivial script that can query them without loading all the contents into memory. That would get nasty very, very fast and tremendously impair your ability to spot trends in the data, etc.
Once again, 4-5K might seem like a lot of requests, but a well optimized DB can handle it. Querying a reasonably optimized DB will be orders of magnitude faster than parsing and querying numerous files.
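To put that in perspective, the per-request write can be a single prepared INSERT, and the reporting queries stay trivial (the table and column names here are hypothetical):

<?php
// Called by the script that answers the plugin's update check.
$db = new PDO('mysql:host=localhost;dbname=analytics', 'user', 'pass');

$stmt = $db->prepare(
    "INSERT INTO update_requests (ip, plugin_version, requested_at)
     VALUES (?, ?, NOW())"
);
$stmt->execute(array(
    $_SERVER['REMOTE_ADDR'],
    isset($_GET['version']) ? $_GET['version'] : 'unknown',
));

// Example report -- unique requests per day:
// SELECT DATE(requested_at) AS day, COUNT(DISTINCT ip) AS unique_requests
// FROM update_requests GROUP BY day;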
I would recommend using an existing script or framework. It is always a good idea to use a specialized tool in which people have invested a lot of time and ideas. Since you are using PHP, Piwik seems to be one way to go. From the webpage:
Piwik is a downloadable, Free/Libre (GPLv3 licensed) real time web analytics software program. It provides you with detailed reports on your website visitors: the search engines and keywords they used, the language they speak, your popular pages…
Piwik provides a Tracking API and you can track custom variables. The DB schema seems highly optimized; have a look at their testimonials page.
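If you go with Piwik, the server-side Tracking API boils down to a few lines. This is only a sketch: it assumes Piwik's PiwikTracker.php client is available on the server, and the Piwik URL, site ID and custom-variable slot are placeholders:

<?php
// Track each plugin update request server-side via Piwik's Tracking API.
require_once 'PiwikTracker.php';

$tracker = new PiwikTracker($idSite = 1, 'http://example.com/piwik/');
$tracker->setCustomVariable(1, 'plugin_version',
    isset($_GET['version']) ? $_GET['version'] : 'unknown');
$tracker->doTrackPageView('Plugin update check');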
After searching the web for a good Comet solution and asking you guys what my best option is, I've chosen to go with Orbited. The problem is that if you need good documentation about Comet, you won't find much. I've installed Orbited and it seems to work just fine.
Basically, I want to constantly check a database and see if there is new data. If there is, I want to push it to my clients and update their home page, but I can't find any good, clear doc explaining how to constantly check the database and push the new info to Orbited and then to the clients. Have you guys implemented that?
Also, how many users can Orbited handle?
Any ideas?
You could add a database trigger that sends messages to your message queue when the database changes. This is also suggested here. Or, if it is only your app talking to the database, you could handle this from within the app via a Subject/Observer pattern, notifying the queue whenever an action changes something in the DB.
I don't know how good or bad Orbited scales.
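If you take the Subject/Observer route, a bare-bones sketch could look like this; the class names and publish_to_queue() are placeholders for whatever pushes into your Orbited/STOMP channel:

<?php
// The app notifies observers after every write; one observer pushes to the queue.
interface TableObserver {
    public function onChange($table, array $row);
}

class QueueNotifier implements TableObserver {
    public function onChange($table, array $row) {
        // e.g. send a message to the channel Orbited relays to clients
        publish_to_queue('/topic/' . $table . '-changes', json_encode($row));
    }
}

class ObservableWriter {
    private $observers = array();

    public function attach(TableObserver $o) { $this->observers[] = $o; }

    public function insert(PDO $db, $table, array $row) {
        $cols  = implode(', ', array_keys($row));
        $marks = implode(', ', array_fill(0, count($row), '?'));
        $db->prepare("INSERT INTO $table ($cols) VALUES ($marks)")
           ->execute(array_values($row));

        foreach ($this->observers as $o) {   // tell everyone who is listening
            $o->onChange($table, $row);
        }
    }
}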
Have a reference table that keeps track of the last updated time of the source table. Create an update/delete/insert trigger for the source table that updates the time in the reference table.
Your comet script should keep checking the reference table for any change in the time. If a change is noticed, you can read the updated source table and push the data to your clients' home pages. Checking the reference table in a loop is fast because MySQL will serve the results from its cache if nothing has changed.
And sorry, I don't know much about Orbited.
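A minimal sketch of the reference-table idea described above; the table, column and trigger names are made up, the trigger is created once, and similar triggers would be added for INSERT and DELETE:

<?php
// One-time setup: a trigger keeps the tiny reference table up to date.
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$db->exec("
    CREATE TRIGGER source_after_update AFTER UPDATE ON source_table
    FOR EACH ROW
        UPDATE change_log SET last_changed = NOW() WHERE tbl = 'source_table'
");

// Comet side: poll the reference table cheaply, only touch the big table on change.
$lastSeen = null;
while (true) {
    $row = $db->query("SELECT last_changed FROM change_log WHERE tbl = 'source_table'")
              ->fetch(PDO::FETCH_ASSOC);
    if ($row && $row['last_changed'] !== $lastSeen) {
        $lastSeen = $row['last_changed'];
        // read source_table here and push the fresh data out to the clients
    }
    usleep(250000); // check every 250 ms
}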
I would use the STOMP protocol with Orbited to communicate and push data to clients. Just find a good STOMP client for PHP and get started.
Here is an example of a use case with STOMP, although the server side is written in Ruby:
http://fuglyatblogging.wordpress.com/2008/10/
I don't know if PHP with Apache (if that's what you are using) is best suited for monitoring database changes. Read this article, under the section title "Orbited Server", for an explanation: http://thingsilearned.com/2009/06/09/starting-out-with-comet-orbited-part-1/
EDIT: If you want to go the route with PHP through a web server, you need to make one, and only one, request to a script that starts the monitoring and pushes out changes. And if that script times out or fails, you need to start a new one. A bit fugly :) A nicer, cleaner way would be, for example, to use Twisted with Python to start a monitoring process completely separated from the web server.
First, the setup:
I have a script that executes several tasks after a user hits the "upload" button that sends the script the data it needs. Now, this part is currently mandatory; we don't have the option at this point to cut out the upload and draw from a live source.
This section is intentionally long-winded to make a point. Skip ahead if you hate that.
Right now the data is parsed from a really funky source using regex, then broken down into an array. It then checks the DB for any data already in the uploaded data's date range. If the date ranges don't already exist in the DB, it inserts the data and outputs success to the user (there are also some security checks, data source validation, and basic upload validation)... If the data does exist, the script then gets the data already in the DB, finds the differences between the two sets, deletes the old data that doesn't match, adds the new data, and then sends an email to each person affected by these changes (one email per person with all relevant changes in said email, which is a whole other step). The email addresses are pulled by means of an LDAP search, as our DB has their work email but the LDAP has their personal email, which ensures they get the email before they come in the next day and get caught unaware. Finally, the data-uploader is told "Changes have been made, emails have been sent," which is really all they care about.
Now I may be adding a Google Calendar API that posts the data (when it's scheduling data) to the user's Google Calendar. I would do it via their work calendar, but I thought I'd get my toes wet with Google's API before dealing with setting up a WebDav system for Exchange.
</backstory>
Now!
The practical question
At this point, pre-Google integration, the script takes at most a second and a half to run. It's pretty impressive, at least I think so (the server, not my coding). But the Google bit, in tests, is SLOOOOW. We can probably fix that, but it raises the bigger question...
What is the best way to off-load some of the work after the user has gotten confirmation that the DB has been updated? This is the part he's most concerned with and the part most critical. Email notifications and Google Calendar updates are only there for the benefit of those affected by the upload, and if there is a problem with these notifications, he'll hear about it (and then I'll hear about it) regardless of the script telling him first.
So is there a way, for example, to run a cronjob that's triggered by a script's last execution? Can PHP create cronjobs with exec() ability? Is there some normalized way of handling post-execution work that needs getting done?
Any advice on this is really appreciated. I feel like the script's bloatedness reflects my stage of development and the need for me to finally learn how to do division of labor in web apps.
But I also get worried that this isn't how it's done, as users need to know when all tasks are completed, etc. So this brings up:
The best-practices/more-subjective question
Basically, is the idea that progress bars, real-time offloading, and other ways of keeping the user tethered to the script are (when combined with optimization of the code, of course) the better, more preferred method than simply saying "We're done with your part; if you need us, we'll be notifying users," etc.?
Are there any BIG things to avoid (other than obviously not giving the user any feedback at all)?
Thanks for reading. The coding part is crucial, so don't feel obliged to cover the second part, but please don't forget to cover the coding part!
A cron job is good for this. If all you want to do when a user uploads data is say "Hey user, thanks for the data!" then this will be fine.
If you prefer a more immediate approach, then you can use exec() to start a background process. In a Linux environment it would look something like this:
exec("php /path/to/your/worker/script.php >/dev/null &");
The & part says "run me in the background." The >/dev/null part redirects output to a black hole. As far as handling all errors and notifying appropriate parties goes, this is all down to the design of your worker script.
For a more flexible cross-platform approach, check out this PHP Manual post
There are a number of ways to go about this. You could exec(), like the above says, but you could potentially run into a DoS situation if there are too many submit clicks. The pcntl extension is arguably better at managing processes like this. Check out this post to see a discussion (there are 3 parts).
You could use JavaScript to send a second Ajax POST that runs the appropriate worker script afterwards. By using ignore_user_abort() and sending a Content-Length header, the browser can disconnect early, but your Apache process will continue to run and process your data. The upside is no forkbomb potential; the downside is that it will keep more Apache processes busy.
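A sketch of that early-disconnect trick; do_the_slow_work() is a placeholder for the real processing:

<?php
// Reply to the Ajax call at once, then keep working inside the same request.
ignore_user_abort(true);   // keep running even after the client disconnects
set_time_limit(0);

ob_start();
echo 'OK';                                    // whatever the Ajax caller expects
header('Connection: close');
header('Content-Length: ' . ob_get_length());
ob_end_flush();
flush();                                      // the browser can hang up now

do_the_slow_work();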
Yet another option is to use a cron in the background that looks at a process-queue table for things to do 'later' - you stick items into this table on the front end, remove them on the backend while processing (see Zend_Queue).
Yet another is to use a more distributed job framework like gearmand - which can process items on other machines.
It all depends on your overall capabilities and requirements.
For my current web development project I'm implementing a back-end system that will flag errors and automatically send an email to the administrator with details about what occurred. Trapping the error and generating the email with appropriate error information is pretty straightforward, but a problem arises when one considers certain groups of error types, especially if the site is being visited frequently.
Consider a couple of examples:
An unplanned database outage that prevents all of the scripts on the web server from being able to connect. If it takes say 2 minutes (120 seconds) for the database server to come back online, and the web server is receiving unique requests at a rate of 10/second, in the time it takes the database server to come back online the admins email would be flooded with 1200 identical emails all screaming about a failure to connect to the database.
A bug in a script somewhere managed to sneak by testing and is of the variety that completely screws up content generation and occurs only in a specific set of circumstances (say once every 100 requests). Using the unique request rate of 10/second again means the administrator is going to be getting the same email every 10 seconds about the same bug until it is fixed.
What are some approaches/strategies I can use to prevent this scenario from occurring? (I am only interested in monitoring of errors generated by the script, infrastructure issues are beyond the scope of this solution)
I'm going to assume that I can almost always uniquely identify errors using a digest of some of the values passed to the error handler callback set by set_error_handler.
The first and probably most obvious solution is to record errors in a database and only send the email if a reasonable minimum period of time has passed since the error last occurred. This isn't ideal, especially if the database itself is causing the problem. Another solution would be to write files to disk when errors occur and check whether a reasonable minimum period of time has passed since the file was last modified. Is there any mechanism to solve this problem beyond the two methods I have described?
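For concreteness, the file-timestamp variant described above could be as small as this sketch; the temp-file location, the 10-minute window and the address are arbitrary:

<?php
// Only email an error if the same digest hasn't been seen in the last $minGap seconds.
function throttled_error_mailer($errno, $errstr, $errfile, $errline)
{
    $digest   = md5($errno . '|' . $errfile . '|' . $errline . '|' . $errstr);
    $flagFile = sys_get_temp_dir() . '/err-' . $digest;
    $minGap   = 600; // seconds between identical alerts

    if (!file_exists($flagFile) || time() - filemtime($flagFile) > $minGap) {
        touch($flagFile);   // refresh the flag file's mtime
        mail('admin@example.com', 'Site error',
             "$errstr in $errfile:$errline (errno $errno)");
    }
    return false; // fall through to PHP's normal error handling
}
set_error_handler('throttled_error_mailer');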
Why not simply allow them all to be sent out, and then collect and store them in a database on the recipient end? That way you bypass the possibility of the database on the server being the problem.
Also, a greater advantage in my opinion, is that you don't arbitrarily throw out valuable forensic data. Post hoc analysis is very important and any kind of filtering could make it incredibly difficult, or impossible.
Have you tried looking into monitoring software like SiteScope?
What I did was monitor the error log and send a digest every 5 minutes. I'd like to think it's because of my high quality code (versus an unpopular app!), but I don't get hassled too much :P I basically read the log file from end to start, parse error messages, and stop when the timestamp < the last time I ran the job, then send a simple email.
This works well enough. However, if you use POST a lot, there is only a limited amount of information you can get from correlating the Apache access log with your PHP error log. I remember reading about a module to log POSTs to a file from within Apache, but I don't remember the specifics.
However, if you're willing to use the error handler to write somewhere, that might be best, as you've got access to much more information: IP, session ID (and any user information, which might impact settings like pagination or whatever), function arguments (debug_backtrace, or whatever it is)... Write every error, but only send messages when new errors occur, or after an error has been acknowledged (if you care to write such a system).
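A rough sketch of the 5-minute log-digest job described at the start of this answer; the log path, state file and timestamp regex are assumptions about your particular setup:

<?php
// digest.php -- run from cron every 5 minutes; mails any error-log lines
// newer than the previous run.
$logFile   = '/var/log/php_errors.log';
$stateFile = '/tmp/error-digest.last';

$lastRun = file_exists($stateFile) ? (int) file_get_contents($stateFile) : 0;
$fresh   = array();

foreach (array_reverse(file($logFile)) as $line) {
    // typical PHP log lines start with "[01-Jan-2013 12:00:00 UTC] ..."
    if (preg_match('/^\[([^\]]+)\]/', $line, $m) && strtotime($m[1]) <= $lastRun) {
        break;                // everything older was already mailed last time
    }
    $fresh[] = $line;         // untimestamped lines (stack traces) ride along
}

if ($fresh) {
    mail('admin@example.com', 'Error digest', implode('', array_reverse($fresh)));
}
file_put_contents($stateFile, time());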
You should go ahead and generate whatever log files you want. But instead of sending the emails yourself, hook the logs up to a monitoring system like Nagios. Let the monitoring solution decide when to alert the admins, and how often.