Log to filesystem or database? - php

I'm using LAMP and have full access to server configuration and setup.
I am very confused about which is the best method to simply log some data to.
I want to log things like this:
Analytics (server side with PHP) of every visitor.
When creating a new user, store id number so an email and SMS message can be sent later in a Cron Task (saves from sending the email/SMS during the users' request).
Number of page views of certain 'articles'. Increment once per visit to that page.
As you can see they are all simple insert/add actions that all can be processed later by a Cron Task.
The application needs to be scalable for the future.
These are my options (and what I have learned):
(1) Database (MySQL). People say don't use this for logging data like above.
(2) Use file_get_contents() WITHOUT a file lock. I'm told this can cause data corruption.
(3) Use file_get_contents() WITH a file lock but I believe this either results in missed data as file_get_contents returns false and doesn't add the data if the lock is in force -OR- it results in PHP having to wait for the lock to release. I don't think MySQL has to wait to do multiple inserts.
Which is the best option? Does it make a difference if I'm handling tens of requests per seconds compared to thousands of requests per second, or would I use the same option?

Related

Executing a long action

I'm creating a PHP script that will allow a user to log into a website and execute database queries and do other actions that could take some time to complete. If the PHP script runs these actions and they take too long, the browser page times out on the user end and the action never completes on the server end. If I redirect the user to another page and then attempt to run the action in the PHP script, will the server run it even though the user is not on the page? Could the action still time out?
In the event of long-running server-side actions in a web application like this, a good approach is to separate the queueing of the actions (which should be handled by the web application) from the running of the actions (which should be handled by a different server-side application).
In this case it could be as simple as the web application inserting a record into a database table which indicates that User X has requested Action Y to be processed at Time Z. A back-end process (always-running daemon, scheduled script, whatever you prefer) would be constantly polling that database table to look for new entries. ("New" might be denoted by something like an "IsComplete" column in that table.) It could poll every minute, every few minutes, every hour... whatever is a comfortable balance between server performance and the responsiveness of an action beginning when it's requested.
Once the action is complete, the server-side application that ran the action would mark it as complete in the database and would store the results wherever you need them to be stored. (Another database table or set of tables? A file? etc.) The web application can check for these results whenever you need it to (such as on each page load, maybe there could be some sort of "current status" of queued actions on each page so the user can see when it's ready).
The reason for all of this is simply to keep the user-facing web application responsive. Even if you do things like increase timeouts, users' browsers may still give up. Or the users themselves may give up after staring at a blank page and a spinning cursor for too long. The user interface should always respond back to the user quickly.
You could look at using something like ignore_user_abort but that is still not ideal in my opinion. I would look at deferring these actions and running them through a message queue. PHP comes with Gearman - that is one option. Using a message queue scales well and does a better job ensuring the request actions actually get completed.
Lots on SO on the subject... Asynchronous processing or message queues in PHP (CakePHP) ...but don't use Cake :)
set_time_limit() is your friend.
If it were me, I would put a loading icon animation in the user interface telling them to wait. Then I would execute the "long process" using an asynchronous AJAX call that would then return an answer, positive or negative, that you would pass to the user through JavaScript.
Just like when you upload pictures to Facebook, you can tell the user what is going on. Very clean!

PHP - enforce user wait before using server resources

I have a PHP function that I want to make available publically on the web - but it uses a lot of server resources each time it is called.
What I'd like to happen is that a user who calls this function is forced to wait for some time, before the function is called (or, at the least, before they can call it a second time).
I'd greatly prefer this 'wait' to be enforced on the server-side, so that it can't be overridden by dubious clients.
I plan to insist that users log into an online account.
Is there an efficient way I can make the user wait, without using server resources?
Would 'sleep()' be an appropriate way to do this?
Are there any suggested problems with using sleep()?
Is there a better solution to this?
Excuse my ignorance, and thanks!
sleep would be fine if you were using PHP as a command line tool for example. For a website though, your sleep will hold the connection open. Your webserver will only have a finite number of concurrent connections, so this could be used to DOS your site.
A better - but more involved - way would be to use a job queue. Add the task to a queue which is processed by a scheduled script and update the web page using AJAX or a meta-refresh.
sleep() is a bad idea in almost all possible situations. In your case, it's bad because it keeps the connection to the client open, and most webservers have a limit of open connections.
sleep() will not help you at all. The user could just load the page twice at the same time, and the command would be executed twice right after each other.
Instead, you could save a timestamp in your database for when your function was last invoked. Then, before invoking it, you should check the database to see if a suitable amount of time has passed. If it has, invoke the function and update the timestamp in the database.
If you're planning on enforcing a user login, than the problem just got a whole lot simpler.
Have a record inn the database listing users and the last time they used your resource consuming service, and measure the time difference between then and now. If the time difference is too low, deny access and display an error message.
This is best handled at the server level. No reason to even invoke PHP for repeat requests.
Like many sites, I use Nginx and you can use it's rate-limiting to block repeat requests over a certain number. So like, three requests per IP, per hour.

Background PHP worker

I have a script that takes a while to process, it has to take stuff from the DB and transfers data to other servers.
At the moment i have it do it immediately after the form is submitted and it takes the time it takes to transfer that data to say its been sent.
I was wondering is the anyway to make it so it does not do the process in front of the client?
I dont want a cron as it needs to be sent at the same time but just not loading with the client.
A couple of options:
Exec the PHP script that does the DB work from your webpage but do not wait for the output of the exec. Be VERY careful with this, don't blindly accept any input parameters from the user without sanitising them. I only mention this as an option, I would never do it myself.
Have your DB updating script running all the time in the backgroun, polling for something to happen that triggers its update. Say, for example, it could be checking to see if /tmp/run.txt exists and will start DB update if it does. You can then create run.txt from your webpage and return without waiting for a response.
Create your DB update script as a daemon.
Here are some things you can take a look at:
How much data are you transferring, and by transfer is it more like a copy-and-paste the data only, or are you inserting the data from your db into the destination server and then deleting the data from your source?
You can try analyzing your SQL to see if there's any room for optimization.
Then you can check your php code as well to see if there's anything, even the slightest, that might aid in performing the necessary tasks faster.
Where are the source and destination database servers located (in terms of network and geographically, if you happen to know) and how fast the source and destination servers are able to communicate through the net/network?

The best way to access data from a database every X seconds (asynchronously)

Ok, I didn't really now how to formulate this question, and especially not the title. But i'll give it a try and hope i'm being specific enough while trying to keep it relevant to others.
I you want to run a php script in the background (via ajax) every X seconds that returns data from a database, how do you do this the best way without using to much of the server resources?
My solution looks like this:
A user visits a webpage, ever x seconds that page runs a javascript. The javascript calls a PHP script/file that calls the database, retrieves the data and returns the data to the javascript. The javascript then prints the data to the page. My fear is that this way of solving it will put a lot of pressure on the server if there is a lot (10 000) simultaneous visitors on the page. Is there another way to do this?
That sounds like the best way, given the spec/requirement you set out.
Another way is to have an intermediary step. If you are going to have a huge amount of traffic (otherwise this does not introduce any benefit, but to the contrary may overcomplicat/slow the process), add another table that records the last time a dataset was pulled, and a hard file (say, XML) which if the 'last time' was deemed too long ago, is created from a new query, this XML then feeds the result returned to the user.
So:
1.Javascript calls PHP script (AJAX)
2.PHP pings DB table which contains last time data was fully output
3.If time is too great, 'main' query is rerun and XML file is regenerated from output
ELSE skip to 4
4.Fetch the XML file and output as appropriate for returned AJAX
You can do it the other way, contacting the client just when you need it and wasting less resources.
Comet it's the way to go for this option:
Comet is a programming technique that
enables web servers to send data to
the client without having any need for
the client to request it. This
technique will produce more responsive
applications than classic AJAX. In
classic AJAX applications, web browser
(client) cannot be notified in real
time that the server data model has
changed. The user must create a
request (for example by clicking on a
link) or a periodic AJAX request must
happen in order to get new data fro
the server.

Dealing with long server-side operations using ajax?

I've a particularly long operation that is going to get run when a
user presses a button on an interface and I'm wondering what would be the best
way to indicate this back to the client.
The operation is populating a fact table for a number of years worth of data
which will take roughly 20 minutes so I'm not intending the interface to be
synchronous. Even though it is generating large quantities of data server side,
I'd still like everything to remain responsive since the data for the month the
user is currently viewing will be updated fairly quickly which isn't a problem.
I thought about setting a session variable after the operation has completed
and polling for that session variable. Is this a feasible way to do such a
thing? However, I'm particularly concerned
about the user navigating away/closing their browser and then all status
about the long running job is lost.
Would it be better to perhaps insert a record somewhere lodging the processing record when it has started and finished. Then create some other sort of interface so the user (or users) can monitor the jobs that are currently executing/finished/failed?
Has anyone any resources I could look at?
How'd you do it?
The server side portion of code should spawn or communicate with a process that lives outside the web server. Using web page code to run tasks that should be handled by a daemon is just sloppy work.
You can't expect them to hang around for 20 minutes. Even the most cooperative users in the world are bound to go off and do something else, forget, and close the window. Allowing such long connection times screws up any chance of a sensible HTTP timeout and leaves you open to trivial DOS too.
As Spencer suggests, use the first request to start a process which is independent of the http request, pass an id back in the AJAX response, store the id in the session or in a DB against that user, or whatever you want. The user can then do whatever they want and it won't interrupt the task. The id can be used to poll for status. If you save it to a DB, the user can log off, clear their cookies, and when they log back in you will still be able to retrieve the status of the task.
Session are not that realible, I would probably design some sort of tasks list. So I can keep records of tasks per user. With this design I will be able to show "done" tasks, to keep user aware.
Also I will move long operation out of the worker process. This is required because web-servers could be restrated.
And, yes, I will request status every dozens of seconds from server with ajax calls.
You can have JS timer that periodically pings your server to see if any jobs are done. If user goes away and comes back you restart the timer. When job is done you indicate that to the user so they can click on the link and open the report (I would not recommend forcefully load something though it can be done)
From my experience the best way to do this is saving on the server side which reports are running for each users, and their statuses. The client would then poll this status periodically.
Basically, instead of checkStatusOf(int session), have the client ask the server of getRunningJobsFor(int userId) returning all running jobs and statuses.

Categories