I am having trouble understanding how the Apache/PHP/MySQL stack works on a low level (including its interaction with the browser). Is there a good description somewhere (a book, a website, etc.) that will walk me through the whole path and explain how, starting with a browser requesting a URL, the HTTP request is sent, how Apache talks to PHP, how PHP talks to MySQL (persistent and non-persistent connections), etc.? I want to understand what waits for what in this chain, where timeouts are handled, and how long sockets stay open before being closed. A book, an article maybe? There is a lot of documentation on each individual component, but I can't find a "walkthrough".
The explanations I see so far are very high-level: look, here's a happy cow, it goes to Bovine University, look - it's all shrink-wrapped on the supermarket shelf. What I need is the sausage farm/slaughterhouse/truck/factory tour, starting with cow insemination :)
[update] To this day I have not found a better way to learn about these things other than reading the source.
PHP and MySQL by Example has a pretty basic picture of the process, which I think you probably already understand.
Getting more in-depth than that picture, though, is a pretty long discussion. Ironically, you can read the book I just linked for a pretty good description. If you have more specific questions, I recommend opening new questions for them. Enjoy!
I have found a site that contains, at least in part, the contents of the book Advanced PHP Programming by George Schlossnagle.
The site is located at http://php.find-info.ru/php/016/toc.html. Specifically, the section on The PHP Request Life Cycle contains a lot of the nitty-gritty details, including some source code and diagrams.
DISCLAIMER: IANAL, but considering that the book is still listed on Amazon, it's possible the content linked above breaks all sorts of codes, rules and/or laws. It's not my intention to proliferate or condone illegal or pirated material, so if that is the case, please remove said links.
You are correct that there are entire books written on how this all fits together. Here is a link to a "walkthrough" that touches on the main parts:
http://computer.howstuffworks.com/web-server.htm
Hope it helps.
The best course of action would be to get a good book about the LAMP stack.
A quick response (ask for more if you feel you need it):
The browser contacts the web server through the HTTP protocol.
The server generates (let's leave aside how, for the moment) an HTML result and sends it back.
Each browser understands only the HTTP protocol (for the sake of this analysis).
Now, items such as icons, images, JavaScript etc. are just read from the Apache server and "copied" to the browser. The same goes for plain HTML files.
The difference is in PHP files (I am oversimplifying here). These are passed to the PHP module, and the response (of the module) is sent back to the browser.
The PHP module is what understands PHP.
Are we together here? If yes, then:
The PHP script may (or may not) need data from a MySQL server; in that case it has to connect, fetch the data or manipulate it, and so on. A minimal sketch of that step follows.
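To make the PHP-to-MySQL step concrete, here is a minimal sketch using mysqli; the hostname, credentials and the users table are placeholders. From the script's point of view, the only difference between the two connection types is the p: host prefix: a persistent connection is kept open by the PHP process and reused by later requests instead of being torn down when the script ends.

```php
<?php
// Sketch only: credentials and table are made up.

// Non-persistent: the socket is closed when the script ends
// (or when close() is called).
$db = new mysqli('localhost', 'user', 'secret', 'mydb');

// Persistent: the "p:" prefix asks PHP to keep the socket open
// and hand it to a later request that wants the same connection.
$pdb = new mysqli('p:localhost', 'user', 'secret', 'mydb');

$result = $db->query('SELECT id, name FROM users LIMIT 5');
while ($row = $result->fetch_assoc()) {
    echo $row['id'], ' ', $row['name'], "\n";
}
$db->close(); // the persistent connection outlives the script
```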
Summarising: each of these operations happens separately, at a different level of the process chain. That's what makes it "simple".
Ask for more information if you want something more specific.
As far as I understand it, Apache receives the request and works out what to do with it based on your .htaccess or config options. It then passes the request to PHP for processing, if needed. PHP makes two passes over the code: the first is a pre-parse, which picks up obvious flaws and registers top-level functions (ignoring any inside if statements, loops, includes, evals or lambda-based functions), before running the page for real. Anything done with echo is, I believe, written to the standard output stream and returned to Apache. If Apache times the page out, it sends a kill signal to PHP, which closes objects and prints error messages if needed before exiting. Once the page exits, Apache tends to the headers and returns the page.
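For what it's worth, one corner of the timeout story can be poked at from PHP itself. This is a hedged sketch of PHP's own max_execution_time limit (which is separate from Apache's Timeout directive); it shows that shutdown functions still run after the limit is hit, which is where cleanup and error output can happen:

```php
<?php
// Sketch: PHP's own execution-time limit, independent of Apache's Timeout.
set_time_limit(2); // fatal "Maximum execution time exceeded" after ~2s

register_shutdown_function(function () {
    // Still runs after the fatal timeout error: a place for cleanup/logging.
    $err = error_get_last();
    if ($err !== null) {
        error_log('Request ended with: ' . $err['message']);
    }
});

// Busy-wait to trip the limit. On Linux only CPU time counts toward it,
// so sleep() or a slow MySQL query would not trigger this particular timeout.
while (true) {
}
```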
I would love to know more about this though, so if anyone can explain it better or has a correction/expansion on my answer, I'd love to hear it.
Sorry for that, but I really came to the conclusion that it's better to ask directly than to browse tons of pages in vain.
I've already looked through plenty of resources, but haven't found a decent explanation that could satisfy my curiosity about the simplest of questions.
Assume there's a URI located at http://example.com/example (involving a PHP script and queries to the database).
Let's imagine I've loaded it in my browser, clicked some link and hit "back" to return to http://example.com/example.
As far as I can tell, what happens behind the scenes looks something like this:
After "back" is clicked, the browser checks its cache for http://example.com/example, which matches exactly the requested file, finds that it hasn't changed in the short time since it was first loaded, and returns it from its cache.
Wait!!!!
The file contains server-side scripts, database queries and so forth.
So the request should reach the web server again, query the same data from MySQL and produce the output anew.
So what's the best strategy for caching dynamic content, preferably client-side vs server-side?
In which cases is it useful to cache content server-side, and what practice is best?
Please, can someone provide some resources covering this subject that can be digested by dummies like me, and refute or adjust the scheme above about what actually happens?
While researching the issue I ran into one service, http://gtmetrix.com/, which I liked very much.
There was something mentioned there about making AJAX requests cacheable. I assume this could be used for client-side caching of dynamic content retrieved from the database. Can someone please confirm or refute that?
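For reference, here's the kind of thing I mean by cacheable AJAX; a minimal sketch of what I understand the server would have to send, with the max-age value and the ETag scheme picked arbitrarily:

```php
<?php
// Sketch: making a dynamic response cacheable by the browser.
$body = json_encode(array('score' => 42)); // imagine this came from MySQL
$etag = '"' . md5($body) . '"';

header('Cache-Control: public, max-age=120'); // browser may reuse it for 2 min
header('ETag: ' . $etag);

// Conditional request: the browser sends the ETag back; answer 304 if unchanged.
if (isset($_SERVER['HTTP_IF_NONE_MATCH']) &&
    trim($_SERVER['HTTP_IF_NONE_MATCH']) === $etag) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}

header('Content-Type: application/json');
echo $body;
```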
I'm running an enterprise-level PHP application. It's a browser game with thousands of users online, on an infrastructure that my boss refuses to upgrade, and the machines run at a system load of 2-3 (yep, Linux) at all times. Anyhow, that's not the real issue. The real issue is that some users wait until the server gets loaded (prime time), then bring out their mouse clickers and click the same submit button 10-20 times, sending 10-20 requests at the same time while the server is still processing the initial request, so the cache and the database have not been updated yet.
Currently I have an output variable on each request, which is valid for 2 minutes, and I have a "mutex" lock, which is basically a flag inside memcache that, when found, blocks further execution of the script. But the mouse clicker fires so many requests at once that they run almost simultaneously, which is a big issue for me.
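For illustration, one way to close that race, assuming the Memcached extension is available (the key name and TTLs are made up): instead of reading the flag and then setting it, two steps a second request can slip between, use add(), which is atomic on the memcache server, so exactly one of the simultaneous requests wins the lock.

```php
<?php
// Sketch: an atomic lock in memcache. add() succeeds only if the key
// does not exist yet, so concurrent requests cannot all "acquire" it.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$lockKey = 'lock:action:' . $userId; // hypothetical per-user, per-action key

if (!$mc->add($lockKey, 1, 10)) {    // 10s TTL doubles as a crash safety net
    // Someone (probably the same clicker) is already in here: bail out.
    exit('Your previous request is still being processed.');
}

// ... do the expensive work, update cache and database ...

$mc->delete($lockKey); // release the lock for the next legitimate request
```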
How do you, the majority of Stack Overflow folks, deal with this issue? I was thinking of flagging the cookie/session, but I think I'd run into the same problem if the server gets overloaded. Optimization is impossible: the source is 7 years old and is already quite optimized, with no queries on most pages (running off the cache) and the database only being queried on certain user input, like the one I'm trying to prevent.
Yep, it's procedural code with no real objects. The machines run PHP 5, but the code itself is more PHP 4. I know, I know, it's old and stuff, but we can't spare the resources to rewrite this whole mess, since most of the original developers who knew how everything is intertwined have left, and yeah, I'm basically patching old holes. But as far as I know, this is a general issue on loaded PHP websites.
P.S.: Disabling the button with JavaScript on submit is not an option. The real cheaters are advanced users. One of them had written a bot clicker and packed it as a Google Chrome extension. Don't ask how I dealt with that.
I would look for a solution outside your code.
I don't know which server you use, but Apache has some modules that can help, like mod_evasive for example.
You can also limit connections per second from an IP in your firewall.
I get the feeling this touches more on how to update a legacy code base than anything else. While implementing some type of concurrency control would be nice, the old code base is your real problem.
I highly recommend this video, which discusses technical debt.
Watch it, and then, if you haven't already, explain to your boss in business terms what technical debt is. He will likely understand this. Explain that because the code hasn't been managed well (debt paid down), there is a very high level of technical debt. Suggest addressing it in small, incremental iterations.
Limiting the IP connections will only make your players angry.
I fixed and rewrote a lot of stuff in some famous open-source game clones with old-style code:
well, I must say that cheating can always be avoided by executing the right queries and the right logic.
For example, look here: http://www.xgproyect.net/2-9-x-fixes/9407-2-9-9-cheat-buildings-page.html
Anyway, about performance: keep in mind that code running with the session open will block all other requests for that session until the current one releases the session lock. So be careful about wrapping all of your code inside the session. Also, sessions should never contain heavy data.
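The blocking mentioned above comes from PHP's session locking: with the default file handler, session_start() takes an exclusive lock that is held until the script finishes, unless it is released early. A minimal sketch:

```php
<?php
// Sketch: release the session lock early so a user's other requests
// are not serialized behind this one.
session_start();
$userId = isset($_SESSION['user_id']) ? $_SESSION['user_id'] : 0;

session_write_close(); // releases the lock; $_SESSION stays readable

// The long-running work below no longer blocks this user's other requests,
// but writes to $_SESSION after this point will not be saved.
sleep(5); // stand-in for slow game logic
echo 'done for user ', $userId;
```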
About scripts: in my games I have a PHP module that automatically rewrites links, adding a random id saved in the database, as a sort of CSRF protection. Human users click the rewritten link, so they don't notice the change, but scripts keep requesting the old link, and after a few tries they are banned! (A sketch of the idea follows.)
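A rough sketch of that link-rewriting idea, simplified to keep the tokens in the session instead of the database; all names here are made up:

```php
<?php
// Sketch: one-time tokens on links. Humans follow the rewritten link;
// scripts replaying a stale URL get caught.
session_start();

function tokenized_link($url)
{
    $token = md5(uniqid(mt_rand(), true)); // use random_bytes() on PHP 7+
    $_SESSION['link_tokens'][$token] = true;
    return $url . '?t=' . $token;
}

if (isset($_GET['t'])) {
    if (empty($_SESSION['link_tokens'][$_GET['t']])) {
        // Unknown or reused token: likely a bot; count strikes, then ban.
        exit('Invalid link.');
    }
    unset($_SESSION['link_tokens'][$_GET['t']]); // one-time use
}

echo '<a href="', htmlspecialchars(tokenized_link('/build.php')), '">Build</a>';
```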
Other scripts use the DOM, so it's easy to trip them up by inserting some useless DIVs around the page.
Edit: you can boost your app with https://github.com/facebook/hiphop-php/wiki
I don't know if there's an implementation already out there, but I'm looking into writing a cache server that is responsible for populating itself on cache misses. That approach could work well in this scenario.
Basically you need a mechanism to mark a cache slot as "pending" on a miss; a read that sees a pending value should cause the client to sleep a small but random amount of time and retry; the pending data is then populated by the one client that encountered a plain miss rather than the pending marker.
In this context, the script is the client, not the browser.
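A rough sketch of that pending-slot protocol on top of memcache; the marker value, timings and key names are all made up:

```php
<?php
// Sketch: "pending" cache slots so only one client repopulates on a miss.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

function cache_fetch(Memcached $mc, $key, $populate)
{
    for ($try = 0; $try < 20; $try++) {
        $val = $mc->get($key);
        if ($val === 'PENDING') {
            usleep(mt_rand(10000, 50000)); // sleep 10-50 ms, then retry
            continue;
        }
        if ($val !== false) {
            return $val; // hit
        }
        // Miss: atomically claim the slot; losers loop and see PENDING.
        if ($mc->add($key, 'PENDING', 30)) {
            $val = call_user_func($populate); // expensive rebuild, done once
            $mc->set($key, $val, 300);
            return $val;
        }
    }
    return call_user_func($populate); // gave up waiting; compute locally
}

echo cache_fetch($mc, 'page:home', function () {
    return 'expensive page HTML'; // stand-in for the real work
});
```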
PHP - Apache with CodeIgniter
JS - typical, with jQuery and an in-house lib
The problem: determining (without forcing a download) a user's PC capability and/or virus issues.
The why: we put out software that is mostly used in clinics but can be used from home. However, we need to know, before users go to our main site, whether their PC can handle the demands of our web-based, browser-served software.
Progress: so far, we've come up with a decent way to test download speed, but that's about it.
What we've done: in PHP we create about a 2.5Gb array of data to send to the user in a view; the view then calculates the time it took to receive the data and subtracts the PHP benchmark from that time to get a point of reference for upload/download speed. This is not enough.
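For comparison, here is a stripped-down sketch of that kind of payload endpoint; the actual timing has to happen on the client, and the size and headers are just one plausible choice:

```php
<?php
// Sketch: serve an incompressible payload of known size so the client
// can time the download and estimate the transfer rate.
ini_set('zlib.output_compression', 'Off'); // compression would skew the numbers

$bytes = 10 * 1024 * 1024; // 10 MB; tune to taste

header('Content-Type: application/octet-stream');
header('Content-Length: ' . $bytes);
header('Cache-Control: no-store'); // a cached copy would fake the result

// Random-ish data so intermediate proxies cannot compress it away.
$chunk = '';
for ($i = 0; $i < 8192; $i++) {
    $chunk .= chr(mt_rand(0, 255));
}
for ($sent = 0; $sent < $bytes; $sent += strlen($chunk)) {
    echo $chunk;
}
```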
Some of our (local) users have been found to have "crappy" PCs or to be virus-infected, and this can lead to two problems: (1) they crash in the middle of performing a task in our program, or (2) their viruses could be trying to inject into our JS, creating a bad experience that may make us look bad to the average (uneducated about how this stuff works) user, thus hurting "our" integrity.
I've done some googling around, but most plug-ins or advice forums/blogs I've found simply give ways to benchmark the speed of your JS, and that is simply not enough. I need a simple bit of code (no visual interface included; one nice piece of JS lib I found did this, but it would have taken days to remove all of the author's personal visual code) that will allow me to test the following three things:
The user's data transfer rate (I think we have this covered, but if a better method is presented I won't rule it out)
The user's processing speed, i.e. how fast the computer is in general
A possible test for infection by malware, adware, or whatever else may harm the user's experience
What we are not looking to do: repair their PC! We don't care if they have problems; we just don't want to lead them into our site if they have too many. If they can't do it from home, they will be recommended to go to their nearest local office to use this software "in house", so to speak.
Further Explanation
We know you can't test the user-side stuff with PHP; we're not that stupid. PHP is mentioned because it can still be useful, either in determining connection speed or in delivering a script that may do what we want. Also, this is not software for just anyone on the net to sign up for and use; if you find it online, unless you are affiliated with a specific clinic and have a login name and whatnot, you're not meant to use the site, and if you get in otherwise, it's illegal. I can't really reveal a whole lot of information yet, as the site is not live yet.
What I can say is that it is mostly used by clinics/offices for customers to perform a certain task. If they don't have the time/transport/or otherwise and need to do it from home, then the option is available. However, if their home PC is not "up to snuff", it will be nothing but a problem for them and will turn the 2-hour task they are meant to perform into a 4-6 hour nightmare. That is the reason I'm at one of my favorite question sites, asking whether anyone has had experience with this before and may know a good way to test the user's PC, so they can have the best possible resolution: either do it from home (as their PC is suitable) or be told they need to go to their local office. Hopefully this clears things up enough that we can refrain from the "sillier" answers. I need a REAL, viable solution and/or suggestions, please.
PHP has (virtually) no access to information about the client's computer. Data transfer can just as easily be limited by network speed as by computer speed, though if you don't care which is the limiter, it might work.
JavaScript can reliably check how quickly a set of operations runs and send the results back to the server... but that's about it. It has no access to the file system, for security reasons.
EDIT: Okay, with that revision, I think I can offer a real suggestion; basically, a compromise. You are not going to be able to gather enough information to absolutely guarantee one way or another that the user's computer and connection are adequate, but you can get a general idea.
As someone suggested, use a 10MB-20MB file and several smaller ones to test the actual transfer rate; this will give you a reasonable estimate. Then use JavaScript to test the system speed. But don't just stick with one test, because results can depend heavily on the browser. Do the research on which tests best represent real capability across browsers: things like looping over arrays, manipulating (invisible) elements, and complex math. If there is a significant discrepancy between browsers, use different thresholds; PHP does know which browser they're using, so you can apply a different "good enough" rating per browser. Limiting by version (like completely rejecting IE6) may help with that.
Finally... inform the user. Gently. First let them know, "Hey, this is going to run a test to see if your network connection and computer are fast enough to use our system." And if it fails, tell them which part, and give them a warning. "Hey, this really isn't as fast as we recommend. You really ought to go down to the local clinic to perform this task; if you choose to proceed, it may take a lot longer than intended." Hopefully, at that point, the user will realize that any issues are on them, not on you.
What you've heard is correct: there's no way to effectively benchmark a machine from JavaScript, especially because the JavaScript engine depends mostly on the actual browser the user is running, among numerous other variables, and there are no file system permissions, etc. A computer is hardly going to let a browser sub-process stress it anyway; the browser would simply crash first. PHP is obviously out, as it's server-side.
Sites like System Requirements Lab have the user download a Java applet that runs in its own scope.
So, it's been about 3 years since I wrote my company's main internet-facing website and went live with it. Originally written in PHP, I've since just been making minor changes here and there to evolve the site as we've needed to.
I've wanted to rewrite it from the ground up for the last year or so, and now we want to add some major features, so this is a perfect time.
The website in question is as close to a banking website as you'd get (without being a bank; sorry for the obscurity, but the less info I give out, the better).
For the rewrite, I want to separate the presentation layer from the processing layer as much as I can. I want the end user to be stuck in a box and not be able to get out, so to speak
(this is all because of PCI compliance, being pen tested every 3 months, etc...)
So, being probed every 3 months has increasingly made me nervous. We haven't failed yet and there hasn't been a breach yet, but I want to make sure I continue to pass (as much as I can, anyway).
So, I'm considering rewriting the presentation layer in Adobe Flex and doing all the processing in PHP (effectively, IMO, separating presentation from processing). I would do all my normal form validation in Flex (as opposed to JavaScript or PHP) and do my reads and writes to the DB via PHP.
My questions are:
I know Flash has something like 99% market penetration. Do people find this to be true? Has anyone with a Flash site seen cases where someone couldn't access it?
Flash in general has come under a lot of attacks about security and the like; I know this. I would use a SWF encryptor, disable debugging (which I got snagged on once in a different application), continue to use HTTPS, and any other means I can think of.
At the end of the day, everyone knows that if someone wants the data badly enough, they're going to find a way in; I just want to make it as difficult for them as I can.
Any thoughts are appreciated.
-Mario
There are always people who, for one reason or another, don't install the Flash plugin. Bear in mind that they are distinctly in the minority. Realize also that some people still refuse to enable JavaScript. The question you have to ask yourself is whether this small group is enough to keep you off some newer technologies.
If the answer to that is yes, you will have to resort to vanilla HTML form processing, sending everything to the server for validation, etc.
If the answer is no, don't be afraid to use Flex. It works fine over HTTPS and is as secure as you want. That said, I wouldn't use it for username/password validation on the client; that information should always be encrypted and sent to a secure server. But validation of other types of fields (phone number, etc.) shouldn't be a problem.
There are definitely people who don't have Flash installed, and yes, there are people who have JavaScript disabled. But no matter whether you develop for the lowest common denominator, which is plain HTML forms, or go high-end, e.g. Flex or AJAX, never ever rely on the client to validate the inputs. It's a good first step, but everything that comes from the client, be it Flash or AJAX or Silverlight or whatever, could be forged.
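To make that concrete, a small server-side sketch in PHP; the field names and rules are hypothetical, and the point is that it runs regardless of what the Flex layer already checked:

```php
<?php
// Sketch: re-validate everything server-side, whatever the client claimed.
$phone = isset($_POST['phone']) ? trim($_POST['phone']) : '';
$email = isset($_POST['email']) ? trim($_POST['email']) : '';

$errors = array();
if (!preg_match('/^\+?[0-9 \-]{7,15}$/', $phone)) {
    $errors[] = 'Invalid phone number.';
}
if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
    $errors[] = 'Invalid email address.';
}

if ($errors) {
    header('HTTP/1.1 400 Bad Request');
    exit(implode("\n", $errors));
}
// ... only now touch the database ...
```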
This is a general programming question.
What is the best way to make a light blogging system that can handle images and BBCode-ish styling plus text, without a database back end? Light means no more than 50 to 100 posts in extreme cases.
What language(s) should be used? Is there any preferred data format for the information? How does security play out?
EDIT: The client has no database and is on a shared server. Can't change that. Therefore, no DB.
EDIT2:
Someone mentioned SQL Compact: does that require anything more than copying files to the server? The key here, again, is that nothing should require any more permissions than FTP access.
If you're looking to do it yourself: store each post as a file in a directory. Then, to sort and limit the posts, you rely partially on the file names to order and limit them, and potentially (in the case of a search) on reading every last file. Don't go letting users make 10,000 posts, though. But yeah, the above is considered a flat-file data format. You can get fancy by using a standard format like JSON, YAML, or XML within each post file, and even fancier by requesting these with AJAX calls in mostly client-side code. A sketch of the basic idea follows.
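A minimal sketch of that layout; the file-naming scheme and post fields are just one choice:

```php
<?php
// Sketch: a flat-file post store, one JSON file per post, named so that
// an alphabetical sort is also a chronological sort (date prefix).
$dir = __DIR__ . '/posts'; // e.g. posts/2009-04-01-hello-world.json

$files = glob($dir . '/*.json');
rsort($files);                       // newest first, thanks to the date prefix
$files = array_slice($files, 0, 10); // a poor man's "LIMIT 10"

foreach ($files as $file) {
    $post = json_decode(file_get_contents($file), true);
    echo '<h2>', htmlspecialchars($post['title']), '</h2>';
    echo '<div>', htmlspecialchars($post['body']), '</div>';
}
```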
Now, if the reason you want to work with flat files is that you just don't want to install a database server, there's nothing stopping you from reading a local (to the server) file as a Berkeley DB, a Lucene index, or an SQLite DB from within your webapp using the appropriate client library. You'll find any of these approaches a little more sane (a bit faster, a bit more readable in code) than the aforementioned, with all the same requirements for installing on the server (read-write file permissions). Many web frameworks and languages (like PHP) ship with an API to these client libraries; SQLite and Lucy (C Lucene) in particular.
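For the SQLite route, the whole "database" is a single file next to your scripts, so plain FTP-level permissions are enough. A minimal sketch using PDO (file name and schema made up):

```php
<?php
// Sketch: no database server, just one local file read through PHP's
// bundled PDO SQLite driver.
$db = new PDO('sqlite:' . __DIR__ . '/blog.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE IF NOT EXISTS posts (
    id INTEGER PRIMARY KEY, created TEXT, title TEXT, body TEXT
)');

$stmt = $db->prepare('INSERT INTO posts (created, title, body) VALUES (?, ?, ?)');
$stmt->execute(array(date('c'), 'Hello', 'First post'));

$q = $db->query('SELECT title FROM posts ORDER BY created DESC LIMIT 10');
foreach ($q as $row) {
    echo $row['title'], "\n";
}
```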
If you're just looking for examples of it being done: I first came across Blosxom (I think in 1999 or 2000), a Perl script that runs either as a CGI script per request or as a cron job. It builds a dated index of "posts" based on whatever you throw into the directory it's meant to scan. It also builds an RSS feed.
Jekyll or Blogofile are my favorite kind of solution for that: "compiling pages before upload".
I'm going to go out on a limb here and say that it's not always about the destination, but the journey.
If you're going to set out to do this, I recommend using a language you are comfortable with. Personally, that would be C#/.NET for me, but from your tagging, I'll assume PHP would be the server-side scripting language you would choose.
I would lay out how I want the application to behave. If there is going to be a lot of data, you should consider (as dlamblin mentioned) a DB of some sort for lookup and retrieval. (A light blog, not so much data... 1000 users can edit? Maybe you should consider a DB.) Once you've decided how to store the data, decide how to present it.
Write some proof of concept code for each of the features you want to implement (blog templating, bbcode, user authentication, text searching...) and start to work them all together.
Search for flat-file CMSes on Google, for example:
http://www.flatcms.org/
This has already been done, so there is no need to create such a CMS again. There are plenty of them.
I concur with dusoft that this has already been done.
DotNetBlogEngine.net is an ASP.NET (C#) based blogging system that has a nice XML back-end as an option.
This doesn't answer your question directly, but check out Unify.
If you do not want to write a new one or want to get some inspiration:
Flatpress
Simple PHP Blog
Ninja Designs is working on a DB-free WordPress clone
You could either use XML, or use SQL Compact (which lets you handle things just like SQL Server, but instead of a database server you work with flat files).