Is there anything wrong with the practice of creating a physical directory for each post (as it is created) and each user (as they register) on a website?
For an architecture which resembles:
www.site.com/u/john
www.site.com/u/mike
www.site.com/post/za634df
www.site.com/post/df124zs
(PHP, Linux based FYI).
I always thought I needed a physical folder if I wanted a post URL to be shareable on Facebook, for example. Is this true?
Also, by doing so, would I run into problems? Slowdowns? Server directory limits?
I don't think it's a terribly good idea. For one, it's orders of magnitude slower to manipulate the filesystem than to make an entry in a database, so there's an obvious performance loss, and the number of subdirectories may be limited depending on the filesystem (which may cause portability issues in the long run). There may be other effects, but those would only come up with really big traffic.
But the design perspective is much more important here: the physical file structure is a wholly different layer from the URI layout, which is an abstraction over it. By tying your URI scheme to the physical layout you discard the advantages of this abstraction. As usual with design principles, this may not seem like a big deal if your project is small, short-lived, or you're in a big hurry and don't care; but in the long run, keeping the separation of concerns can be quite vital for everybody's sanity.
That said, your idea may have merits if properly implemented, though in my opinion it'd be better to use a database and a URL-rewriting engine (like Apache's mod_rewrite) to achieve the same effect. And even if you do end up creating folders for everything, make absolutely sure that the procedure is properly abstracted and no piece of code relies on stuff being in the same directory, and no piece of code actually manipulates the filesystem directly (relying instead on a single unified helper class to do it).
Oh and, no, Facebook can't see what's behind your URI scheme.
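As a rough illustration of the mod_rewrite approach mentioned above (a minimal sketch only; the script names u.php and post.php are made up for the example), the two URL shapes from the question could be mapped to plain scripts plus database lookups like this:

# .htaccess - assumes Apache with mod_rewrite enabled
RewriteEngine On
# /u/john -> u.php?name=john (u.php looks the user up in the database)
RewriteRule ^u/([A-Za-z0-9_-]+)$ u.php?name=$1 [L,QSA]
# /post/za634df -> post.php?id=za634df (post.php does the same for posts)
RewriteRule ^post/([A-Za-z0-9]+)$ post.php?id=$1 [L,QSA]

No physical folder ever exists for john or za634df; the URL is just a key into the database.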
Why would you want to create a separate directory for each user, duplicating the same code in each one, when you can handle them all from a single script using URL rewriting?
Duplicated scripts will also cost you much more storage and will make you cry when updating or adding a new feature to your site.
It has recently been highlighted (in my previous questions) that the way I have designed web applications is not ideal.
Consider the following. I am working on a multi-user website with lots of different sections including profiles and forums and support tickets. The structure is as follows:
A main page into which all the other pages are included or *require_once*'d; we'll call it home.php.
In home.php, one of the first things loaded is router.php, this handles every single $_GET and $_POST that the user could possibly produce, and every form and process is sorted via a main variable called $data_process. Router.php is essentially just one giant switch() statement for $data_process. This parses all the data and gives a result.
Next included is header.php, which will not only process the necessary variables for the page that will be loaded but also set up the header and decide exactly what is going to be shown there, e.g. the menu, user info, and information about the page currently being viewed (i.e. Home > Support > View Ticket).
Then the page is loaded according to the $page variable. A simple include.
Then footer.php, then close.
And so the dynamic website is created. I was told this is bad practice by a user named #HorusKol. I am very pleased with this website, as it is the most streamlined and easiest-to-write site I have ever worked on. Is this still bad code design? What is perfect code design?
PS - can anyone recommend any good easy to read books that explain PHP, MySQL and design structure for me?
It is bad design because you process a lot of data that is perhaps not needed in the rest of the process. The router should only process the URL; processing of POST data should be handled somewhere else. Only include what you need; including everything makes things slow.
A better way is to structure your app into distinct parts: a router that processes the URL, a controller that runs an action based on the routed request, a view that renders the HTML and pages, and a model for accessing data. MVC is what comes to mind.
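As a rough sketch of that separation (all file, function and data names here are hypothetical, and the "model" is just an array so the example stays self-contained):

<?php
// index.php - a minimal router/controller/view split

// model: data access only; in a real app this would wrap your database queries
function find_ticket($id) {
    $tickets = array(1 => array('subject' => 'Cannot log in'),
                     2 => array('subject' => 'Page is slow'));
    return isset($tickets[$id]) ? $tickets[$id] : null;
}

// view: presentation only
function render_ticket($ticket) {
    if ($ticket === null) {
        return '<p>Ticket not found.</p>';
    }
    return '<h1>' . htmlspecialchars($ticket['subject']) . '</h1>';
}

// router + controller: decide what to run from the URL, then glue model and view together
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
if ($path === '/ticket') {
    $id = isset($_GET['id']) ? (int) $_GET['id'] : 0;
    echo render_ticket(find_ticket($id));
} else {
    header('HTTP/1.0 404 Not Found');
    echo '<p>Page not found.</p>';
}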
There is no such thing as perfect code design.
There's no canonical definition of "good design" - the best you can hope for is that your design balances the various forces on your project in the optimum way. Forces on your project might be maintainability, performance, scalability, extensibility - classic non-functional requirements - but also things like search engine optimization, standards compliance and accessibility (things that apply to web projects in particular).
If all your URLs are of the form "www.mysite.com/home.php?action=getDetails&productID=123", your search engine friendliness is pretty low. It's far better to have semantic URLs - "www.mysite.com/products/DesktopPc/details.php". You can achieve this through cunning .htaccess trickery in your current design.
From a maintainability point of view, your design has some issues. If I've understood it correctly, adding a new page to the site requires you to modify the code in several different source files - router.php (your giant switch statement), the page itself, and probably header.php as well. That indicates that the code is tightly coupled. Modifying the giant switch statement sounds like a likely source of entertaining bugs, and the combination of the router and the header manipulating the variables, plus the actual page itself, seems a little fragile. This is okay if you're the only person working on the project and you're going to be around for the long term; if that's not the case, it's better to use an off-the-shelf framework (MVC is the current favourite; Zend, Symfony and Cake all do this well in PHP), because you can point new developers at the documentation and expect them to get up to speed.
One of the biggest enemies of maintainability is complexity - complex code is harder to work with, and harbours more bugs. There's a formal metric for this (cyclomatic complexity), and I'm pretty sure your switch statement scores very highly on it - in itself not necessarily a huge problem, but definitely something to keep an eye on. Lots of MVC frameworks avoid this by having the routing defined as data rather than code (i.e. keeping the routes in a configuration file), and/or by using convention over configuration (i.e. if the request is for page "productDetails", include the file "/inc/productDetails.inc").
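For instance, the giant switch could be replaced by a routing table held as data, with a convention-based fallback; the file names below are invented for the sketch:

<?php
// routes.php - routes defined as data rather than code
return array(
    'home'       => 'pages/home.php',
    'support'    => 'pages/support.php',
    'viewTicket' => 'pages/view_ticket.php',
);

<?php
// router.php - a small sketch that replaces the switch statement
$routes = require __DIR__ . '/routes.php';
$page   = isset($_GET['page']) ? $_GET['page'] : 'home';

if (isset($routes[$page])) {
    require __DIR__ . '/' . $routes[$page];
} elseif (preg_match('/^[a-z0-9_]+$/i', $page) && is_file(__DIR__ . "/pages/$page.php")) {
    // convention over configuration: fall back to pages/<name>.php if such a file exists
    require __DIR__ . "/pages/$page.php";
} else {
    header('HTTP/1.0 404 Not Found');
    require __DIR__ . '/pages/not_found.php';
}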
Extensibility could be another concern - imagine having to expose your site content as JSON or XML as well as HTML; in your current design, that would require a lot of change, because every single item in your page-processing pipeline needs to know about it. The home.php needs to know not to send HTML, the header and footer need to know, and the router needs to understand how to handle the additional data type, almost certainly making the switch statement even bigger. Again, this may not be a big deal.
Both extensibility and maintainability are helped by being able to unit test your code. Test-Driven Development turns this into a whole routine in its own right; I'm guessing that testing your application is hard - but that's just a guess; it depends more on how you've factored the individual lumps of code than on what you've described above. However, another benefit of MVC is that it makes it easy to write unit tests for key parts of your system.
So, if the forces on your project don't include an emphasis on maintainability or extensibility, and you can handle the SEO aspect, I don't think your design is necessarily "bad"; even if you do care about those things, there are other things you can do to accommodate those forces - write documentation, hire lots of cheap coders.
The best way to get up to speed with these design topics is not books on PHP or MySQL; I'd recommend "Refactoring" and "Patterns of Enterprise Application Architecture" by Martin Fowler, "Design Patterns" by Gamma et al., and "Code Complete" by McConnell (though that's a touch out of date by now).
I see many MVC implementations for websites have a single entry point, such as an index.php file, and then parse the URL to determine which controller to run. This seems rather odd to me because it involves having to rewrite the URL using Apache rewrites, and with enough pages that single file will become bloated.
Why not instead just have the individual pages be the controllers? What I mean is, if you have a page on your site that lists all the registered members, then the members.php page users navigate to will be the controller for the members. This PHP file will query the members model for the list of members from the database and pass it to the members view.
I might be missing something because I have only recently discovered MVC, but this one issue has been bugging me. Wouldn't this kind of design be preferable, because instead of having one bloated entry file that all pages unintuitively go through, the models and views for a specific page are contained, encapsulated, and called from their respective page?
From my experience, having a single-entry point has a couple of notorious advantages:
It eases centralized tasks such as resource loading (connecting to the DB or to a memcache server, logging execution times, session handling, etc.). If you want to add or remove a centralized task, you just have to change a single file: the index.php.
Parsing the URL in PHP makes the "virtual URL" decoupled from the physical file layout on your webserver. That means that you can easily change your URL system (for example, for SEO purposes, or for site internationalization) without having to actually change the location of your scripts in the server.
However, sometimes having a single entry point can be a waste of server resources. That obviously applies to static content, but also when you have a set of requests that have a very specific purpose and need only a very small subset of your resources (maybe they don't need DB access, for instance). Then you should consider having more than one entry point. I have done that for the site I am working on: it has an entry point for all the "standard" dynamic content and another one for calls to the public API, which need far fewer resources and have a completely different URL system.
And a final note: if the site is well-implemented, your index.php doesn't necessarily have to become bloated :)
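A stripped-down sketch of such an entry point (the paths, controller layout and logging are all assumptions made up for the example; the DB line is commented out because the credentials are placeholders):

<?php
// index.php - single entry point: the centralized tasks live here, once
session_start();                                   // session handling
$started = microtime(true);                        // execution-time logging
// $db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');  // hypothetical DB connection

// map the "virtual URL" onto a controller script; the file layout stays hidden from the URL
$path   = trim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '/');
$name   = $path === '' ? 'home' : basename($path);
$script = __DIR__ . '/controllers/' . $name . '.php';

if (is_file($script)) {
    require $script;
} else {
    header('HTTP/1.0 404 Not Found');
    echo 'Not found';
}

error_log(sprintf('%s served in %.3fs', $name, microtime(true) - $started));

Adding or removing one of those centralized tasks (say, the memcache connection) is a one-file change, and the URL scheme can be reorganized without moving any scripts.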
It is all about being DRY: if you have many PHP files handling requests you will have duplicated code, and that just makes for a maintenance nightmare.
Have a look at the 'main' index page for CakePHP, https://github.com/cakephp/cakephp/blob/master/app/webroot/index.php
No matter how big the app gets, I have never needed to modify that. So how can it get bloated?
Deep-linking directly into the controllers when using an MVC framework eliminates the possibility of implementing controller plugins or filters, depending on the framework you are using. Having a single point of entry standardizes the bootstrapping of the application and modules, and executes the previously mentioned plugins before a controller is accessed.
Also, Zend Framework uses its own URL rewriting in the form of routing. The applications I work on that use Zend Framework have an .htaccess file of maybe six lines of rewrite rules and conditions.
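From memory, that .htaccess looks roughly along these lines (serve real files, links and directories directly, and send everything else through index.php); treat the exact flags as an approximation:

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} -s [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ - [NC,L]
RewriteRule ^.*$ index.php [NC,L]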
A single entry point certainly has its advantages, but you can get pretty much the same benefit from a central required file at the top of every single page that handles database connections, sessions, etc. It's not bloated, it conforms to DRY principles (except for that one require line), it separates logic and presentation, and if you change file locations, a simple search and replace will fix it.
I've used both and I can't say one is drastically better or worse for my purposes.
Software engineers are influencing the single point of entry paradigm. "Why not instead just have the individual pages be the controllers?"
Individual pages are already Controllers, in a sense.
In PHP, there is going to be some boilerplate code that loads for every HTTP request: autoloader require statement (PSR-4), error handler code, sessions, and if you are wise, wrapping the core of your code in a try/catch with Throwable as the top exception to catch. By centralizing code, you only need to make changes in one place!
True, the centralized PHP will use at least one require statement (to load the autoloader code), but even if you have many require statements they will all be in one file, the index.php (not spread out over a galaxy of files under the document root).
If you are writing code with security in mind, again, you may have certain encoding checks/sanitizing/validating that happen with every request. Using values in $_SERVER or filter_input_array()? Then you might as well centralize that.
The bottom line is this. The more you do on every page, the more you have a good reason to centralize that code.
Note, that this way of thinking leads one down the path of looking at your website as a web application. From the web application perspective, a single point of entry is justifiable because a problem solved once should only need to be modified in one place.
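A sketch of that per-request boilerplate, centralized in one place (the autoloader path and the run_application() function are hypothetical names for this example):

<?php
// index.php - boilerplate that every request needs, written once
require __DIR__ . '/vendor/autoload.php';           // PSR-4 autoloader (e.g. Composer's)

set_error_handler(function ($severity, $message, $file, $line) {
    throw new ErrorException($message, 0, $severity, $file, $line);   // turn PHP errors into exceptions
});

// centralized input filtering for values every request may send
$input = filter_input_array(INPUT_GET, array(
    'page' => FILTER_SANITIZE_FULL_SPECIAL_CHARS,
    'id'   => FILTER_VALIDATE_INT,
));

try {
    run_application($input ? $input : array());     // hypothetical: the core of the app runs in here
} catch (Throwable $e) {                            // Throwable requires PHP 7+
    error_log($e->getMessage());
    header('HTTP/1.1 500 Internal Server Error');
    echo 'Something went wrong.';
}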
I am working on a project that uses PHP, AS3, and AMFPHP.
The project allows users to upload and download images, among other things. Since I am fairly new to PHP/Flash security I have been trying to gather as much info as possible about making things secure. I've gotten some good advice about using .htaccess files, and a few other tricks.
My main question at the moment is how to hide the "path" info to and from the PHP, the assets, and the AMFPHP services.
Currently I have all the paths hard-coded in one .as file that returns an object with the paths to any of the other classes that need/request them. I found this method to work well since I only need to change this one .as file, and it will branch out to the other classes that need it.
I'm not super worried about others decompiling my code, and they could probably "sniff" out the paths if they really wanted to. I'm mostly concerned with allowing others easy access to all of my AMFPHP services, or letting them into parts of the site I do not wish them to reach. Basically, I realize that things aren't going to be 100% secure regardless, but I would like to take precautions.
So my main question is ...
What's the best and simplest way to obscure or hide the paths being used in a PHP/AS3 project? I entertained the possibility of PHP includes or even a SQL database if need be. I'd rather not spend a bunch of time and money on questionable obfuscation software, unless there's a tried and true (and inexpensive) one for Flash (not Flex). I also currently do not have SSL, but I don't know how critical or common that is.
As you've noted, anyone could find out your paths by using Wireshark to watch traffic sent to your site, or a Flash decompiler to look at your source code and find the links directly.
I don't think it sounds worth the trouble to try to hide your paths, since all it would be adding is a slight layer of obscurity. Anyone interested could figure it out with relatively little effort, but the average person would have no clue whatsoever about how to make an AMF call to one of your services. Instead, I'd concentrate on making your AMFPHP functions themselves as secure as possible.
You could use mod_rewrite rules (with Apache) to remove or change the file extensions for your pages - for example, letting users request bob.html while the request is actually handled by bob.php:
RewriteEngine on
RewriteRule ^bob\.html$ bob.php [L]
See http://www.workingwith.me.uk/articles/scripting/mod_rewrite for more examples.
This would not change the links hardcoded in flash but could make them less obvious to a user.
If you are using Windows then you can use OBFU to obfuscate your Flash code. It is expensive but very secure. There are a few open source alternatives, but they are not as secure.
See http://tech.motion-twin.com/obfu.html
But what Code Duck is saying is correct in that there is no way to completely protect it.
This question stems from watching Rasmus Lerdorf's talk from Drupalcon. This question and his talk have nothing specifically to do with Drupal, by the way... it was just given at their con. My own question also has nothing specific to do with PHP. It is the single entry point in general that I am curious about.
These days it seems that most frameworks offer a single entry point for whatever you build with them. In his talk Rasmus mentions that he thinks this is bad. It seems to me that he would be correct in this thinking. If everyone hitting the site is coming in through the same entry point wouldn't things bog down after traffic reached a certain point? Wouldn't it be more efficient to allow people direct access to specific points in a site without having their request go through the same point? But perhaps the actual impact is not very bad? Maybe modern architecture can handle it? Maybe you have to be truly gigantic in scale before it becomes even worth considering? I'm curious as to what people on this site think about this issue.
In short, Rasmus or the interpretation is wrong.
This shows a clear lack of understanding of how computers work. The more something gets used, the more likely it is to be closer to the CPU, and therefore faster. Mind you, a single point of entry != a single point of failure. But that's all beside the point: when people say single point of entry, we're talking about the app; it is a single point of entry for your logic.
Not to mention it's architecturally brain-dead not to have a central point of entry, or at least to reduce the number of entry points in general. As soon as you want to do one thing across your app at every entry point, guess how many places need to change? Having dealt with an app where each page stood on its own, it sucked having to change, and I assure you, we needed to.
The important thing is that you use a web framework that supports scalability through methods like load-balancing and declarative code.
No, a single-entry point does not in itself make a bottleneck. The front page of Google gets a lot of hits, but they have lots of servers.
So the answer is: It doesn't matter.
Like anything in software development, it depends. Rasmus's objection to front-controller-style frameworks is the performance hit you take from having to load so much code up-front on each request. This is 100% true. Even if you're using a smart resource-loading module/object/etc. of some kind, using a framework is a performance trade-off. You take the performance hit, but in return you get back:
Encouraged separation of "business logic" (whatever that is) and template/layout logic
Instant and (more importantly) unified access to the objects you'll use for database queries, service calls, your data model, etc.
To a guy like Rasmus, this isn't worth the performance hit. He's a C/C++ programmer. For him, if you want to separate out business logic in a highly performant way, you write a C/C++ Extension to PHP.
So if you have an environment and team where you can easily write C/C++ extensions to PHP and your time-to-market vs. performance ratio is acceptable, then yes, throw away your front-controller framework.
If that's not your environment, consider the productivity increases a front-controller framework can bring to your (likely) simple CRUD Application.
I think one of the biggest advantages of having only a single point of entry is security. All of the input coming in is less likely to corrupt the system if it is checked and validated in a single place.
I think it's a big misunderstanding to discuss this in terms of "one file" vs. "several files".
One tends to think that because the entry point is in a single file, all we have to focus on is the code in that one file - that's wrong.
All of the popular frameworks have tons of files with entry manipulation code, interpretation code, and validation code for requests. The code is not located in one place; rather, it is spread around in a jungle of require/include statements calling different classes depending on what's being requested and how.
In both cases the request is really handled by different files.
Why then should I have a single entry point with some kind of _detect_uri() function that needs to call several other functions like strpos(), substr(), and strncmp() to clean up, validate, and split up the request string, when I can just use several entry points and eliminate that code altogether?
Take a look at CodeIgniter's _detect_uri() function in URI.php. Not to pick on CodeIgniter; it's just an example. The other frameworks do likewise.
You can achieve the goals of a MVC pattern with a single entry point as well as with several entry points.
This is what I thought at first, but it doesn't seem to have much impact. After all, your entry point is (usually) only doing a couple of things: setting some environment constants, including the bootstrap loader, optionally catching any exceptions thrown, and dispatching the front controller. I think the reason this is not inefficient is that this file does not change depending on the controller, action or even user.
I do feel this is odd, however. I'm building a small MVC framework myself at the moment, but it's slightly the reverse of most frameworks I've used. I put the controller logic in the accessed file. For example, index.php would contain the IndexController and its actions. This approach is working well for me, at least.
As most PHP MVC frameworks use some sort of URL rewriting, or at least parse anything after index.php on their own, a single entry point is needed.
Besides that, I like to provide entry points per context, say web (/soap), console, ...
Just to add: the thing people usually think is that, since there is one PHP page, it is a single page serving all requests, sort of like queuing.
The important thing to note is that each request creates its own instance of the script, and thus the load is the same as if two different pages were being accessed at the same time. So, still the same load. Virtually.
However, some frameworks might have a whole lot of stuff you don't need going on in the entry script - sort of like a catch-all to satisfy all possible scenarios - and that might cause load.
Personally, I prefer multiple pages, the same way I mix OOP and procedural. I like PHP the old-school way.
There are definitely disadvantages to using a front-controller file architecture. As Saem explains, accessing the same file is generally advantageous to performance, but as Alan Storm and Rasmus Lerdorf explain, trying to make code handle numerous situations (such as a front controller trying to handle the request for every "page" on the site) leads to a bloated and complicated collection of code. Using a bloated and complicated collection of code for every page load can be considered bad practice.
Saem is arguing that a front-controller can save you from having to edit code in multiple locations, but I think that in some cases, that is better solved by separating out redundancies.
Having a directory structure that is actually used by the web server can make things simpler for the web server and more obvious for the developers. It is simpler for a web server to serve files by translating the URL into the file location it traditionally represents than to rewrite it into a new URL. A non-front-controller file architecture can also be more obvious to developers, because instead of starting with a controller that is bloated to handle everything, they start with a file/controller that is more specific to what they need to develop.
I'm building a PHP site, but for now the only PHP I'm using is a half-dozen or so includes on certain pages. (I will probably use some database queries eventually.)
Are simple include() statements a concern for speed or scaling, as opposed to static HTML? What kinds of things tend to cause a site to bog down?
Certainly include() is slower than static pages. However, with modern systems you're not likely to see this as a bottleneck for a long time - if ever. The benefits of using includes to keep common parts of your site up to date outweigh the tiny performance hit, in my opinion (having different navigation on one page because you forgot to update it leads to a bad user experience, and thus bad feelings about your site/company/whatever).
Using caching will really not help either - caching code is going to be slower than just an include(). The only time caching will benefit you is if you're doing computationally-intensive calculations (very rare, on web pages), or grabbing data from a database.
It sounds like you are engaging in a bit of premature optimization. If the application is not built yet, then while performance concerns are good to be aware of, your primary concern should be getting the app written.
Includes are a fact of life. Don't worry about the number; worry about keeping your code well organized (the PEAR folder structure is a lovely thing - if you don't know what I'm talking about, look at the structure of the Zend Framework class files).
Focus on getting the application written with a reasonable amount of abstraction. Group all of your DB calls into a class (or classes) so that you minimize code duplication (KISS principles and all) and when it comes time to refactor and optimize your queries they are centrally located. Also get started on some unit testing to prevent regression.
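As a sketch of what grouping the DB calls might look like (the class, table and column names here are made up for illustration):

<?php
// UserRepository.php - all user queries live in one place
class UserRepository
{
    private $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    public function findByName($name)
    {
        $stmt = $this->db->prepare('SELECT * FROM users WHERE name = ?');
        $stmt->execute(array($name));
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        return $row === false ? null : $row;
    }

    public function all()
    {
        return $this->db->query('SELECT * FROM users')->fetchAll(PDO::FETCH_ASSOC);
    }
}

When it's time to refactor and optimize the queries, they're all in this one class, and a unit test can target it directly against a test database.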
Once the application is up and running, don't ask us what is faster or better, since it depends on each application where your bottleneck will be. It may turn out that even though you have lots of includes, your loops are eating up your time, or whatever. Use Xdebug and profile your code once it's up and running. Look for the segments of code that are eating up a disproportionate amount of time, then refactor. If you focus too much now on the performance hit between include and include_once you'll end up chasing a ghost while those curl requests running in sync are eating your breakfast.
In the meantime, the best suggestion is to look through the php.net manual and, if there's a built-in function that does something you are trying to do, use it! PHP's C-based extensions will always be faster than any PHP code you could write, and you'll be surprised how much of what you are trying to do has been done already.
But again, I cannot stress this enough, premature optimization is BAD!!! Just get your application up off the ground with good levels of abstraction, profile it, then fix what actually is eating up your time rather than fixing what you think might eat up your time.
Strictly speaking, straight HTML will always serve faster than a server-side approach since the server doesn't have to do any interpretation of the code.
To answer the bigger question, there are a number of things that will cause your site to bog down; there's just no specific threshold for when your code is causing the problem vs. PHP. (keep in mind that many of Yahoo's sites are PHP-driven, so don't think that PHP can't scale).
One thing I've noticed is that the PHP-driven sites that are the slowest are the ones that include more than is necessary to display a specific page. OSCommerce (oscommerce.com) is one of the most popular PHP-driven shopping carts. It has a bad habit, however, of including all of its core functionality (just in case it's needed) on every single page. So even if you don't need to display an 'info box', the function is loaded.
On the other hand, there are many PHP frameworks out there (such as CakePHP, Symfony, and CodeIgniter) that take a 'load it as you need it' approach.
I would advise the following:
Don't include more functionality than you need for a specific page
Keep base functions separate (use an MVC approach when possible)
Use require_once instead of include if you think you'll have nested includes (e.g. page A includes file B, which includes file C); see the sketch after this list. This will avoid including the same file more than once. It will also stop the process if a file can't be found, thus helping your troubleshooting process ;)
Cache static pages as HTML if possible - to avoid having to reparse when things don't change
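A tiny sketch of the nested-include point from the list above (file names invented):

<?php
// pageA.php
require_once 'header.php';    // header.php itself starts with require_once 'config.php';
require_once 'config.php';    // already loaded via header.php, so this is a harmless no-op
require_once 'sidebar.php';   // unlike include, this stops with a fatal error if the file is missing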
Nah, includes are fine; nothing to worry about there.
You might want to think about tweaking your caching headers a bit at some point, but unless you're getting significant hits it should be no problem. Assuming this is all static data, you could even consider converting the whole site to static HTML (easiest way: write a script that grabs every page via the webserver and dumps it out in a matching dir structure)
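A rough sketch of such a dump script, assuming the pages are reachable over HTTP and allow_url_fopen is enabled (the host name and page list are placeholders):

<?php
// dump_static.php - fetch each known page and write it out as static HTML
$pages = array('index.php', 'about.php', 'contact.php');   // placeholder page list
foreach ($pages as $page) {
    $html = file_get_contents('http://www.example.com/' . $page);
    $out  = __DIR__ . '/static/' . preg_replace('/\.php$/', '.html', $page);
    if (!is_dir(dirname($out))) {
        mkdir(dirname($out), 0755, true);
    }
    file_put_contents($out, $html);
}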
Most web applications are limited by the speed of their database (or whatever their external storage is, but 9 times out of 10 that'll be a database); the application code is rarely a cause for concern, and it doesn't sound like you're doing anything you need to worry about yet.
Before you make any long-lasting decisions about how to structure the code for your site, I would recommend that you do some reading on the Model-View-Controller design pattern. While there are others, this one appears to be gaining a great deal of ground in web development circles and will certainly be around for a while. You might want to take a look at some of the other design patterns suggested by Martin Fowler in his Patterns of Enterprise Application Architecture before making any final decisions about what sort of design will best fit your needs.
Depending on the size and scope of your project, you may want to go with a ready-made framework for PHP like Zend Framework or PHP On Trax or you may decide to build your own solution.
Specifically regarding the rendering of HTML content I would strongly recommend that you use some form of templating in order to keep your business logic separate from your display logic. I've found that this one simple rule in my development has saved me hours of work when one or the other needed to be changed. I've used Smarty (http://www.smarty.net/) and I know that most of the frameworks out there either have a template system of their own or provide a plug-in architecture that allows you to use your own preferred method. As you look at possible solutions, I would recommend that you look for one that is capable of creating cached versions.
Lastly, if you're concerned about speed on the back-end then I would highly recommend that you look at ways to minimize your calls your back-end data store (whether it be a database or just system files). Try to avoid loading and rendering too much content (say a large report stored in a table that contains hundreds of records) all at once. If possible look for ways to make the user interface load smaller bits of data at a time.
And if you're specifically concerned about the actual load time of your html content and its CSS, Javascript or other dependencies I would recommend that you review these suggestions from the guys at Yahoo!.
To add on to what JayTee mentioned - loading functionality when you need it. If you're not using any of the frameworks that do this automatically, you might want to look into the __autoload() functionality that was introduced in PHP 5 - basically, your own logic can be invoked when you instantiate a particular class if it's not already loaded. This gives you a chance to include() a file that defines that class on demand.
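A minimal sketch of that on-demand loading (the class-to-file naming convention and the SalesReport class are assumptions for the example; on newer PHP versions spl_autoload_register() is the preferred way to register the same kind of callback):

<?php
// autoload.php - invoked automatically the first time an undefined class is used
function __autoload($className)
{
    $file = __DIR__ . '/classes/' . $className . '.php';
    if (is_file($file)) {
        include $file;        // the class file is only read when the class is actually needed
    }
}

$report = new SalesReport();  // triggers __autoload('SalesReport') if the class isn't loaded yet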
The biggest thing you can do to speed up your application is to use an Opcode cache, like APC. There's an excellent list and description available on Wikipedia.
As far as simple includes are concerned, be careful not to include too many files on each request as the disk I/O can cause your application not to scale well. A few dozen includes should be fine, but it's generally a good idea to package your most commonly included files into a single script so you only have one include. The cost in memory of having a few classes here and there you don't need loaded will be better than the cost of disk I/O for including hundreds of smaller files.