I have been digging around Stack Overflow and all sorts of other sites looking for the best answer to my questions.
I am developing a very large and growing monster of a website, in the form of an information management system. At its core it runs on PHP and MySQL. I have just updated the code, in the general sense, to mysqli, but without taking full advantage of all of its features. That is part of what I am working on now.
I have read a ton about prepared statements and this is something I certainly need to put to use given the number of statements that get re-used.
I am looking at making in the realm of about 50 prepared statements, being used across nearly 200 different pages. Is there a recommended way to do this? All examples I have seen deal with 1 or 2.
Due to the ever-growing nature of the site, using databases and such, one of the things I liked about the old mysql extension is that it didn't require a connection to be specified for each query, but mysqli does. I had to tweak my functions because of this. Is there a recommended solution for this?
I built the site in a procedural form rather than object-oriented, but I am always open to suggestions, regardless of the format they use.
I'll try to be as accurate as possible, but I'm not an expert.
Your first point: You're probably looking for Stored Procedures. Basically you can store certain logic of your application for repetitive usage.
Prepared Statements, however, are different. They basically mean "parse once, execute many times", but they're not stored on the server and don't carry over across connections.
In PHP, each "page load" is a different thread with its own variables and thus its connections to the database, so you cannot really use the Prepared Statement again.
As for your second point, mysql_query() doesn't require a connection handle to be passed to it simply because it assumes you want to use the last created connection.
For example:
mysql_connect();
mysql_query("SELECT * FROM table");
and
$link = mysql_connect();
mysql_query("SELECT * FROM table", $link);
are the same.
So using the connection implicitly was never a scalability feature; it just meant the last opened connection was assumed.
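One common pattern, just a sketch under the assumption that one connection per request is enough, is to open the connection once and pass it into your helper functions:
<?php
// Hypothetical helper functions; names and credentials are placeholders.
function db_connect() {
    return new mysqli("localhost", "user", "password", "ims");
}

function fetch_all($mysqli, $sql) {
    $rows = array();
    $result = $mysqli->query($sql);
    while ($row = $result->fetch_assoc()) {
        $rows[] = $row;
    }
    return $rows;
}

$db = db_connect();                     // once per request
$rows = fetch_all($db, "SELECT * FROM documents");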
That's as far as I can write without providing possibly wrong information, so I highly recommend that you really read about this; if you then still have questions, everybody here would be happy to answer them.
Related
I've created following code & I can't seem to find out what's wrong with it, any help would be greatly appreciated.
<?php
$con = mysql_connect("host", "user", "password");
mysql_select_db("Stores", $con);
$street = mysql_query("SELECT Street FROM Sports WHERE Name = 'Nike'");
if (mysql_query("CREATE DATABASE $street", $con))
{
    echo "Success";
}
else
{
    echo "Fail";
}
mysql_close($con);
I keep getting "Fail". Can anyone point me in the right direction?
There are many problems with this code:
this may not be very relevant at first, but the PHP mysql extension is marked deprecated; please switch to mysqli or, better still, PDO, as is also indicated in the official documentation. Deprecated means that it's only there to support legacy code (which implies it should not be used in new code) and that it will disappear at some point in the future. I suggest PDO, as it's a much more mature, flexible and universal database interface; if you are going to invest time in learning something, PDO is definitely the way to go. Pay special attention to prepared statements: they will help you avoid all kinds of problems without having to fumble with the various escape functions.
mysql_query returns a query result (as a resource) that you should read, e.g. with some kind of fetch command; see the documentation for PDO's query() method, and the short sketch after this list.
I haven't got the slightest idea why you would want to create a new database on your server named after the street you found via a query? You should probably explain what your intention is, I'm quite sure this isn't going to do what you want it to do.
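To make the second point concrete, here is a minimal PDO sketch of how the street could actually be read out of the result; host and credentials are placeholders:
<?php
// Placeholders for host and credentials.
$pdo = new PDO("mysql:host=localhost;dbname=Stores", "user", "password");

// A prepared statement, so no manual escaping is needed.
$stmt = $pdo->prepare("SELECT Street FROM Sports WHERE Name = ?");
$stmt->execute(array('Nike'));

$street = $stmt->fetchColumn();   // the actual value, not a resource
echo $street;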
EDIT based on the comments:
For me the main advantages of switching to PDO are
future-proofing: with the mysql_ extension you'll upgrade PHP one day and the extension will be gone, forever. That won't happen with PDO anytime soon.
support for multiple databases: traditionally each database brand was served by its own proprietary PHP extension. PDO unifies all these extensions, which means that with one API you can work with most common databases in existence. Note that it doesn't iron out dialect differences between these databases, but having one API for them all is definitely a big plus.
PDO actively encourages you to use prepared statements, which are generally recognized as a bulletproof yet simple protection against all kinds of security issues like SQL injection - no need for all that escaping nonsense that you'll need to do to make regular SQL statements somewhat safe.
Regarding the remark about creating a database for each user, I really think you need to go through some introductory material on relational databases. Most commonly an application is backed by one database containing, in most cases, a fixed set of tables (e.g. store, customer, order, orderitems) with relations between them. E.g. a store has many customers, each of which has one or more orders, each of which contains one or more items. All data is fetched by utilizing these relations via queries, using joins to extract, for example, all items belonging to one order, or to list all orders associated with one customer. The important thing here is that in all but the most exceptional cases there's just one table per data type. That is, all customers are stored in just one table, all orders are stored in just one table, and so on.
I have no time to read all of it, but databaseprimer.com might be helpful to get you started with these concepts.
mysql_query will return a resource. You have to fetch the data first, using mysql_fetch_assoc().
Add this line after your query:
$fetch = mysql_fetch_assoc($street);
Then in your second query, change $street to $fetch['Street'].
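Putting it together, the question's snippet with the missing fetch added (still using the deprecated mysql_ API, and still assuming that creating a database named after the street really is the intent):
$con = mysql_connect("host", "user", "password");
mysql_select_db("Stores", $con);
$result = mysql_query("SELECT Street FROM Sports WHERE Name = 'Nike'", $con);
$fetch = mysql_fetch_assoc($result);   // read the row out of the resource
if (mysql_query("CREATE DATABASE `" . $fetch['Street'] . "`", $con)) {
    echo "Success";
} else {
    echo "Fail";
}
mysql_close($con);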
In certain functions of the code, PHP will execute hundreds or in some cases thousands of queries on the same tables using a loop. Currently, it creates a new database connection for each query. How expensive is that operation? Would I see a significant speed increase by reusing the same connection? It could take quite a bit of refactoring to change this behavior and reuse the same connection.
The PHP code uses mysql_connect to connect to the database.
Just based on what I've said here, are there other obvious optimizations that you would recommend (I've read about locking tables for example...)?
EDIT:
My question is more about the benefit of using a single connection, not how to avoid using more than one.
The documentation for mysql_connect states:
If a second call is made to mysql_connect() with the same arguments, no new link will be established, but instead, the link identifier of the already opened link will be returned.
So, unless you're connecting with different credentials, changing that part of your code will not affect performance.
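A quick way to see this for yourself (credentials are placeholders):
$a = mysql_connect("localhost", "user", "password");
$b = mysql_connect("localhost", "user", "password");
var_dump($a === $b);   // bool(true): same arguments, same link

// Pass true as the fourth (new_link) argument if you really need a second connection.
$c = mysql_connect("localhost", "user", "password", true);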
I use Zend_Framework and my database profiling shows that the connection itself takes nearly 10x longer than most of my queries. I have two different databases that I connect to, and only connect once to each for each request.
I'd say reconnecting for every query is poor design, but the question of refactoring is more complex than that. Questions that need to be asked:
Are there current performance problems?
Have you done code profiling to narrow down where the performance issues are occurring?
How much time will be required for this refactoring? Take into account the testing involved, not just coding time.
The answer to the original question should be obvious. If it's not obvious to you, then it should still be obvious how to find out for yourself how much impact it has.
are there other obvious optimizations
No - because you've not provided any details of the tables' structure or the queries you are running.
I'm working on what is turning out to be a fairly complex SELECT query. I have several hierarchical queries being nested in a single SELECT and it is getting to be quite difficult to manage.
I'm running into a few places where my inline views need to be executed in more than one place, so it seems like a reasonable idea to execute those once at the beginning of a stored procedure and then do some iteration over the results as needed.
I'm wondering if there are any reasons why I should not, or could not, execute an Oracle Stored Procedure, called via my PHP code, and return as an OUT parameter the resultset. I've tended to use SPs only to do updates/deletes/inserts but the sheer size and complexity of this query seems like it needs to be broken down.
If there aren't any technical problems with this, any comments on whether it is good or bad practice?
I'm working on what is turning out to be a fairly complex SELECT query. I have several hierarchical queries being nested in a single SELECT and it is getting to be quite difficult to manage.
Ok, but why a stored procedure? Why not create a view instead?
I'm running into a few places where my inline views need to be executed in more than one place, so it seems like a reasonable idea to execute those once at the beginning of a stored procedure and then do some iteration over the results as needed.
Again - excellent use case for a view.
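For instance, a repeated inline view can be factored out once (all names invented for illustration):
CREATE OR REPLACE VIEW order_totals AS
SELECT o.customer_id,
       SUM(oi.quantity * oi.price) AS total
FROM   orders o
JOIN   order_items oi ON oi.order_id = o.order_id
GROUP  BY o.customer_id;

-- The big SELECT can then simply join against it, as often as needed:
SELECT c.name, t.total
FROM   customers c
JOIN   order_totals t ON t.customer_id = c.customer_id;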
I'm wondering if there are any reasons why I should not, or could not, execute an Oracle Stored Procedure, called via my PHP code, and return as an OUT parameter the resultset.
If there aren't any technical problems with this, any comments on whether it is good or bad practice?
Well, I don't want to start a religious war, and I do not want to suggest the arguments against apply to your case. But here goes:
one reason why I tend to avoid stored procedures is portability; by that I mean mostly database portability. Stored procedure languages are notoriously unportable across databases, and built-in libraries like Oracle packages make things worse in that respect.
stored procedures take some additional processing power from your database server. This makes it harder to scale the application as a whole: if the capacity of your db server is exhausted due to stored procedures, and you need to upgrade hardware or even buy an extra Oracle software license because of that, I would not be a happy camper, especially if I could have bought cheap webserver/PHP boxes instead to do the computing.
Reasons where I would go for stored procedures:
language portability. If database portability is not so much an issue, but you do want to reuse logic across multiple applications, or have the ability to code in different languages, then stored procedures may save you writing language-specific database invocation code.
complex permission scenarios. Stored procedures give you an extra level of permissions, since you can execute the procedure with the privileges of the definer or owner of the stored procedure. Sometimes this solves problems where a user needs to work with some tables, but cannot be allowed direct access to them.
saving round trips: if you have to deal with complex, multi-statement transactions, putting them in a stored procedure saves round trips between the app and the db, because only one round trip is needed to execute the stored procedure. Sometimes this can get you more performance.
I want to stress again that in all these scenarios, I would still advise you not to put all your procedural logic in stored procedures. Databases are best at storing and retrieving data; languages like PHP/Java/Perl/pick your poison are better at processing it.
If you are using the same inline view many times, it's a good candidate for a WITH clause.
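Roughly like this (table and column names invented):
-- The inline view is written once and can be referenced several times below.
WITH dept_costs AS (
    SELECT department_id, SUM(salary) AS dept_total
    FROM   employees
    GROUP  BY department_id
)
SELECT d.department_id, d.dept_total
FROM   dept_costs d
WHERE  d.dept_total > 100000;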
PHP can handle resultsets returned from stored procedures by using Ref Cursors. The Oracle+PHP Cookbook has an example.
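Something along these lines, a sketch using the standard oci8 calls; the procedure name, query and connect string are invented:
<?php
// Assumes a procedure like: PROCEDURE get_streets(p_rc OUT SYS_REFCURSOR)
$conn = oci_connect("user", "password", "localhost/XE");

$stmt = oci_parse($conn, "BEGIN get_streets(:rc); END;");
$rc = oci_new_cursor($conn);
oci_bind_by_name($stmt, ":rc", $rc, -1, OCI_B_CURSOR);
oci_execute($stmt);

oci_execute($rc);                  // treat the returned cursor as a statement
while ($row = oci_fetch_assoc($rc)) {
    print_r($row);
}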
So there are no technical impediments but as you can see from the various answers there are some philosophical aspects to your question. I think we can agree that if you are already wrapping some SQL statements in stored procedures - which you are - then you are not drastically compromising the portability of your system by extending "updates/deletes/inserts" to include selects.
The pertinent question then becomes "should you use a stored procedure for this particular query?", the answer to which hinges on precisely what you mean by:
the sheer size and complexity of this query seems like it needs to be broken down.
Deconstructing a big query into several smaller queries and then stitching results together in PL/SQL is seductive, but should be approached with caution. This can degrade the performance of your application, because PL/SQL has more overheads than SQL. Making your query more readable is not a good enough reason: you need to be certain that the complexity has a real and adverse effect on the running of your code.
A good reason for using a stored procedure rather than a view might be if you want to extend the applicability of the query by using bind variables or dynamic SQL in the body of the query.
A definitive answer to your question requires more details regarding the nature of your query and the techniques you are thinking of using to simplify it.
You could look at subquery factoring which may improve the readability of the query.
One risk of breaking up a single SQL query into a more procedural solution is that you lose read consistency. As such, you want to be pretty sure that someone changing data while your procedure runs won't break it. You may want to lock a table for the duration of the procedure call. It seems drastic, but if you are pretty sure that the data is static, and there would be ugly side effects if it wasn't, then it is a solution.
Generally if an SQL statement is complex enough, it probably isn't portable between databases anyway, so I wouldn't worry about that aspect.
Views can be a good option to hide complexity, but the downside to hiding complexity is that people start doing things that seem 'simple' but are really complex and don't work as desired. You also get another object to consider for grants etc. [Edit: As Roland commented, this applies equally to stored procedures, views, object types etc.]
If you expect to return a large resultset, you should consider a pipelined table function. That way you can avoid having the entire resultset in the Oracle session at the same time.
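A minimal sketch of what that looks like (type, function and table names invented):
-- Rows are streamed to the caller as they are produced, instead of being
-- buffered in the session.
CREATE TYPE street_tab AS TABLE OF VARCHAR2(100);
/
CREATE OR REPLACE FUNCTION list_streets RETURN street_tab PIPELINED IS
BEGIN
    FOR r IN (SELECT street FROM sports) LOOP
        PIPE ROW (r.street);
    END LOOP;
    RETURN;
END;
/
SELECT * FROM TABLE(list_streets);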
I am making a simple blog for myself and while reading through the PHP manual, I found this http://us2.php.net/manual/en/function.pg-query.php
It says
resource pg_query ([ resource $connection ], string $query )
...
Note: Although connection can be omitted, it is not recommended, since it can be the cause of hard to find bugs in scripts.
Why is it not ok to just use the last connection? I never plan on having more than 1 connection open per PHP script, so how would this ever cause bugs for me?
Hah. "I never plan on having more than 1 connection open per PHP script."
I remember the last time I said that. It was back in 'ought three. I was a young whippersnapper then, much like yourself. Full of spit and vinegar. Why do something if I don't have to? That was the prevailing wisdom in our little dot-com startup. "Just get it done!" we'd shout. Also, we wore onions on our belts.
Well... time came that I added a quick little statistics database into the main site. Nothing special, just wanted some stats tracked separately. I figured I'd re-use the database wrapper. It was a good wrapper for its time! Abstracted out all the database functions I'd need. But as soon as I added it in there, some wacky things started happening. It didn't make sense. I had two separate database wrapper objects... two separate connections! How could they affect each other? But then users would be logged out randomly. Sessions would fail. Sometimes a key update would go bad. Some queries ran on the wrong databases. Dogs and cats started living together! It was mass hysteria!
If only I had specified that connector originally. If only I had kept them specific, so pg_query would know which one to use. So much data loss could have been prevented. So many good tuples... so much good data. Lost. Lost...
*sniff*
This is to accommodate those who need explicit calls to various databases. If you don't, ignore it :) Some scripts work on local and remote databases, and yet others work on multiple local or multiple remote databases.
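For example, with two connections open, the implicit "last connection" behaviour can silently target the wrong database; the connection strings here are invented:
<?php
$main  = pg_connect("host=localhost dbname=main");
$stats = pg_connect("host=localhost dbname=stats");

// No handle given, so this runs against $stats (the last connection
// opened), which may not be what was intended:
pg_query("SELECT * FROM users");

// Explicit handles remove the ambiguity:
pg_query($main, "SELECT * FROM users");
pg_query($stats, "INSERT INTO hits (page) VALUES ('home')");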
Perhaps it is due to the order of arguments? If it was regarding a recommendation to explicitly use a connection resource, the same note would be on mysql_query(). Unless there is something specific to PostgreSQL I am unaware of.
In short, I don't see any problem omitting the connection argument for single connection applications.
Is it generally better to run functions on the webserver, or in the database?
Example:
INSERT INTO example (hash) VALUES (MD5('hello'))
or
INSERT INTO example (hash) VALUES ('5d41402abc4b2a76b9719d911017c592')
Ok so that's a really trivial example, but for scalability when a site grows to multiple websites or database servers, where is it best to "do the work"?
I try to think of the database as the place to persist stuff only, and put all abstraction code elsewhere. Database expressions are complex enough already without adding functions to them.
Also, the query optimizer will trip over any expressions with functions if you ever end up wanting to do something like "SELECT ... WHERE MD5(xxx) = ...".
And database functions aren't very portable in general.
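To make the optimizer point above concrete, a sketch (PDO and the names here are just for illustration):
<?php
// Connection details are placeholders.
$pdo = new PDO("mysql:host=localhost;dbname=test", "user", "password");

// Hash in PHP, so the WHERE clause compares an indexable column to a constant...
$hash = md5('hello');
$stmt = $pdo->prepare("SELECT id FROM example WHERE hash = ?");
$stmt->execute(array($hash));

// ...instead of WHERE MD5(xxx) = ..., which defeats any index on that column.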
I try to use functions in my scripting language whenever calculations like that are required. I keep my SQL function usage down to a minimum, for a number of reasons.
The primary reason is that my one SQL database is responsible for hosting multiple websites. If the SQL server were to get bogged down with requests from one site, it would adversely affect the rest. This is even more important to consider if you are working on a shared server for example, although in this case you have little control over what the other users are doing.
The secondary reason is that I like my SQL code to be as portable as possible. I don't even want to try to count the different flavors of SQL that exist, so I try to keep functions (especially non-standard extensions) out of my SQL code, except for things like SUM or MIN/MAX.
I guess what I'm saying is, SQL is designed to store and retrieve data, and it should be kept to that purpose. Use your serving language of choice to perform any calculations beforehand, and keep your SQL code portable.
Personally, I try to keep the database as simple as possible: Insert, Update, Delete, without too many functions that could instead live in code. Stored procedures are the same; they should contain only tasks that are very close to persisting data, not business logic.
I would put the MD5 outside. This will let me have this "data manipulation" outside the storage scope of the database.
But, your example is quite "easy" and I do not think it's bad to have it inside...
Use your database as a means of persisting data and maintaining data integrity, and leave business logic out of it.
If you put business logic, any of it, in your database, you are making it more complex to manage and maintain in the future.
I think most of the time, you're going to want to leave the data manipulation to the webserver but, if you want to process databases with regards to tables, relations, etc., then go for the DB.
I'm personally lobbying my company to upgrade our MySQL server to 5.0 so that I can start taking advantage of procedures (which is killing a couple of sites we administer).
Like the other answers so far, I prefer to keep all the business logic in one place. Namely, my application language. (More specifically, in the object model, if one is present, but not all code is OO.)
However, if you look around StackOverflow for (my)sql-tagged questions about whether to use inline SQL or stored procedures, you'll find that most of the people responding are strongly in favor of using stored procs whenever and wherever possible, even for the most trivial queries. You may want to check out some of those questions to see some of the arguments favoring the other approach.