I have several pieces of PHP code doing jobs that MySQL could do, such as sorting and merging data from different MySQL tables.
Lately, I found out that I can do all of this with a single MySQL query.
I am wondering whether it is better to give these MySQL-capable jobs to MySQL or to PHP, in terms of efficiency, speed, etc.
Thank you.
If you do it in PHP you are just re-implementing features that MySQL already has. Your version will be far from the most optimized solution, and therefore much slower.
You should definitely do it in the SQL query.
Your performance will increase if you let MySQL handle that work.
It will perform better to do this in MySQL. Firstly, MySQL has optimized sorting algorithms and can use any indexes you have created. Furthermore, if the query does the merging and filtering, you will end up transferring less data from the database.
Databases are optimized to carry out these functions while retrieving the data. Sorting at the database level is also much easier to read than writing dozens of lines of PHP to loop over lists or collections.
There are ready-made string functions available in MySQL to merge the data while it is being retrieved from the database.
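For example, something like this (a sketch; the table and column names are made up):
-- CONCAT merges strings within a row; GROUP_CONCAT merges values across rows,
-- all done by MySQL while the data is being fetched
SELECT CONCAT(first_name, ' ', last_name) AS full_name,
       GROUP_CONCAT(tag ORDER BY tag SEPARATOR ', ') AS tags
FROM users
JOIN user_tags ON user_tags.user_id = users.id
GROUP BY users.id;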
I definitely would suggest MySQL.
Do it in MySQL. There's no question that it is more efficient. PHP will use much more memory, for one.
No question: MySQL is built for this.
To add something, maybe you'd be interested in building multi-table (join) queries. They are very helpful and really quite simple. For instance:
$query = "SELECT DISTINCT post.title as title, post.id as id,
product.imageURL as imageURL, product.dueDate as dueDate
FROM post, product
WHERE post.status='saved'
AND post.productURL=product.linkURL
AND post.userEmail='$session[userEmail]'
AND NOT EXISTS(
SELECT publication.postId FROM publication
WHERE publication.postId=post.id
)
ORDER BY post.id";
This is a simple example from some code I built.
The thing is, it merges two different tables under the restriction post.productURL = product.linkURL. It also uses negation (NOT EXISTS), which is pretty useful when the set you are looking for is defined not by some condition, but by the absence of one.
You can also simplify queries like this by building views in MySQL.
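Something along these lines (a sketch based on the query above; the view name is made up):
CREATE VIEW unpublished_posts AS
SELECT DISTINCT post.title, post.id, post.userEmail,
       product.imageURL, product.dueDate
FROM post, product
WHERE post.status = 'saved'
  AND post.productURL = product.linkURL
  AND NOT EXISTS (
      SELECT 1 FROM publication WHERE publication.postId = post.id
  );
-- the per-user filter then becomes a one-liner:
SELECT * FROM unpublished_posts WHERE userEmail = 'someone@example.com';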
I'm a newbie myself, so I hope it helps. Cheers.
Let's say I have a general DB class with a query method that is used all over my source code, and I want to find out which table names are used in my MySQL queries.
The limitation is that I'm using MySQL 5.5.36.
Also, let's assume that we are talking about millions of tables, so going through the MySQL information schema is not going to happen.
What I would like to know is: is there an easy way to get the table names used?
EXPLAIN is obviously good for SELECT statements, but since it's MySQL 5.5.36 I can't use it on REPLACE, UPDATE, INSERT, etc.
PDOStatement::getColumnMeta might help with getting a table name, but it won't work with queries that don't return a result set.
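Roughly, I imagine something like this for the queries that do return rows (a sketch; $pdo is an existing PDO connection and the post table is just an example):
// collect the originating table of each column in the result set
$stmt = $pdo->query("SELECT p.id, p.title FROM post p LIMIT 1");
$tables = [];
for ($i = 0; $i < $stmt->columnCount(); $i++) {
    $meta = $stmt->getColumnMeta($i);   // may return false
    if ($meta !== false && isset($meta['table'])) {
        $tables[$meta['table']] = true; // de-duplicate table names
    }
}
var_dump(array_keys($tables));          // e.g. ["post"]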
Some kind of regexp might be possible, but I very much doubt that is a good solution here: my queries are big, have multiple JOINs, etc., so the regexp would be very complicated and would probably fail a fair percentage of the time.
Any other ideas?
Simplified scenario:
I have a table with about 100,000 rows.
I will need to pick about 300-400 rows, based on certain criteria, to display them on a web page.
Considering the above scenario, which of the two approaches below would you recommend?
Approach 1: Use just one database query to select the entire table into one big array of 100,000 rows. Using loops, pick the required 300-400 rows from the array and pass them on to the front-end. Minimum load on the database server, as it's just one query, but more load on PHP, as it has to store and search through an array of 100,000 rows.
Approach 2: Using a loop, PHP generates a new query for each row of required data. Collecting all the data requires 300-400 independent queries. More load on the database server; compared to approach 1, less load on PHP.
Opinions / thoughts will be appreciated!
100,000 rows is a small amount for a MySQL RDBMS; you would do better to fine-tune the database server.
So I recommend neither 1 nor 2.
Just:
SELECT * FROM `your_table` WHERE `any_field` = 'YOUR CRITERIA' LIMIT 300;
When your data grows past 1,000,000 rows you should think about serious index optimization, and maybe you'll have to create a stored procedure for complicated selects. I assure you it's not PHP's job in any case.
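Such a stored procedure would look roughly like this (a sketch; the table and column names come from the query above):
DELIMITER //
CREATE PROCEDURE pick_rows(IN p_criteria VARCHAR(100))
BEGIN
    -- filtering and limiting stay entirely inside MySQL
    SELECT * FROM `your_table` WHERE `any_field` = p_criteria LIMIT 300;
END //
DELIMITER ;

CALL pick_rows('YOUR CRITERIA');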
Since your question is about performance: both approaches consume resources, but I would still go for approach 1, as it doesn't hit the database again and again; generating a query per row means 300-400 queries. When it comes to designing a large project, the database is usually the bottleneck.
To be honest, neither approach is good. It's good practice to have a good database design and query selection; what you are trying to achieve could be done with a suitable query.
Using PHP to loop through the data is really a bad idea; after all, a database is designed to perform queries. PHP would need to loop through all the records without using an index to speed things up, which is roughly equivalent to a 'table scan' in the database.
In order to get the most performance out of your database, it's important to have a good design and (for example) create indexes on the right columns.
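For instance (hypothetical names):
-- index the column the page filters on, so MySQL can pick the
-- 300-400 matching rows without scanning all 100,000
CREATE INDEX idx_items_status ON items (status);
SELECT * FROM items WHERE status = 'active' LIMIT 400;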
Also, if you haven't decided yet which RDBMS you're going to use: depending on your usage, some databases have more advanced options that can help performance (e.g. PostgreSQL has support for geographical information).
Please provide some actual data (what kind of data will be stored, what kind of fields) and samples of the kinds of queries/filters that will need to be performed, so that people can give you an actual answer rather than a hypothetical one.
I have a MySQL comments table, but it takes one query per comment: if there are 20 comments, it makes 20 queries to show the page. Is there any solution? Is it possible to write a MySQL-side function to reduce this to a single query?
In addition to storing the parent, also store (in a separate column) an id for the item/article the comment was posted on. Then just query for all of the comments with the same item id, and construct the hierarchy after getting the DB rows.
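A sketch of that reconstruction (column names assumed; $pdo is an existing PDO connection):
// one query fetches every comment for the article
$stmt = $pdo->prepare("SELECT id, parent, body FROM comments WHERE item_id = ?");
$stmt->execute([$itemId]);

$byId = [];
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    $row['children'] = [];
    $byId[$row['id']] = $row;
}

// link each comment to its parent in memory
$tree = [];
foreach ($byId as $id => &$row) {
    if ($row['parent'] !== null && isset($byId[$row['parent']])) {
        $byId[$row['parent']]['children'][] = &$row; // nested reply
    } else {
        $tree[] = &$row;                             // top-level comment
    }
}
unset($row);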
You could look into Joe Celko's 'Nested Set' algorithm.
It provides very efficient 'one-query' retrieval for hierarchical datasets, but there is always a cost, and the cost is that it requires a bit more legwork when you insert into the table.
For high-write activity, I'm not sure I'd go for it personally.
I'd be more likely to just slam it into memcache, and invalidate the cache when someone posts to a specific thread.
Either of these solutions, though, is way better than running 20 queries to retrieve 20 comments.
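For reference, the one-query retrieval under a nested set looks roughly like this (a sketch; the lft/rgt columns are the extra bookkeeping that inserts have to maintain):
-- every comment stores lft/rgt bounds; one query returns the whole
-- thread, already in depth-first order
SELECT child.id, child.body
FROM comments AS parent
JOIN comments AS child ON child.lft BETWEEN parent.lft AND parent.rgt
WHERE parent.id = 42   -- root comment of the thread
ORDER BY child.lft;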
You can do it in a single call from PHP to MySQL if you use a stored procedure. I'd stick with the adjacency list rather than the nested set implementation, as you'll only experience more pain with the latter.
see here: Generating Depth based tree from Hierarchical Data in MySQL (no CTEs)
Hope this helps :)
Apologies in advance if this is a silly question but I'm wondering which might be faster/better in the following simplified scenario...
I've got registered users (in a users table) and I've got countries (in a countries table) roughly as follows:
USERS TABLE:
user_id (PK, INT) | country_id (FK, TINYINT) | other user-related fields...
COUNTRIES TABLE:
country_id (PK, TINYINT) | country_name (VARCHAR) | other country-related fields...
Now, every time I need to display a user's country, I need to do a MySQL join. However, I often need to do lots of other joins with regard to the users and the big picture seems quite "join-heavy".
I'm wondering what the pros & cons might be of taking the countries out of the database and sticking them into a class as an array, from which I could easily retrieve them with public method calls using country_id? Would there be a speed advantage/disadvantage?
Thanks a lot.
EDIT: Thanks for the all the views, very useful. I'll pick the first answer as the accepted solution although all contributions are valued.
Do you have a serious performance problem now? I recently went through a round of performance improvements on a PHP/MySQL website I developed for my company. Certain areas were too slow, and it turned out a lot of the fault was with the queries themselves. I used timers to figure out which queries were slow, and I reorganized them (added indexes, etc.). In a few cases, it was faster to make two separate queries and join them in PHP (I had some pretty complicated joins).
Do not try to optimize until you know you have a problem. Figure out if you have a problem first by measuring it, and then if you need to rearrange your queries you will be able to know if you made an improvement.
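The timing itself can be as simple as this (a sketch; mysqli assumed, and the 100 ms threshold is arbitrary):
// wrap each query in a timer and log the slow ones
$start = microtime(true);
$result = $mysqli->query($sql);
$elapsed = microtime(true) - $start;
if ($elapsed > 0.1) {
    error_log(sprintf("slow query (%.3fs): %s", $elapsed, $sql));
}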
It would ease the load on your MySQL server to have fewer JOIN statements, but not significantly so (there aren't that many countries in the world). And you'll lose that time again, because you'll have to implement the join yourself in PHP; since you're writing it yourself, you will probably write it less efficiently than the SQL engine would, which means it will take more time. I would recommend keeping it in the SQL server, since the advantages of moving it out are so few (and if the PHP instance and the MySQL instance are on the same box, there are no real advantages).
What you suggest should be faster. Granted, the join probably doesn't cost much, but looking it up in a dictionary should be just about free as far as compute power goes.
This is really just a trade off of memory for speed. The only downsides I could see would of course be the increased memory usage to store the country info and the fact that you would have to invalidate that cache if you ever update the countries table (which is probably not very often).
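A minimal sketch of such a lookup class (assuming a PDO connection and the table layout from the question):
// load the countries table once, then resolve ids in memory
class CountryLookup
{
    private array $names = [];

    public function __construct(PDO $pdo)
    {
        $stmt = $pdo->query("SELECT country_id, country_name FROM countries");
        foreach ($stmt as $row) {
            $this->names[(int)$row['country_id']] = $row['country_name'];
        }
    }

    public function nameFor(int $countryId): ?string
    {
        return $this->names[$countryId] ?? null; // null if unknown id
    }
}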
I don't think you'd gain anything from removing the join, as you'd have to iterate over all your result rows and manually look up each country name, which I doubt would be quicker than MySQL can do it.
I also would not consider such an approach, for the following reason: if you want to change the name of a country (say you've got a typo), you can do so just by updating a row in the database. But if the names of the countries are in your PHP code, you'd have to redeploy the code in order to make the change. I don't know PHP, but that might not be as straightforward as a DB change in a production system.
So for maintainability reasons, IMHO let the DB do the work.
The general rule in the database world is to NORMALIZE first (which results in more tables) and deal with performance issues later.
You will want to DENORMALIZE only for simplicity of code, not for performance. Use indexes and stored procedures; DBMSs are designed to optimize joins.
The reason not to "normalize as you go" is that you would have to modify the code you have already written nearly every time you modify the database design.
I'm building a PHP page with data sent from MySQL.
Is it better to have
1 SELECT query with 4 table joins, or
4 small SELECT queries with no table joins, each selecting by an ID?
Which is faster, and what are the pros/cons of each method? I only need one row from each table.
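To illustrate (made-up table names):
-- option 1: one query, four joins
SELECT u.name, a.city, o.total, p.title
FROM users u
JOIN addresses a ON a.user_id = u.id
JOIN orders o    ON o.user_id = u.id
JOIN profiles p  ON p.user_id = u.id
WHERE u.id = 42;

-- option 2: four separate single-table queries
SELECT name  FROM users     WHERE id = 42;
SELECT city  FROM addresses WHERE user_id = 42;
SELECT total FROM orders    WHERE user_id = 42;
SELECT title FROM profiles  WHERE user_id = 42;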
You should run a profiling tool if you're truly worried, because it depends on many things and can vary; but as a rule, it's better to have fewer queries being compiled and fewer round trips to the database.
Make sure you filter things as well as you can using your WHERE and JOIN ... ON clauses.
But honestly, it usually doesn't matter, since you're probably not going to hit the database all that hard compared to what it can handle. So unless optimization is your spec, don't do it prematurely; do what's simplest.
Generally, it's better to have one SELECT statement. One of the main reasons to have databases is that they are fast at processing information, particularly when the work is expressed as a query.
If there is any drawback to this approach, it's that there are some kinds of analysis that you can't do with one big SELECT statement. RDBMS purists will insist that this is a database design problem, in which case you are back to my original suggestion.
When you use JOINs instead of multiple queries, you allow the database to apply its optimizations. You also are potentially retrieving rows that you don't need (if you were to replace an INNER join with multiple selects), which increases the network traffic between your app server and database server. Even if they're on the same box, this matters.
It might depend on what you do with the data after you fetch it from the DB. If you use each of the four results independently, then it would be more logical and clear to have four separate SELECT statements. On the other hand, if you use all the data together, like to create a unified row in a table or something, then I would go with the single SELECT and JOINs.
I've done a bit of PHP/MySQL work, and I find that even for queries on huge tables with tons of JOINs, the database is pretty good at optimizing - if you have smart indexes. So if you are serious about performance, start reading up on query optimization and indexing.
I would say 1 query with the join. This way you need to hit the server only once. And if your tables are joined with indexes, it should be fast.
Well under Oracle you'd want to take advantage of the query caching, and if you have a lot of small queries you are doing in your sequential processing, it would suck if the last query pushed the first one out of the cache...just in time for you to loop around and run that first query again (with different parameter values obviously) on the next pass.
We were building an XML output file using Java stored procedures and definitely found the round trip times for each individual query were eating us alive. We found it was much faster to get all the data in as few queries as possible, then plug those values into the XML DOM as needed.
The only downside is that the Java code was a bit less elegant, as the data fetch was now remote from its usage. But we had to generate a large complex XML file in as close to zero time as possible, so we had to optimize for speed.
Be careful when dealing with MERGE tables, however. It has been my experience that although a single join is good in most situations, when MERGE tables are involved you can run into strange situations.