I've spent a few days writing a script that essentially takes some data from a DB2 system and creates new records for it in a MySQL system. Along the way it does some checks, and I use loops to decide between inserts and updates based on those conditions.
The script works: it returns what I expect and inserts/updates as expected. I've tested the deletion process, updates, inserts, whether records are expired or not... basically every function of this script has been tested thoroughly.
I feel like it's not as fast as it could be, it's probably more sequential and redundant than it should be as well.
I'm not used to working with PDO in a script like this, and I'm wondering what I could do here to improve speed and reduce redundancy. I know some of the logic may 'seem' redundant, but it is exactly where it needs to be; I'm just wondering whether I could or should use functions to reduce calls or loops.
Any help or advice is greatly appreciated.
This is tricky, and I will get dragged over the hot coals for it: 1) you may want to not use PDO at all if you are sure there is no risk of a direct injection attack; 2) I would also look at implementing this sort of feature via the PHP CLI, i.e. via a cron job or a curl trigger.
In general, PDO is a very expensive way to run a query. A simple string passed to mysqli_query() would be about 2.5x more efficient.
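Whichever driver you end up with, the usual win for a bulk insert/update script like this is to prepare the statement once and wrap batches in a transaction instead of auto-committing every row. A rough sketch with made-up table and column names, assuming the DB2 rows have already been fetched into $rowsFromDb2:
<?php
// Bulk insert with one prepared statement and a single transaction.
// Table/column names and $rowsFromDb2 are made up for illustration.
$pdo = new PDO('mysql:host=localhost;dbname=target_db', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$insert = $pdo->prepare(
    'INSERT INTO records (source_id, name, expires_at) VALUES (:id, :name, :expires)'
);

$pdo->beginTransaction();
foreach ($rowsFromDb2 as $row) {
    $insert->execute([
        ':id'      => $row['ID'],
        ':name'    => $row['NAME'],
        ':expires' => $row['EXPIRES'],
    ]);
}
$pdo->commit();
?>
The same pattern works with mysqli prepared statements if you decide to drop PDO.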
This seems like a pretty basic question but one I don't know the answer to.
I wrote a script in PHP that loops through some data and then performs an UPDATE to records in our database. There are roughly 150,000 records, so the script certainly takes a while to complete.
Could I potentially harm or interfere with the data insertion if I run a basic SELECT statement?
Say I want to ensure that the script is working properly, so I run a basic SELECT COUNT() to see if the count is increasing in real time as the script runs. Is this possible, or would it screw something up?
Thank you!
Generally a SELECT call is incapable of "causing harm" provided you're not talking about SQL injection problems.
The InnoDB engine, which you should be using, has what's called Multi-Version Concurrency Control, or MVCC for short. It means that until your UPDATE statement (or the transaction it is part of) finishes, the SELECT will be run against the last consistent database state.
If you're using MyISAM, which is a very bad idea in most production environments due to the limitations of that engine and the way the data is stored without a rollback journal, the SELECT call will probably block until the UPDATE is applied since it does not support MVCC.
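If you're not sure which engine the table uses, you can check before deciding whether a concurrent SELECT will block; a quick sketch (connection details, schema and table names are placeholders):
<?php
// Check which storage engine a table uses (placeholder names throughout).
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$stmt = $pdo->prepare(
    'SELECT ENGINE FROM information_schema.TABLES
     WHERE TABLE_SCHEMA = :schema AND TABLE_NAME = :table'
);
$stmt->execute([':schema' => 'mydb', ':table' => 'my_table']);
echo $stmt->fetchColumn();   // "InnoDB", "MyISAM", ...
?>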
So I have a site that I am working on that has been touched by many developers over time, and as new features arose, the developers at the time felt it necessary to just add another query to get the data that was needed. That leaves me with a PHP page that is slow and runs maybe 70 queries. Some of the queries are in the actual PHP file for this page and some are scattered throughout many different functions. Now I have the responsibility of trying to speed up the page to meet certain requirements, and I am seeking the best course of action.
Is there a way to print all the queries that are running, other than going through the file and finding each and every one?
Should I cache the slow queries using memcached?
Is there an idea that anyone has had to help me speed up the page?
Is there a plugin or tool to analyze the queries? I am using YSlow and there is nothing there for looking at queries.
Something I do is to have a my_mysql_query(...) function that behaves like mysql_query(...) but which I can then tailor to log out the execution time together with the text of the query. MySQL can also log slow queries with very little fiddling - see here.
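A minimal sketch of that wrapper idea - written against mysqli, since the old mysql_* functions are gone in PHP 7+, and with the log destination picked arbitrarily:
<?php
// Thin wrapper that logs each query's text and execution time.
function my_mysqli_query(mysqli $link, string $sql)
{
    $start  = microtime(true);
    $result = mysqli_query($link, $sql);
    $ms     = (microtime(true) - $start) * 1000;
    error_log(sprintf('[%.1f ms] %s', $ms, $sql));   // or append to a file/table of your choice
    return $result;
}

// Usage (connection details are placeholders):
$link = mysqli_connect('localhost', 'user', 'pass', 'mydb');
$rows = my_mysqli_query($link, 'SELECT COUNT(*) FROM articles');
?>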
If there is no central query method that is called to run each query, then the only option is to search for each query and find where it is in the code. Otherwise, you could go to that query function and print each query that runs through it.
Whether caching helps will depend on how often the data changes. If it changes frequently, caching may not give you any performance boost.
One idea to help you speed up the page is to do the following:
group like queries into the same query and use the data in multiple parts
consider breaking the page into multiple locations
Depending on the database you are using, there are analysis functions that will help you optimize your queries. For example, you can use EXPLAIN with MySQL (http://dev.mysql.com/doc/refman/5.0/en/explain.html). You may also want to consider consulting a DBA on the issue.
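For example, a throwaway snippet like this (connection details and the query are just examples) will dump MySQL's plan for a suspect query so you can spot missing indexes:
<?php
// Dump MySQL's execution plan for a suspect query (everything here is an example).
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$plan = $pdo->query('EXPLAIN SELECT * FROM articles WHERE author_id = 42')
            ->fetchAll(PDO::FETCH_ASSOC);
print_r($plan);   // watch the "type", "key" and "rows" columns for missing indexes
?>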
Good luck.
I have a somewhat theoretical question: I'm designing my own CMS/app framework (as many PHP programmers of various levels have done before... and always will) to either build a production-ready solution or develop various modules/plugins that I'll use later.
Anyway, I'm thinking of gathering the SQL queries from the whole app and then running them in one place:
index.php:
<?php
include ('latestposts.php');
include ('sidebar.php');
?>
latestposts.php:
<?php
function gather_data($arg) { $sql = ""; /* build this module's query here */ }
function draw($data) { /* ... render the fetched rows ... */ }
?>
sidebar.php:
<?php
function gather_data($arg) { $sql = ""; /* build this module's query here */ }
function draw($data) { /* ... render the fetched rows ... */ }
?>
Now, while the whole module system is yet to be figured out, the idea is already floating around somewhere in my brain. I'm thinking: what if I could first load all the gather_data functions, then run the SQL, and then run the draw functions - and reuse the results?
If, for example, $sql is SELECT * FROM POSTS LIMIT 10 and $sql2 is SELECT * FROM POSTS LIMIT 5, is it possible to program PHP to see: "ah, it's the same SQL, I'll call it just once and reuse the first 5 rows"?
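To make the idea concrete, here's roughly what I imagine - everything below is hypothetical, and an exact-match cache like this obviously wouldn't catch the LIMIT 10 / LIMIT 5 overlap without smarter key handling:
<?php
// Hypothetical sketch: a per-request result cache keyed by the exact SQL string.
function cached_query(PDO $pdo, string $sql): array
{
    static $cache = [];
    if (!isset($cache[$sql])) {
        $cache[$sql] = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
    }
    return $cache[$sql];
}
?>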
Or is it possible to add this behavior to some ORM?
However, as the tags say, this is still just an idea in progress. If it proves easy to accomplish, then I will post more questions on how :)
So, basically: Is it possible, does it make sense? If both are yes, then... any ideas how?
Don't get me wrong, that sounds like a plausible idea and you can probably get it running. But I wonder if it is really going to be beneficial. Will it make the system faster? Give you more control? Make development easier?
I would just look into using (or building) a system using well practiced MVC style coding standards, build a good DB structure, and tweak the heck out of Apache (or use something like Lighttpd). You will have a lot more widespread acceptance of your code if you ever decide to make it open source, and if you ever need a hand with it another developer could step right in and pick up the keyboard.
Also, check out query caching in MySQL--you will see a similar (though not one-to-one) benefit from caching your query results server-side with regard to your query example. Even better, that cache is stored in server memory, so the PHP/MySQL overhead is dropped AND you don't have to code it.
All of that aside, I do think it is possible. =)
Generally speaking, such a cache system can generate significant time savings, but at the cost of memory and complexity. The more results you want to keep, the more memory it will take; and there's no guarantee that your results will ever be used again, particularly the larger result sets.
Second, there are certain queries that should never be cached, or that should be run again even if they're in the cache. For the most part, only SELECT and SHOW queries can be cached effectively, but you need to worry about invalidating them when you modify the underlying data. Even in the same pageview, you might find yourself working around your own cache system on occasion.
Third, this kind of problem has already been solved several times. First, consider turning on the MySQL query cache. Most of the time, it will speed things up a bit without requiring any code changes on your end. However, it's a bit aggressive about invalidating entries, so you can still gain additional performance by caching at a higher level yourself.
If you need another level, consider memcached. You'll have to store and invalidate entries manually, but it can store results across page views (where you'll really find the performance benefit), and will let unused entries expire before running out of memory.
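A rough sketch of that memcached approach (server address, key name, TTL and the query are just examples):
<?php
// Cache a query result in memcached and invalidate it by hand when the data changes.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

$key  = 'latest_posts_10';
$rows = $mc->get($key);

if ($rows === false) {   // cache miss (check getResultCode() if you ever store a literal false)
    $rows = $pdo->query('SELECT * FROM posts ORDER BY created_at DESC LIMIT 10')
                ->fetchAll(PDO::FETCH_ASSOC);
    $mc->set($key, $rows, 60);   // keep for 60 seconds
}

// After inserting a new post, invalidate by hand:
// $mc->delete('latest_posts_10');
?>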
I've seen this question around the internet (here and here, for example), but I've never seen a good answer. Is it possible to find the length of time a given MySQL query (executed via mysql_query) took via PHP?
Some places recommend using PHP's microtime function, but this seems like it may be inaccurate. The mysql_query call may be bogged down by network latency, a sluggish system that isn't responding to your query quickly, or some other unrelated cause. None of these are directly related to the quality of your query, which is the only thing I really want to test here. (Please mention in the comments if you disagree!)
My answer is similar, but varied. Record the time before and after the query, but do it within your database query class. Oh, you say you are using mysql_query directly? Well, now you know why you should use a class wrapper around those raw PHP database functions (pardon the snark). Actually, one is already built; it's called PDO:
http://us2.php.net/pdo
If you want to extend the functionality to do timing around each of your queries... extend the class! Simple enough, right?
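A sketch of what such an extension could look like - PHP 8 signatures assumed, and only query() is wrapped here; prepare()/execute() would need the same treatment:
<?php
// PDO subclass that records how long each query() call takes.
class TimedPDO extends PDO
{
    /** @var array<int, array{sql: string, seconds: float}> */
    public array $log = [];

    public function query(string $query, ?int $fetchMode = null, mixed ...$fetchModeArgs): PDOStatement|false
    {
        $start  = microtime(true);
        $result = $fetchMode === null
            ? parent::query($query)
            : parent::query($query, $fetchMode, ...$fetchModeArgs);
        $this->log[] = ['sql' => $query, 'seconds' => microtime(true) - $start];
        return $result;
    }
}

// Usage (credentials are placeholders):
$db = new TimedPDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$db->query('SELECT COUNT(*) FROM articles');
print_r($db->log);
?>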
If you are only checking the quality of the query itself, then remove PHP from the equation. Use a tool like the MySQL Query Browser or SQLyog.
Or if you have shell access, just connect directly. Any of these methods will be superior in determining the actual performance of your queries.
At the PHP level, you would pretty much need to record the time before and after the query.
If you only care about the query performance itself, you can enable the slow query log in your MySQL server: http://dev.mysql.com/doc/refman/5.0/en/slow-query-log.html That will log all queries that take longer than a specified number of seconds.
If you really need query information maybe you could make use of SHOW PROFILES:
http://dev.mysql.com/doc/refman/5.0/en/show-profiles.html
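For example (the session profiler is deprecated in recent MySQL releases but still handy for ad-hoc checks; connection details and the query are placeholders):
<?php
// Time individual statements with MySQL's session profiler.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

$pdo->exec('SET profiling = 1');
$pdo->query('SELECT COUNT(*) FROM articles')->fetchAll();

$profiles = $pdo->query('SHOW PROFILES')->fetchAll(PDO::FETCH_ASSOC);
print_r($profiles);   // Query_ID, Duration (seconds), Query
?>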
Personally, I would use a combination of microtime-ing, the slow query log, mytop, and analyzing problem queries with the MySQL client (command line).
Okay, so I'm sure plenty of you have built crazy database intensive pages...
I am building a page that I'd like to pull all sorts of unrelated database information from. Here are some sample different queries for this one page:
article content and info
IF the author is a registered user, their info
UPDATE the article's view counter
retrieve comments on the article
retrieve information for the authors of the comments
if the reader of the article is signed in, query for info on them
etc...
I know these are basically going to be lightning quick, and that I could combine some of them, but I wanted to make sure this isn't abnormal.
How many fairly normal and un-heavy queries would you limit yourself to on a page?
As many as needed, but not more.
Really: don't worry about optimization (right now). Build it first, measure performance second, and IFF there is a performance problem somewhere, then start with optimization.
Otherwise, you risk spending a lot of time on optimizing something that doesn't need optimization.
I've had pages with 50 queries on them without a problem. A fast query to a non-large (ie, fits in main memory) table can happen in 1 millisecond or less, so you can do quite a few of those.
If a page loads in less than 200 ms, you will have a snappy site. A big chunk of that is being used by latency between your server and the browser, so I like to aim for < 100ms of time spent on the server. Do as many queries as you want in that time period.
The big bottleneck is probably going to be the amount of time you have to spend on the project, so optimize for that first :) Optimize the code later, if you have to. That being said, if you are going to write any code related to this problem, write something that makes it obvious how long your queries are taking. That way you can at least find out you have a problem.
I don't think there is any one correct answer to this. I'd say as long as the queries are fast, and the page follows a logical flow, there shouldn't be any arbitrary cap imposed on them. I've seen pages fly with a dozen queries, and I've seen them crawl with one.
Every query requires a round-trip to your database server, so the cost of many queries grows larger with the latency to it.
If it runs on the same host, there will still be a slight speed penalty, not only because a socket sits between your application and the server, but also because the server has to parse your query, build the response, check access, and whatever other overhead you get with SQL servers.
So in general it's better to have less queries.
You should try to do as much as possible in SQL, though: don't fetch data as input for some algorithm in your client language when the same algorithm could be implemented without hassle in SQL itself. This will not only reduce the number of queries but also help a great deal in selecting only the rows you need.
Piskvor's answer still applies in any case.
WordPress, for instance, can run up to 30 queries per page. There are several things you can use to keep MySQL from dragging the page down - one of them being memcached - but for now, and as you say, if the page is going to be straightforward, just make sure all the data you pull is properly indexed in MySQL and don't worry much about the number of queries.
If you're using a framework (CodeIgniter, for example), you can generally pull data on page creation times and check what's dragging your site down.
As others have said, there is no single number. Whenever possible, please use SQL for what it was built for and retrieve sets of data together.
Generally, an indication that you may be doing something wrong is when you have SQL inside a loop.
When possible, use joins to retrieve data that belongs together instead of sending several statements (see the sketch just below).
Always try to make sure your statements retrieve exactly what you need with no extra fields/rows.
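For example, instead of fetching each comment's author with its own query inside a loop, one JOIN returns everything in a single round trip - the table and column names below are made up:
<?php
// One JOIN instead of a query per comment (schema is hypothetical).
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$stmt = $pdo->prepare(
    'SELECT c.id, c.body, u.username, u.avatar
       FROM comments c
       JOIN users u ON u.id = c.author_id
      WHERE c.article_id = :article
      ORDER BY c.created_at'
);
$stmt->execute([':article' => 42]);
$comments = $stmt->fetchAll(PDO::FETCH_ASSOC);
?>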
If you need the queries, you should just use them.
What I always try to do is have them all executed at once in the same place, so that there is no need for different parts of the page (if they're separated...) to make database connections. I figure it's more efficient to store everything in variables than to have every part of a page connect to the database.
In my experience, it is better to make two queries and post-process the results than to make one query that takes ten times longer to run but needs no post-processing. That said, it is also better not to repeat queries if you already have the result, and there are many different ways this can be achieved.
But all of that is oriented around performance optimization. So unless you really know what you're doing (hint: most people in this situation don't), just make the queries you need for the data you need and refactor it later.
I think that you should be limiting yourself to as few queries as possible. Try to combine queries to multitask and save time.
Premature optimisation is a problem, as people have mentioned before, but that's when you're crapping up your code just to make it run 'fast'. People take this 'maxim' too far, though.
If you want to design with scalability in mind, just make sure that whatever you do to load data is sufficiently abstracted and that the calls are centralized. This will make it easier when you need to implement a shared-memory cache, as you'll only have to change a few things in a few places.
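A sketch of what "abstracted and centralized" might look like in practice - one function that all page code goes through, so a shared-memory cache (APCu here, purely as an example) can be bolted on later in a single place:
<?php
// Single choke point for reads; adding a shared cache later only touches this function.
// APCu is used here purely as an example of a shared-memory cache.
function load_rows(PDO $pdo, string $sql, array $params = [], int $ttl = 30): array
{
    $key = 'q_' . md5($sql . serialize($params));

    if (function_exists('apcu_fetch')) {
        $hit = apcu_fetch($key, $found);
        if ($found) {
            return $hit;
        }
    }

    $stmt = $pdo->prepare($sql);
    $stmt->execute($params);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    if (function_exists('apcu_store')) {
        apcu_store($key, $rows, $ttl);
    }
    return $rows;
}
?>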