I have used MySQL a lot, but I have always wondered exactly how it works: when I get a result back, where exactly is the data stored? For example, I write something like this:
$sql = "SELECT * FROM TABLE";
$result = mysql_query($sql);
while ($row = mysql_fetch_object($result)) {
    echo $row->column_name;
}
When a result is returned, is it holding all of the result data, or does it return rows in fragments, fetching them only when they are asked for, as with $row->column_name?
Or does it really return every single row of data in $result even if you only wanted one column?
Also, if I paginate using LIMIT, does $result hold onto that original (old) result even if the database is updated in the meantime?
The details are implementation dependent, but generally speaking, results are buffered. Executing a query against a database will return some result set. If it's sufficiently small, all the results may be returned with the initial call; otherwise some are returned up front and more are fetched as you iterate over the result object.
Think of the sequence this way:
1) You open a connection to the database.
2) There is possibly a second call to select a database, or it might be done as part of (1).
3) That authentication and connection step is (at least) one round trip to the server (ignoring persistent connections).
4) You execute a query on the client.
5) That query is sent to the server.
6) The server has to determine how to execute the query.
7) If the server has previously executed the query, the execution plan may still be in the query cache. If not, a new plan must be created.
8) The server executes the query as given and returns a result to the client.
9) That result will contain some buffer of rows whose size is implementation dependent. It might be 100 rows, or more, or less. All columns are returned for each row.
10) As you fetch more rows, the client will eventually ask the server for more. This may be when the client runs out, or it may be done preemptively. Again, this is implementation dependent.
The idea of all this is to minimize roundtrips to the server without sending back too much unnecessary data, which is why if you ask for a million rows you won't get them all back at once.
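To make that buffered-versus-unbuffered trade-off concrete, here is a minimal sketch using mysqli's unbuffered mode (the connection details and table name are assumptions, not something from the question):

// MYSQLI_USE_RESULT streams rows from the server one fetch at a time
// instead of buffering the whole result set on the client first.
$mysqli = new mysqli("localhost", "user", "password", "db");
$result = $mysqli->query("SELECT * FROM big_table", MYSQLI_USE_RESULT);
while ($row = $result->fetch_assoc()) {
    // each fetch pulls the next row over the wire; client memory stays flat,
    // but the connection is tied up until the result is fully read
}
$result->free();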
A LIMIT clause (or any clause, in fact) will modify the result set.
Lastly, (7) is important because SELECT * FROM table WHERE a = 'foo' and SELECT * FROM table WHERE a = 'bar' are two different queries as far as the database optimizer is concerned so an execution plan must be determined for each separately. But a parameterized query (SELECT * FROM table WHERE a = :param) with different parameters is one query and only needs to be planned once (at least until it falls out of the query cache).
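To illustrate that last point, here is a hedged sketch of the parameterized form using PDO (the connection details are assumptions, and the table is renamed table_name since TABLE is a reserved word):

// Planned once, then executed with different parameters.
$pdo = new PDO("mysql:host=localhost;dbname=db", "user", "password");
$stmt = $pdo->prepare("SELECT * FROM table_name WHERE a = :param");

$stmt->execute(array(':param' => 'foo'));
$foos = $stmt->fetchAll(PDO::FETCH_ASSOC);

$stmt->execute(array(':param' => 'bar'));
$bars = $stmt->fetchAll(PDO::FETCH_ASSOC);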
I think you are confusing the two types of variables you're dealing with, and neither answer so far really clarifies that.
$result is a MySQL result object. It does not "contain any rows." When you say $result = mysql_query($sql), MySQL executes the query, and knows what rows will match, but the data has not been transferred over to the PHP side. $result can be thought of as a pointer to a query that you asked MySQL to execute.
When you say $row = mysql_fetch_object($result), that's when PHP's MySQL interface retrieves a row for you. Only that row is put into $row (as a plain old PHP object, but you can use a different fetch function to ask for an associative array, or specific column(s) from each row.)
Rows may be buffered with the expectation that you will be retrieving all of the rows in a tight loop (which is usually the case), but in general, rows are retrieved when you ask for them with one of the mysql_fetch_* functions.
If you only want one column from the database, then you should SELECT that_column FROM .... Using a LIMIT clause is also a good idea whenever possible, because MySQL can usually perform significant optimizations if it knows that you only want a certain group of rows.
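In the old mysql_* style the question uses, that advice looks something like this (the table and column names are placeholders):

// Ask only for the column you need, and cap the number of rows returned.
$result = mysql_query("SELECT that_column FROM that_table LIMIT 10");
while ($row = mysql_fetch_object($result)) {
    echo $row->that_column;
}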
The first question can be answered by reading up on resources in the PHP manual.
Since you are SELECTing "*", every column is returned for each mysql_fetch_object call. Just look at print_r($row) to see.
In simple words, the resource returned is like an ID that the MySQL library associates with other data. I think it is like the identification card in your wallet: it's just a number and some information, but it is associated with a lot more information if you give it to the government, or your cell-phone company, etc.
Related
In a site I maintain I have a need to query the same table (articles) twice, once for each category of article. AFAICT there are basically two ways of doing this (maybe someone can suggest a better, third way?):
Perform the db query twice, meaning the db server has to sort through the entire table twice. After each query, I iterate over the cursor to generate html for a list entry on the page.
Perform the query just once and pull out all the records, then sort them into two separate arrays. After this, I have to iterate over each array separately in order to generate the HTML.
So it's this:
$newsQuery = $mysqli->query("SELECT * FROM articles WHERE type='news'");
while ($newRow = $newsQuery->fetch_assoc()) {
    // generate article summary in html
}
// repeat for informational articles
vs this:
$query = $mysqli->query("SELECT * FROM articles");
$news = array();
$info = array();
while ($row = $query->fetch_assoc()) {
    if ($row['type'] == "news") {
        $news[] = $row;
    } else {
        $info[] = $row;
    }
}
// iterate over each array separately to generate article summaries
The recordset is not very large, currently <200, and will probably grow to 1000-2000. Is there a significant difference in the times between the two approaches, and if so, which one is faster?
(I know this whole thing seems awfully inefficient, but it's a poorly coded site I inherited and have to take care of without a budget for refactoring the whole thing...)
I'm writing in PHP, no framework :(, on a MySQL db.
Edit
I just realized I left out one major detail. On a given page in the site, we will display (and thus retrieve from the db) no more than 30 records at once - but here's the catch: 15 info articles, and 15 news articles. On each page we pull the next 15 of each kind.
You know you can sort in the DB, right?
SELECT * FROM articles ORDER BY type
EDIT
Due to the change made to the question, I'm updating my answer to address the newly revealed requirement: 15 rows for 'news' and 15 rows for not-'news'.
The gist of the question is the same: "which is faster... one query or two separate queries?" The gist of the answer remains the same: each database roundtrip incurs overhead (extra time, especially over a network connection to a separate database server), so with all else being equal, reducing the number of database roundtrips can improve performance.
The new requirement really doesn't impact that. What the newly revealed requirement really impacts is the actual query to return the specified resultset.
For example:
( SELECT n.*
FROM articles n
WHERE n.type='news'
LIMIT 15
)
UNION ALL
( SELECT o.*
FROM articles o
WHERE NOT (o.type<=>'news')
LIMIT 15
)
Running that statement as a single query is going to require fewer database resources, and be faster, than running two separate statements and retrieving two disparate resultsets.
We weren't provided any indication of what the other values for type can be, so the statement offered here simply addresses two general categories of rows: rows that have type='news', and all other rows that have some other value for type.
That query assumes that type allows for NULL values, and we want to return rows that have a NULL for type. If that's not the case, we can adjust the predicate to be just
WHERE o.type <> 'news'
Or, if there are specific values for type we're interested in, we can specify that in the predicate instead
WHERE o.type IN ('alert','info','weather')
If "paging" is a requirement... "next 15", the typical pattern we see applied, LIMIT 30,15 can be inefficient. But this question isn't asking about improving efficiency of "paging" queries, it's asking whether running a single statement or running two separate statements is faster.
And the answer to that question is still the same.
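For reference only, since that's outside the scope of this answer: the usual alternative to LIMIT 30,15 is the "seek" pattern, sketched here with an assumed id column and an assumed last-seen value.

-- Seek past the last id shown on the previous page, instead of making
-- the server generate and discard 30 rows with LIMIT 30,15.
SELECT *
  FROM articles
 WHERE type = 'news'
   AND id > 123   -- last id from the previous page (assumed)
 ORDER BY id
 LIMIT 15;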
ORIGINAL ANSWER below
There's overhead for every database roundtrip. In terms of database performance, for small sets (like you describe) you're better off with a single database query.
The downside is that you're fetching all of those rows and materializing an array. (But it looks like that's the approach you're using in either case.)
Given the choice between the two options you've shown, go with the single query. That's going to be faster.
As far as a different approach, it really depends on what you are doing with those arrays.
You could actually have the database return the rows in a specified sequence, using an ORDER BY clause.
To get all of the 'news' rows first, followed by everything that isn't 'news', you could
ORDER BY type<=>'news' DESC
That's MySQL shorthand for the more ANSI-standards-compliant:
ORDER BY CASE WHEN t.type = 'news' THEN 1 ELSE 0 END DESC
Rather than fetch every single row and store it in an array, you could just fetch from the cursor as you output each row, e.g.
while ($row = $query->fetch_assoc()) {
    echo "<br>Title: " . htmlspecialchars($row['title']);
    echo "<br>byline: " . htmlspecialchars($row['byline']);
    echo "<hr>";
}
The best way of dealing with a situation like this is to test it for yourself. It doesn't matter how many records you have at the moment; you can simulate whatever amount you'd like, that's never a problem. Also, 1000-2000 is really a small set of data.
I somewhat don't understand why you'd have to iterate over all the records twice. You should never retrieve all the records in a query either way, only the small subset you need to be working with. In a typical site where you manage articles it's usually about 10 records per page MAX. No user will ever go through 2000 articles in a way that would force you to pull all the records at once. Utilize paging and smart querying.
// iterate over each array separately to generate article summaries
I'm not really sure what you mean by this, but something tells me this data should be stored in the database as well. I really hope you're not generating article excerpts on the fly for every page hit.
It all sounds to me more like bad architectural design than anything else...
PS: I believe sorting/ordering/filtering of database data should be done on the database server, not in the application itself. You may save some traffic by doing a single query, but it won't help much if you transfer too much data at once that you won't be using anyway.
I'm trying to figure out the most efficient way to send multiple queries to a MySQL database with PHP. Right now I'm doing two separate queries but I know there are more efficient methods, like using mysqli_multi_query. Is mysqli_multi_query the most efficient method or are there other means?
For example, I could just write a query that puts ALL the data from ALL the tables in the database into a PHP array. Then I could sort the data using PHP, resulting in having only one query no matter what data I needed... and I could put that PHP array into a session variable so the user would never query the database again during that session. Makes sense right? Why not just do that rather than create a new query each time the page is reloaded?
It's really difficult to find resources on this so I'm just looking for advice. I plan to have massive traffic on the site that I am building so I need the code to put as little stress on the server as possible. As far as table size is concerned, we're talking about, let's say 3,000 rows in the largest table. Is it feasible to store that into one big PHP array (advantage being the client would query the database only ONCE on page load)?
$Table1Array = Array();
// NB: $somevariable is interpolated directly into the SQL; it should be
// escaped (mysqli_real_escape_string) or bound via a prepared statement.
$Table1_result = mysqli_query($con,"SELECT * FROM Table1 WHERE column1 ='" . $somevariable . "'");
while($row = mysqli_fetch_array($Table1_result))
{
$Table1Array[] = $row;
}
// query 2
$Table2Array = Array();
$Table2_result = mysqli_query($con,"SELECT * FROM Table2 LIMIT 5");
while($row = mysqli_fetch_array($Table2_result))
{
$Table2Array[] = $row;
}
There are a few things to address here, hopefully this will make sense / be constructive...
Is mysqli_multi_query the most efficient method or are there other means?
It depends on the specifics of what you are trying to do for a given page / query. Generally speaking though, using mysqli_multi_query won't gain you much performance, as MySQL will still execute the queries you give it one after the other. mysqli_multi_query's performance gains come from the fact that fewer round trips are made between PHP and MySQL. That's a good thing if the two are on different servers, or if you are performing thousands of queries one after the other.
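As a rough sketch in the procedural style of the question ($con as before), a batch sent with mysqli_multi_query has to be drained result set by result set:

// Two statements, one round trip; mysqli hands the result sets back in order.
$sql = "SELECT * FROM Table1; SELECT * FROM Table2 LIMIT 5";
if (mysqli_multi_query($con, $sql)) {
    do {
        if ($result = mysqli_store_result($con)) {
            while ($row = mysqli_fetch_assoc($result)) {
                // process $row
            }
            mysqli_free_result($result);
        }
    } while (mysqli_more_results($con) && mysqli_next_result($con));
}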
For example, I could just write a query that puts ALL the data from ALL the tables in the database into a PHP array.
Just. No. In theory you could, but unless you had one page that displayed all of the database contents at once, there would simply be no need.
Then I could sort the data using PHP
If you can sort / filter the data into the correct form using MySQL, do that. Manipulating datasets is one of the things MySQL is very good at.
Why not just [load everything into the session] rather than create a new query each time the page is reloaded?
Because the dataset would be huge, and that session data would have to be loaded and unserialized from the session store on every single request the user made to your server. Apart from that needless overhead, what about the other challenges this approach would raise? I.e. what would you do if extra data had been added to the db since you created the session-based cache for this particular user? What if the size of the data got too big for a user's session? What experience would I have as a user if I denied your session cookie and thereby forced the monster query to execute on every request?
I plan to have massive traffic on the site that I am building
Don't we all! As the comments above suggest, premature optimization is a Bad Thing. At this stage you should concentrate on getting your domain logic nailed down and building a good, maintainable OO platform on which to base further development.
If I wanted to execute multiple queries on a MySQL database, I would use MySQL stored procedures; then all you have to do is issue a simple call from PHP. A basic example of a procedure would be:
DELIMITER $$
CREATE PROCEDURE multiple_queries()
BEGIN
  SELECT * FROM TBL1 WHERE 1;
  SELECT * FROM TBL2 WHERE 1;
  -- the join condition below is assumed; the original snippet omitted it
  SELECT * FROM TBL3 LEFT JOIN TBL4 ON TBL3.id = TBL4.id WHERE TBL3.id = '121';
END $$
DELIMITER ;
and in PHP you simply call the procedure, passing any parameters associated with it in the parentheses:
CALL multiple_queries()
Why not use the DB engine as much as possible? It's well capable of handling complex solutions, and we don't utilize it enough.
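A sketch of that PHP side, assuming an object-style mysqli connection: a CALL that returns result sets also appends a final status result, so the standard multi-result drain loop applies.

// Each SELECT inside the procedure arrives as its own result set,
// followed by a trailing status result that next_result() consumes.
if ($mysqli->multi_query("CALL multiple_queries()")) {
    do {
        if ($result = $mysqli->store_result()) {
            while ($row = $result->fetch_assoc()) {
                // process $row from whichever SELECT produced it
            }
            $result->free();
        }
    } while ($mysqli->more_results() && $mysqli->next_result());
}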
For example, I could just write a query that puts ALL the data from ALL the tables in the database into a PHP array. Then I could sort the data using PHP, resulting in having only one query no matter what data I needed...
I would think this would be inefficient, since you've lost the value of the database. When you consider optimization, MySQL is superior to any PHP code that you could write.
Additionally, you're saying that running one query and pushing the data into a variable for the users may decrease resource usage, but is that really true? If you have massive traffic and this data is in session variables, then if 1000 users are currently logged on, you will have 1000 duplications of the entire database on your PHP server! Are you sure the server has enough memory for this?
There are 2 ways I use to run multiple queries:
$conn = mysql_connect("host", "dbuser", "password");
$query1 = "select.......";
$result1 = mysql_query($query1) or die (mysql_error()); // execute the query
while($row1 = mysql_fetch_assoc($result1))
{
// fetch the results from the query
}
$query2 = "select.......";
$result2 = mysql_query($query2) or die (mysql_error()); // execute the query
while($row2 = mysql_fetch_assoc($result2))
{
// fetch the results from the query i.e. $row2['']
}
mysql_close($conn); // Close the Database connection.
The other way is to employ transactions, when there is more than one query that must either be executed in full or not at all.
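A minimal sketch of that second way, with the same mysql_* API as above (the table and column names are made up, and the tables must use a transactional engine such as InnoDB):

// All-or-nothing: commit only if both statements succeed.
mysql_query("START TRANSACTION");
$ok1 = mysql_query("UPDATE accounts SET balance = balance - 100 WHERE id = 1");
$ok2 = mysql_query("UPDATE accounts SET balance = balance + 100 WHERE id = 2");
if ($ok1 && $ok2) {
    mysql_query("COMMIT");   // make both changes permanent
} else {
    mysql_query("ROLLBACK"); // undo whatever did run
}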
You could try it. But if the only reason is to have one query, thinking that it will be faster, I would think otherwise. Database optimizations are supreme, especially in MySQL.
I have a database design here that looks like this in simplified form:
Table building:
id
attribute1
attribute2
Data in there is like:
(1, 1, 1)
(2, 1, 2)
(3, 5, 4)
And the tables attribute1_values and attribute2_values are structured as:
id
value
Which contains information like:
(1, "Textual description of option 1")
(2, "Textual description of option 2")
...
(6, "Textual description of option 6")
I am unsure whether this is the best setup or not, but it was done this way per the requirements of my project manager. There is definitely some truth in it, as you can now modify the text easily without messing up the IDs.
However now I have come to a page where I need to list the attributes, so how do I go about there? I see two major options:
1) Make one big query which gathers all values from building and at the same time picks the correct textual representation from the attribute{x}_values table.
2) Make a small query that gathers all values from the building table. Then after that get the textual representation of each attribute one at a time.
What is the best option to pick? Is option 1 even faster than option 2 at all? If so, is it worth the extra trouble in terms of maintenance?
Another suggestion would be to create a view on the server with only the data you need and query from that. That would keep the work on the server end, and you can pull just what you need each time.
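A hedged sketch of such a view, using the table and column names from the question (the view name itself is made up):

-- Resolve each attribute id to its textual description server-side;
-- the application then reads from building_view with one simple query.
CREATE VIEW building_view AS
SELECT b.id,
       a1.value AS attribute1_text,
       a2.value AS attribute2_text
  FROM building b
  JOIN attribute1_values a1 ON a1.id = b.attribute1
  JOIN attribute2_values a2 ON a2.id = b.attribute2;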
If you have a small number of rows in the attribute tables, then I suggest you fetch them first. Fetch all of them! Store them in an array using the id as the index key.
Then you can proceed with the building data; now you just have to use the respective array to look up each attribute value.
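A minimal sketch of that idea, assuming a mysqli connection $con:

// Load the small lookup table once, keyed by id.
$attr1 = array();
$res = mysqli_query($con, "SELECT id, value FROM attribute1_values");
while ($row = mysqli_fetch_assoc($res)) {
    $attr1[$row['id']] = $row['value'];
}

// Resolving an attribute is now a plain array lookup per building row.
$res = mysqli_query($con, "SELECT * FROM building");
while ($row = mysqli_fetch_assoc($res)) {
    echo $attr1[$row['attribute1']];
}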
I would recommend something in-between. Parse the result from the first table in PHP, and figure out which attribute values you need to select from each attribute[x]_values table.
You can then select attributes in bulk using one query per table, rather than one query per attribute, or one query per building.
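A sketch of that in-between approach, again assuming a mysqli connection $con and at least one building row:

// First pass: collect the buildings and the attribute1 ids they reference.
$buildings = array();
$ids = array();
$res = mysqli_query($con, "SELECT * FROM building");
while ($row = mysqli_fetch_assoc($res)) {
    $buildings[] = $row;
    $ids[$row['attribute1']] = true;
}

// One bulk query per attribute table, not one per attribute or per building.
// intval() keeps the interpolated id list safe.
$idList = implode(',', array_map('intval', array_keys($ids)));
$res = mysqli_query($con, "SELECT id, value FROM attribute1_values WHERE id IN ($idList)");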
Here is a PHP solution:
$query = "SELECT * FROM building";
$result = mysqli_query(connection,$query);
$query = "SELECT * FROM attribute1_values";
$result2 = mysqli_query(connection,$query);
$query = "SELECT * FROM attribute2_values";
$result3 = mysqli_query(connection,$query);
$n = mysqli_num_rows($result);
for($i = 1; $n <= $i; $i++) {
$row = mysqli_fetch_array($result);
mysqli_data_seek($result2,$row['attribute1']-1);
$row2 = mysqli_fetch_array($result2);
$row2['value'] //Use this as the value for attribute one of this object.
mysqli_data_seek($result3,$row['attribute2']-1);
$row3 = mysqli_fetch_array($result3);
$row3['value'] //Use this as the value for attribute one of this object.
}
Keep in mind that this solution requires that the ids in attribute1_values and attribute2_values start at 1 and increase by 1 with every row.
Oracle / Postgres / MySQL DBA here:
Running a query many times has quite a bit of overhead. There are multiple round trips to the db, and if it's on a remote server, this can add up. MySQL will also likely have to parse the same query multiple times, which is terribly inefficient if there are tons of rows. Now, one thing your PHP method (multiple queries) has as an advantage is that it uses less memory, since it can release each result as soon as it's no longer needed (if you run the queries as a nested loop, that is; if you fetch all the results up front, you'll have a lot of memory overhead, depending on the table sizes).
The optimal approach would be to run it as one query and fetch the results one at a time, displaying each as needed and discarding it. That can wreak havoc with MVC frameworks unless you're comfortable either running model code in your view or rendering small view fragments.
Your question is very generic, and I think that to get an answer you should give more hints as to how this page will look and how big the dataset is.
Will you get all the buildings with their attributes, or just one at a time?
Your data structure looks very simple, and anything more powerful than a Raspberry Pi can handle it very well.
If you need one record at a time you don't need any special technique, just JOIN the tables.
If you need to list all buildings and you want to save db time, you have to measure your data.
If you have more attributes than buildings you have to choose one way; if you have 8 attributes and 2000 buildings you can think of caching the attributes in an array, with one SELECT per table, and then just print them using the array. I don't think you will see any speed drop or improvement with such simple tables on a modern computer.
$att1[1] = 'description1';
$att1[2] = 'description2';
// ...
Never do one-at-a-time queries; try to combine them into a single one.
MySQL will cache your query and it will run much faster. PHP loops are faster than making many requests to the database.
The query cache stores the text of a SELECT statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again.
http://dev.mysql.com/doc/refman/5.1/en/query-cache.html
In my program I launch an SQL query and get back a result resource. I then iterate through the rows of this result resource using the mysql_fetch_array() function and use the contents of the fields of each row to construct a further SQL query.
The result of launching this second query is the first set of results that I want. However, because this does not produce many results, I want to make the search less specific by dropping the condition that came from the last record used to build the query.
e.g. the query which produces the first set of results I want could be:
SELECT uid FROM users WHERE (gender='male' AND relationship_status='single'
AND shoe_size=10)
I would then want to drop the last record so that my query became:
SELECT uid FROM users WHERE (gender='male' AND relationship_status='single')
I have already written code to produce the first query, but as I mentioned above, I use the mysql_fetch_array function to iterate through ALL of the records. In subsequent "rounds" I only want to iterate through successively fewer records so that my query is less specific. How can I do this?
This seems like a very inefficient method too, so I'm open to any simple ideas which might make it more efficient.
EDIT: Thanks for the reply. Yeah, I am actually doing this in my program. I am basically trying to implement a basic search algorithm by taking all the preferences a user has specified in the DB and using them to form a query to look for people with those preferences. So the first time, search using all the criteria; then on successive attempts search using one less criterion and exclude the user ids which were previously returned. At the moment I am constructing the query from scratch for each "round", but I want to find a way to do this using the last query.
Using the queries above, you could do:
SELECT uid
FROM users
WHERE uid NOT IN (
    SELECT uid
    FROM users
    WHERE
        (gender='male'
        AND relationship_status='single'
        AND shoe_size=10)
)
This will essentially turn your first query into a sub-query and use it to negate the results returned. I.e., it will return all the rows NOT IN the first query.
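A hedged sketch of how those successive "rounds" could be generated in PHP, dropping one criterion per round and excluding uids already returned (the criteria array and an open mysql_* connection are assumptions):

// Criteria in priority order, taken from the user's stored preferences.
$criteria = array("gender='male'", "relationship_status='single'", "shoe_size=10");
$excluded = array(); // uids already returned by stricter rounds

for ($n = count($criteria); $n >= 1; $n--) {
    $where = '(' . implode(' AND ', array_slice($criteria, 0, $n)) . ')';
    if ($excluded) {
        $where .= ' AND uid NOT IN (' . implode(',', $excluded) . ')';
    }
    $result = mysql_query("SELECT uid FROM users WHERE $where");
    while ($row = mysql_fetch_array($result)) {
        $excluded[] = (int) $row['uid'];
        // ... $row['uid'] is a match found in round $n ...
    }
}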
I want to use PDO in my application, but before that I want to understand how PDOStatement->fetch and PDOStatement->fetchAll work internally.
For my application, I want to do something like "SELECT * FROM myTable" and write the results to a CSV file. It has around 90,000 rows of data.
My question is, if I use PDOStatement->fetch as I am using it here:
// Prepare and execute the statement (no placeholders are needed here)
$query = "SELECT * FROM tableName";
$stmt = $this->connection->prepare($query);

// Execute the statement
$stmt->execute();

// Open the output file, then export every row as it is fetched
$data = fopen('export.csv', 'w');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC))
{
    fputcsv($data, $row);
}
fclose($data);
After every fetch from the database, will the result of that fetch be stored in memory?
Meaning, when I do a second fetch, will memory hold the data of the first fetch as well as the second?
And so, if I have 90,000 rows of data and am fetching one row at a time, will memory keep every previous fetch result without releasing it, so that by the last fetch memory already holds 89,999 rows of data?
Is this how PDOStatement::fetch works?
Performance-wise, how does this stack up against PDOStatement::fetchAll?
Update: Something about fetch and fetchAll from a memory usage point of view
Just wanted to add something to this question, as I recently found out something regarding fetch and fetchAll. Hopefully this will make this question worthwhile for people who visit it in the future to get some understanding of fetch and fetchAll.
fetch does not store information in memory; it works on a row-by-row basis. It goes through the result set and returns row 1, then goes back to the result set and returns row 2. Mind that it does not return row 1 as well as row 2, but only row 2. So fetch stores nothing in memory, while fetchAll stores everything in memory. fetch is therefore a better option than fetchAll when dealing with a result set around 100K rows in size.
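To see the difference for yourself, here is a quick sketch (the DSN and table name are assumptions). Note that with pdo_mysql's default buffered mode the driver still holds the raw result set in both cases, which is what the next answer's "unbuffered" remark is about.

// Run each variant as a separate script: memory_get_peak_usage() never
// resets within one process.
$pdo = new PDO("mysql:host=localhost;dbname=db", "user", "password");
$stmt = $pdo->query("SELECT * FROM tableName");

// Variant A: fetchAll() materializes every row as a PHP array up front.
$all = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Variant B: fetch() holds only the current row as a PHP array.
// while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) { /* one row at a time */ }

echo memory_get_peak_usage() . "\n"; // compare this figure between the two variants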
Whether PHP keeps the results on the server or pulls them all over to the client depends on the driver. MySQL can be used in an "unbuffered" mode, where rows stay on the server until you fetch them, but it's a tad tricky to use. fetchAll() on a large result set can cause network flooding, memory exhaustion, etc.
In every case where I need to process more than 1,000 rows, I'm not using PHP. Also consider whether your database engine already has a CSV export operation; many do.
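For completeness, here is a sketch of that unbuffered mode with PDO (the PDO::MYSQL_ATTR_USE_BUFFERED_QUERY attribute is real; the DSN and file name are assumptions), which pairs naturally with the 90,000-row CSV export from the question:

$pdo = new PDO("mysql:host=localhost;dbname=db", "user", "password");
// Stream rows from the server instead of buffering the whole result set.
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$stmt = $pdo->query("SELECT * FROM myTable");
$out = fopen("export.csv", "w");
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    fputcsv($out, $row); // only one row is in PHP memory at a time
}
fclose($out);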
I advise you to use PDO::FETCH_LAZY instead of PDO::FETCH_ASSOC for big data. I used it for a row-by-row export to CSV and it worked fine, without any "out of memory" errors.