I can't find a clear answer for this anywhere in MySQL documentation.
When I run a query, something like:
Code Block 1
$stmt = $db->prepare('SELECT id, name FROM table WHERE status=1');
does the search start at the beginning of the table, at row 0 (or the lowest available row)?
What I'm trying to do is go through a table one row at a time, and then exit when I get to the end:
Code Block 2
$curRow = 0;
while (true) {
    $stmt = $db->prepare('SELECT id, name FROM table WHERE status=? AND id>? LIMIT 1');
    $stmt->execute(array(0, $curRow));
    $result = $stmt->fetchAll();
    if (count($result)) {
        $curRow = $result[0]['id'];
        $stmt2 = $db->prepare('UPDATE table SET status=? WHERE id=?');
        $stmt2->execute(array(1, $curRow));
        // ... do some other stuff ...
    } else {
        exit();
    }
}
And so far, in testing, this has worked exactly as intended. But will it always be so?
Possible erroneous case:
Start out with the following table:
table
id | name | status
-- | ---- | ------
1 | ... | 0
2 | ... | 0
3 | ... | 0
4 | ... | 0
5 | ... | 0
6 | ... | 0
And run the query in Code Block 2. Say it starts at the first row, so now we have $curRow=1, and the table looks as follows:
table
id | name | status
-- | ---- | ------
1 | ... | 1
2 | ... | 0
3 | ... | 0
4 | ... | 0
5 | ... | 0
6 | ... | 0
All is well. The code does whatever it needs to, and then continues with the loop. Any of the remaining rows will satisfy the conditions in $stmt (i.e. status=0 and id>$curRow).
Will the statement always look at consecutive rows when checking the conditions? If not, it could end up at any arbitrary row, say the third:
table
id | name | status
-- | ---- | ------
1 | ... | 1
2 | ... | 0
3 | ... | 1
4 | ... | 0
5 | ... | 0
6 | ... | 0
And now we have $curRow=3, which means the query will never go back and look at the second row.
I know it's tricky business speaking in absolutes (always, never, every time, ...), but is there a way to ensure that the query begins at the lowest available row? Or does MySQL handle this automatically?
There is no guarantee of any reliable order unless you explicitly order by a key. It might appear ordered for now, but over time, with more data, maybe more servers, partitioned data, or UNIONed data, it can quickly change to something unexpected.
Better to use ORDER BY:
$stmt = $db->prepare('SELECT id, name FROM table WHERE status=1 ORDER BY ID ASC');
Make sure you have an index on the column you order by; it will speed things up!
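For example, here is a minimal sketch (assuming the same $db PDO connection and table as Code Block 2) of the loop with the ORDER BY added, so the lowest unprocessed id is always picked first:
$curRow = 0;
while (true) {
    // ORDER BY id ASC pins the result to the lowest matching id, regardless of storage order
    $stmt = $db->prepare('SELECT id, name FROM table WHERE status=? AND id>? ORDER BY id ASC LIMIT 1');
    $stmt->execute(array(0, $curRow));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($row === false) {
        break; // no unprocessed rows left
    }
    $curRow = $row['id'];
    $stmt2 = $db->prepare('UPDATE table SET status=? WHERE id=?');
    $stmt2->execute(array(1, $curRow));
    // ... do some other stuff ...
}
With an index on (status, id), MySQL can typically satisfy both the WHERE and the ORDER BY straight from the index, so the extra clause costs next to nothing.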
You should not write code that makes such assumptions about your database. Your code becomes less maintainable and harder to debug when the database changes, and that will be a real headache for you. You should think of other mechanisms or workarounds to get the job done; that is also more professional.
You might want to add a column that lets the rows be ordered, for example a date or an ID.
Look up the ORDER BY clause.
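For instance, a rough sketch of adding such a column and ordering by it, assuming a PDO connection in $db (the column name created_at is just an illustration, not something from the question):
// Hypothetical: add a timestamp column once, then always order by it (with id as a tie-breaker).
$db->exec('ALTER TABLE `table` ADD COLUMN created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP');

$stmt = $db->prepare('SELECT id, name FROM `table` WHERE status = ? ORDER BY created_at ASC, id ASC');
$stmt->execute(array(1));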
Related
I have a PHP file that will update a table in MySQL. It will update all the done flags from 0 to 1 after a job has completed. I need to query the rows with done=0 starting from the lowest primary key (ID). After the job is done, I set done=1 and move on to the next row. I have the following table:
--------------------
| ID | test | done |
--------------------
| 1 | test1| 0 |
--------------------
| 2 | test2| 1 |
--------------------
| 3 | test3| 0 |
--------------------
| 4 | test4| 0 |
--------------------
| 5 | test5| 1 |
--------------------
When I run the query SELECT test FROM mytable WHERE done=0 ORDER BY id ASC, it gives me all the test values whose done flag is 0. However, I want to start with the first one and handle that first, then move on to the next one, and so on. So I need a query that will show me just the first row. How can I do this?
Your query is on the right track, since it is already sorting by ascending id. All you need to do is limit it to returning only the first result, if one exists. Just add LIMIT 1 to the end of the query:
SELECT test FROM mytable WHERE done=0 ORDER BY id ASC LIMIT 1
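If it helps, here is a minimal PDO sketch of that loop (assuming a connection in $db; ID is also selected so the row can be marked done afterwards):
while (true) {
    // lowest-id row that is still pending
    $row = $db->query('SELECT ID, test FROM mytable WHERE done=0 ORDER BY ID ASC LIMIT 1')
              ->fetch(PDO::FETCH_ASSOC);
    if ($row === false) {
        break; // nothing left to do
    }
    // ... run the job for $row['test'] here ...
    $upd = $db->prepare('UPDATE mytable SET done=1 WHERE ID=?');
    $upd->execute(array($row['ID']));
}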
I have a web app in which I show a series of posts based on this table schema (there are thousands of rows like this, plus other columns removed as not required for this question):
+---------+----------+----------+
| ID | COL1 | COL2 |
+---------+----------+----------+
| 1 | NULL | ---- |
| 2 | --- | NULL |
| 3 | NULL | ---- |
| 4 | --- | NULL |
| 5 | NULL | NULL |
| 6 | --- | NULL |
| 7 | NULL | ---- |
| 8 | --- | NULL |
+---------+----------+----------+
And I use this query:
SELECT * from `TABLE` WHERE `COL1` IS NOT NULL AND `COL2` IS NULL ORDER BY `COL1`;
And the result set I get looks like this:
+---------+----------+----------+
| ID | COL1 | COL2 |
+---------+----------+----------+
| 12 | --- | NULL |
| 1 | --- | NULL |
| 6 | --- | NULL |
| 8 | --- | NULL |
| 11 | --- | NULL |
| 13 | --- | NULL |
| 5 | --- | NULL |
| 9 | --- | NULL |
| 17 | --- | NULL |
| 21 | --- | NULL |
| 23 | --- | NULL |
| 4 | --- | NULL |
| 32 | --- | NULL |
| 58 | --- | NULL |
| 61 | --- | NULL |
| 43 | --- | NULL |
+---------+----------+----------+
Notice that the ID column is jumbled thanks to the ORDER BY clause.
I have proper indexes to optimize these queries.
Now, let me explain the real problem. I have lazy-load functionality in my web app, so I display around 10 posts per page by adding LIMIT 10 to the query for the first page.
We are good up to here. But the real problem comes when I have to load the second page. What do I query now? I do not want the posts to be repeated, and new posts arrive almost every 15 seconds, which puts them at the top (by top I literally mean the first row) of the result set. I do not want to display these latest posts on the second or third pages, but they change the size of the result set, so I cannot use LIMIT 10,10 for the 2nd page and so on, as posts would be repeated.
Now, all I know is the last ID of the post that I displayed, say 21 here. So I want to display the posts with IDs 23, 4, 32, 58, 61, 43 (refer to the result set above). Do I load all the rows without a LIMIT clause and display the 10 IDs occurring after ID 21? For that I would have to iterate over thousands of useless rows. But I cannot use a LIMIT clause for the 2nd, 3rd... pages, that is for sure. Also, the IDs are jumbled, so I definitely cannot use WHERE ID>.... So, where do we go now?
I'm not sure if I've understood your question correctly, but here's how I think I would do it:
Add a timestamp column to your table, let's call it date_added
When displaying the first page, use your query as-is (with LIMIT 10) and hang on to the timestamp of the most recent record; let's call it last_date_added.
For the 2nd, 3rd and subsequent pages, modify your query to filter out all records with date_added > last_date_added, and use LIMIT 10, 10, LIMIT 20, 10, LIMIT 30, 10 and so on.
This will have the effect of freezing your resultset in time, and resetting it every time the first page is accessed.
Notes:
Depending on the ordering of your resultset, you might need a separate query to obtain the last_date_added. Alternatively, you could just cut off at the current time, i.e. the time when the first page was accessed.
If your IDs are sequential, you could use the same trick with the ID.
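A rough sketch of those steps, assuming a PDO connection in $db and that the cutoff timestamp is captured when page 1 is built (the variable names $cutoff, $page and $offset are just illustrations):
// When page 1 is requested, freeze the cutoff and keep it (e.g. in the session).
$cutoff = date('Y-m-d H:i:s');

// For page N, filter out anything added after the cutoff; the offsets then stay stable.
$offset = ($page - 1) * 10;
$stmt = $db->prepare(
    'SELECT * FROM `TABLE`
      WHERE COL1 IS NOT NULL AND COL2 IS NULL AND date_added <= ?
      ORDER BY COL1
      LIMIT ' . (int)$offset . ', 10'
);
$stmt->execute(array($cutoff));
$posts = $stmt->fetchAll(PDO::FETCH_ASSOC);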
Hmm..
I thought for a while and came up with two solutions:
Store the IDs of the posts already displayed and query WHERE ID NOT IN(id1,id2,...). But that would cost you extra memory, and if the user loads 100 pages and the IDs are in the 100,000s, a single GET request would not be able to handle it, at least not in all browsers. A POST request can be used. (A PDO sketch of this option follows after the second one below.)
Alter the way you display posts from COL1. I don't know if this would be a good way for you, but it can save you bandwidth and make your code cleaner; it may also be a better way. I would suggest this: SELECT * FROM TABLE WHERE COL1 IS NOT NULL AND COL2 IS NULL AND Id>.. ORDER BY ID DESC LIMIT 10,10. This can change the way you display your posts by leaps and bounds. But since you said in your comments that you check whether a post meets a criterion and then change COL1 from NULL to the current timestamp, I guess the newer the post, the higher up you want to display it. It's just an idea.
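A sketch of the first option with PDO, assuming the already-shown IDs are kept in a PHP array named $shownIds (building the placeholder list is the only fiddly part):
$shownIds = array(12, 1, 6, 8, 11, 13, 5, 9, 17, 21); // IDs already displayed
$placeholders = implode(',', array_fill(0, count($shownIds), '?'));

$stmt = $db->prepare(
    "SELECT * FROM `TABLE`
      WHERE COL1 IS NOT NULL AND COL2 IS NULL AND ID NOT IN ($placeholders)
      ORDER BY COL1
      LIMIT 10"
);
$stmt->execute($shownIds);
$posts = $stmt->fetchAll(PDO::FETCH_ASSOC);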
I assume new posts will be added with a higher ID than the current max ID, right? So couldn't you just run your query and grab the current max ID? Then, when you query for page 2, run the same query but with "ID < max_id". This should give you the same result set as your page 1 query, because any new rows will have ID > max_id. Hope that helps.
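A minimal sketch of that suggestion, assuming a PDO connection in $db and that the max ID is snapshotted when page 1 is built ($maxId, $page and $offset are illustrative names; ID <= $maxId is used here so the page 1 rows stay inside the frozen set and the offsets line up):
// Snapshot the highest ID when page 1 is requested and keep it for later pages.
$maxId = (int) $db->query('SELECT MAX(ID) FROM `TABLE`')->fetchColumn();

// Page N: exclude anything newer than the snapshot.
$offset = ($page - 1) * 10;
$stmt = $db->prepare(
    'SELECT * FROM `TABLE`
      WHERE COL1 IS NOT NULL AND COL2 IS NULL AND ID <= ?
      ORDER BY COL1
      LIMIT ' . (int)$offset . ', 10'
);
$stmt->execute(array($maxId));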
How about?
ORDER BY `COL1`,`ID`;
This would always put IDs in order. This will let you use:
LIMIT 10,10
for your second page.
Right now I have a PHP script that is fetching the first three results from a MySQL database using:
SELECT * FROM table Order by DATE DESC LIMIT 3;
After that command I wanted PHP to fetch the next three results; initially I was going to use:
SELECT * FROM table Order by DATE DESC LIMIT 3,3;
However, there will be a delay between the two commands, which means it is very possible that a new row will be inserted into the table during the delay. My first thought was to store the DATE value of the last result and then include a WHERE DATE < $stored_date, but if entries 3 and 4 have the same date it will skip entry 4 and return results from 5 onward. This could be avoided using the primary key field, which is an integer that increments automatically.
I am not sure which approach is best, but I feel like there should be a more elegant and robust solution to this problem; however, I am struggling to think of it.
Example table:
-------------------------------------------
| PrimaryKey | Data | Date |
-------------------------------------------
| 0 | abc | 2014-06-17 11:43:00 |
| 1 | def | 2014-06-17 12:43:00 |
| 2 | ghi | 2014-06-17 13:43:00 |
| 3 | jkl | 2014-06-17 13:56:00 |
| 4 | mno | 2014-06-17 14:23:00 |
| 5 | pqr | 2014-06-17 14:43:00 |
| 6 | stu | 2014-06-17 15:43:00 |
-------------------------------------------
Where Data is the column that I want.
Best would be to use the primary key and a select like this:
SELECT * FROM table WHERE pk < $stored_pk ORDER BY DATE DESC LIMIT 3;
And if you have an automatically generated PK, you should use ORDER BY pk instead; it will be faster.
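A minimal sketch of that approach with PDO (assuming a connection in $db), using the column names from the question's example table and ordering by the key as suggested:
// First page: the three newest rows by auto-increment key.
$rows = $db->query('SELECT PrimaryKey, Data, Date FROM `table` ORDER BY PrimaryKey DESC LIMIT 3')
           ->fetchAll(PDO::FETCH_ASSOC);
$last = end($rows);
$storedPk = $last['PrimaryKey']; // smallest key on this page

// Later: the next three older rows, unaffected by any rows inserted in between.
$next = $db->prepare('SELECT PrimaryKey, Data, Date FROM `table` WHERE PrimaryKey < ? ORDER BY PrimaryKey DESC LIMIT 3');
$next->execute(array($storedPk));
$older = $next->fetchAll(PDO::FETCH_ASSOC);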
Two options I can think of depending on what your script does:
You could either use transactions: performing these queries inside a transaction will give you a consistent view of the data.
Alternatively you could just use:
SELECT * FROM table Order by DATE DESC;
And only fetch the results as you need them.
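For the transaction option, a small sketch (assuming a PDO connection in $db, an InnoDB table, and the default REPEATABLE READ isolation level, where both reads share one consistent snapshot):
$db->beginTransaction();

// Both SELECTs see the data as it was when the snapshot was taken,
// so rows inserted during the delay do not shift the second batch.
$first = $db->query('SELECT * FROM `table` ORDER BY Date DESC LIMIT 3')->fetchAll();
// ... delay / other work ...
$next  = $db->query('SELECT * FROM `table` ORDER BY Date DESC LIMIT 3, 3')->fetchAll();

$db->commit();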
I know that this title is overused, but it seems that my kind of question is not answered yet.
So, the problem is like this:
I have a table structure made of four tables (tables, rows, cols, values) that I use to recreate the behavior of the information_schema (in a way).
In PHP I am generating queries to retrieve the data, and the result still looks like a normal table:
SELECT
(SELECT value FROM `values` WHERE `col` = "3" and row = rows.id) as "col1",
(SELECT value FROM `values` WHERE `col` = "4" and row = rows.id) as "col2"
FROM rows WHERE `table` = (SELECT id FROM tables WHERE name = 'table1')
HAVING (col2 LIKE "%4%")
OR
SELECT * FROM
(SELECT
(SELECT value FROM `values` WHERE `col` = "3" and row = rows.id) as "col1",
(SELECT value FROM `values` WHERE `col` = "4" and row = rows.id) as "col2"
FROM rows WHERE `table` = (SELECT id FROM tables WHERE name = 'table1')) d
WHERE col2 LIKE "%4%"
Note that the part where I define the columns of the result is generated by a PHP script. Why I am doing this is less important, but I want to extend this query-generating algorithm for broader use.
And now we get to the core problem: I have to decide whether to generate a WHERE or a HAVING part for the query. I know when to use each, but my algorithm doesn't, and I would have to make a few extra checks for this. The two queries above are equivalent; I can always put any query in a subquery, give it an alias, and use WHERE on the new derived table. But I wonder whether I will have performance problems, or whether this will come back on me in some unexpected way.
I know how they both work, and that WHERE is supposed to be faster, but this is why I came here to ask. Hopefully I have made myself understood; please excuse my English and the long, useless turns of phrase.
EDIT 1
I already know the difference between the two and all that it implies. My only dilemma is that using custom columns from other tables, with variable number and size, while trying to achieve the same result as a normally created table, means I must use HAVING to filter the derived table's columns. At the same time I have the option to wrap it all up in a subquery and use WHERE normally, which will probably create a temporary table that is filtered afterwards. Will this affect performance for a large database? Unfortunately I cannot test this right now, as I cannot afford to fill the database with over a billion entries (that would be something like this: 1 billion entries in the rows table, 5 billion in the values table since every row has 5 columns, 5 rows in the cols table and 1 row in the tables table, roughly 6,000,000,006 entries in total).
Right now my database looks like this (the tables, cols, rows and values tables, in that order):
+----+--------+-----------+------+
| id | name | title | dets |
+----+--------+-----------+------+
| 1 | table1 | Table One | |
+----+--------+-----------+------+
+----+-------+------+
| id | table | name |
+----+-------+------+
| 3 | 1 | col1 |
| 4 | 1 | col2 |
+----+-------+------+
where `table` is a foreign key from table `tables`
+----+-------+-------+
| id | table | extra |
+----+-------+-------+
| 1 | 1 | |
| 2 | 1 | |
+----+-------+-------+
where `table` is a foreign key from table `tables`
+----+-----+-----+----------+
| id | row | col | value |
+----+-----+-----+----------+
| 1 | 1 | 3 | 13 |
| 2 | 1 | 4 | 14 |
| 6 | 2 | 4 | 24 |
| 9 | 2 | 3 | asdfghjk |
+----+-----+-----+----------+
where `row` is a foreign key from table `rows`
where `col` is a foreign key from table `cols`
EDIT 2
The conditions are there just for demonstration purposes!
EDIT 3
For only two rows, it seems there is already a difference between the two: the one using HAVING takes 0.0008 s and the one using WHERE takes 0.0014-0.0019 s. I wonder whether this will affect performance for large numbers of rows and columns.
EDIT 4
The result of the two queries is identical, and that is:
+----------+------+
| col1 | col2 |
+----------+------+
| 13 | 14 |
| asdfghjk | 24 |
+----------+------+
HAVING is specifically for conditions on GROUP BY results; WHERE is for conditions on individual rows. See also: WHERE vs HAVING
I believe the having clause would be faster in this case, as you're defining specific values, as opposed to reading through the values and looking for a match.
See: http://database-programmer.blogspot.com/2008/04/group-by-having-sum-avg-and-count.html
Basically, WHERE filters out rows before they are passed to an aggregate function, but HAVING filters the aggregate function's results.
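A small illustration of that, against a hypothetical orders table (not one from the question; $db is again assumed to be a PDO connection):
// WHERE trims rows before grouping; HAVING trims the aggregated groups afterwards.
$stmt = $db->query(
    "SELECT customer_id, SUM(total) AS spent
       FROM orders
      WHERE status = 'paid'      -- row filter, evaluated before GROUP BY
      GROUP BY customer_id
     HAVING SUM(total) > 100"    // group filter, evaluated after aggregation
);
$bigSpenders = $stmt->fetchAll(PDO::FETCH_ASSOC);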
You could do it like this:
WHERE col2 IN (14,24)
Your code WHERE col2 LIKE "%4%" is a bad idea; what about col2 = 34? It would also be selected.
I have this table structure:
| id | name | level |
---------------------
| 1 | a | 1 |
| 2 | b | 2 |
| 3 | c | 3 |
| 5 | d | 4 |
| 6 | e | 1 |
| 7 | f | 2 |
| 8 | g | 1 |
| 9 | g | 4 |
I want to order my fetched results by level, so I execute this query:
$sql = "SELECT * FROM section_tb WHERE id = ? ORDER BY level";
$stmt = $db->prepare($sql);
$stmt->execute(array($id));
$result = $stmt->fetch(PDO::FETCH_ASSOC);
However, when I print_r($result), the order seems to be sorted by id. I am confused as to why.
My db details:
id - PRIMARY, AUTO INCREMENT
name - UNIQUE
INNODB
Your query is ordering correctly.
This is irrelevant, since it's only returning one row each time it's called anyway.
Your foreach is calling into it multiple times, and the ordering only affects the actual database call. Therefore the overall order of the results is the order of that foreach.
If the foreach had passed a parameter that identified more than one row, then within each of those calls the order would be by level (e.g. if you'd done queries to match on name, then the two that match "g" would be in the order requested).
You want to change the query to something like SELECT * FROM section_tb WHERE id in (1,2,3,4,5,6,7,8,9) ORDER BY level (or perhaps just SELECT * FROM section_tb ORDER BY level), call it once, and loop through the results.
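For example, a minimal sketch of the single-query version, assuming the same $db connection as in the question:
// One call, one ORDER BY: every row comes back already sorted by level.
$stmt = $db->query('SELECT * FROM section_tb ORDER BY level');
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    echo $row['id'] . ' ' . $row['name'] . ' ' . $row['level'] . PHP_EOL;
}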
Your WHERE clause is seeking an id which you've identified as the primary key, so your query should only return one row.
You can't do that. You can either use a placeholder or bind the parameter.