I have a very long table that has 10000 rows.
9500 rows have a column set to 1 and the remaining 500 have that column set to 0.
I want to update the table so all rows have that column set to 1.
Would it be faster to use WHERE column = 0, or is it better to skip the WHERE and just UPDATE all rows?
I'm using a prepared statement.
thanks
This is an interesting question. In general, I would say it is better to have where column = 0. It should never be detectably worse, but in MySQL it might not make a difference.
With no index on the column, the query will need to read all the rows anyway in order to identify the ones that need to be updated. So the query is going to do a full table scan regardless. Even with an index, the optimizer may still choose a full table scan, unless you have a clustered index on the column.
Beyond that, the overhead of doing an update is maintaining the log. I'm pretty sure that MySQL only logs actual changes to the database; in other words, it is doing the comparison anyway, so MySQL is not going to "re-update" values to the same value. Note: not all databases behave this way.
All that said, I would always put the where column = 0 if that is your intention. On 10,000 rows, performance isn't the big issue. Clarity of code ranks higher, in my opinion. Also, I write code for multiple databases, so I prefer to write code that will work well across all of them.
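To make the comparison concrete, here are the two statements being weighed against each other (the table and column names are placeholders):

UPDATE my_table SET my_column = 1 WHERE my_column = 0;  -- with the filter
UPDATE my_table SET my_column = 1;                      -- without it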
Going by my script execution-time measurements, on MariaDB both queries give pretty much the same outcome (both the result and the time).
I will try it on MySQL too.
Hello, world!
I have read about this on the web, but I have not found a suitable solution. I am not a pro at SQL, and I have the simplest table, which contains 10 columns and 8,000,000 rows, and the simplest query (I am working with PHP):
SELECT `field_name_1`, `field_name_2`, `field_name_3`
FROM `table_name`
WHERE `some_field`=#here_is_an_integer#
ORDER BY `some_field`
LIMIT 10;
Maybe I need some indexes, caching, or something like that.
If you have any thoughts about this, I'd be glad for your help, or just point me toward the way I should follow to find the solution.
Thank you!
Use an index on some_field, and ideally on all columns you use in SQL WHERE clauses.
If you only want to display data, use SQL-side pagination with LIMIT.
And, as the admin, set up caching and other MySQL (or MariaDB) settings for better search performance, as described in "Top 20+ MySQL Best Practices".
... simple answer: if you have the space available, run
mysql> ALTER TABLE table_name ADD INDEX (some_field);
Here are some things to think about:
Make sure all fields used in JOINs, WHERE clauses, or ORDER BY / GROUP BY clauses have appropriate indexes (unique or just plain).
With many rows in a table, the server's memory cache must be able to store the temporary result set.
Use InnoDB for slower inserts and faster selects, and tune InnoDB so it is able to store the temporary result set.
Do not use RAND(), so that a result set can be query-cached.
Filter early: try to put conditions in the JOIN (JOIN ... AND x = y) instead of in the final WHERE clause (unless, of course, the condition is on the main table). (Inner) join in the most optimal order: if you have many users and few reports, for example, start by selecting the users first, limiting the number of rows immediately before doing the other joins (see the sketch below).
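A sketch of the filter-early idea from the last point; users and reports are hypothetical tables chosen only to illustrate it:

SELECT u.id, u.name, r.title
FROM users AS u                       -- start from the table that limits rows the most
INNER JOIN reports AS r
    ON r.user_id = u.id
    AND r.created_at >= '2020-01-01'  -- condition applied during the join
WHERE u.active = 1;                   -- condition on the main table stays in WHERE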
Perhaps you can formalize your question a bit better (maybe with the use of a question mark somewhere).
Anyway, if you want to increase the speed of a SELECT on a single table such as the one you describe, then at the very least the column involved in the WHERE clause should have a proper index. This could be just a standard 'KEY (some_field)' if it is an integer type. Otherwise, if it is a string type (i.e. VARCHAR or VARBINARY) with a reasonable amount of cardinality within the first n bytes, you can use a prefix index, do something like 'KEY (some_field(8))', and minimize the index size (subsequently increasing performance by decreasing B-tree search time).
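For instance, a minimal sketch of the prefix index described above, assuming some_field is a string type (the index name is made up):

ALTER TABLE `table_name` ADD KEY `idx_some_field_prefix` (`some_field`(8));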
Ideally the server will have enough RAM (and a large enough innodb_buffer_pool_size, if you are using InnoDB) to keep the aforementioned index in memory. Also look into https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_adaptive_hash_index ... if your server has the available resources and your application has the data access patterns to justify it.
Hope this helps!
The query I'd like to speed up (or replace with another process):
UPDATE en_pages, keywords
SET en_pages.keyword = keywords.keyword
WHERE en_pages.keyword_id = keywords.id
Table en_pages has the proper structure but only has non-unique page_ids and keyword_ids in it. I'm trying to add the actual keywords (strings) to this table where they match keyword_ids. There are 25 million rows in table en_pages that need updating.
I'm adding the keywords so that this one table can be queried in real time and return keywords (the join is obviously too slow for "real time").
We apply this query (and some others) to sub-units of our larger dataset. We do this frequently to create custom interfaces for specific sub-units of our data for different user groups (sorry if that's confusing).
This all works fine if you give it an hour to run, but I'm trying to speed it up.
Is there a better way to do this that would be faster using PHP and/or MySQL?
I actually don't think you can speed up the process.
You can still add more raw power to your database by clustering new servers.
Maybe I'm wrong or misunderstood the question, but...
Couldn't you use TRIGGERS?
Like... when a new INSERT is detected on "en_pages", doing an UPDATE on that same row afterwards?
(I don't know how frequent INSERTs are in that table.)
This is just an idea.
How often do "en_pages.keyword" and "en_pages.keyword_id" change after being inserted?!
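A rough sketch of the trigger idea, reusing the en_pages/keywords names from the question. Note that MySQL does not allow a trigger to UPDATE the table it fires on, so a BEFORE INSERT trigger that fills in the column directly is the usual workaround:

DELIMITER //
CREATE TRIGGER en_pages_fill_keyword
BEFORE INSERT ON en_pages
FOR EACH ROW
BEGIN
    -- Look up the keyword string for the incoming keyword_id
    SET NEW.keyword = (SELECT keyword FROM keywords WHERE id = NEW.keyword_id);
END//
DELIMITER ;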
I don't know about MySQL, but this sort of thing usually runs faster in SQL Server if you process records in limited-size batches (say, 1000 at a time) in a loop.
You might also consider a WHERE clause (I don't know what MySQL uses for "not equal to", so I used the SQL Server version):
WHERE en_pages.keyword <> keywords.keyword
That way you are only updating records that have a difference in the field you are updating, not all of them.
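Putting the batching and the not-equal filter together, a rough MySQL sketch (the table and column names come from the question; MySQL does not allow LIMIT on a multi-table UPDATE, so the batch is expressed as a key range, and the range bounds here are made up):

UPDATE en_pages
JOIN keywords ON en_pages.keyword_id = keywords.id
SET en_pages.keyword = keywords.keyword
WHERE NOT (en_pages.keyword <=> keywords.keyword)  -- NULL-safe "not equal" in MySQL
  AND en_pages.keyword_id BETWEEN 1 AND 100000;    -- one batch; repeat for the next range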
I have about 1 million rows, so it's going pretty slowly. Here's the query:
$sql = "SELECT `plays`,`year`,`month` FROM `game`";
I've looked up indexes, but they only make sense to me when there's a WHERE clause.
Any ideas?
Indexes can make a difference even without a WHERE clause, depending on what other columns you have in your table. If the 3 columns you are selecting only make up a small proportion of the table contents, a covering index on them could reduce the number of pages that need to be scanned.
Moving less data around in the first place, though, either by adding a WHERE clause or by doing the processing in the database, would be better if possible.
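For example, a covering index for the query in the question might look like this (the index name is made up):

ALTER TABLE `game` ADD INDEX `idx_plays_year_month` (`plays`, `year`, `month`);
-- EXPLAIN should then report "Using index", i.e. an index-only scan:
EXPLAIN SELECT `plays`, `year`, `month` FROM `game`;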
If you don't need all 1 million records, you can pull n records:
$sql = "SELECT `plays`,`year`,`month` FROM `game` LIMIT 0, 1000";
Where the first number is the offset (where to start from) and the second is the number of rows. You might also want to use ORDER BY if you're only pulling a select number of records.
You won't be able to make that query much faster, short of fetching the data from a memory cache instead of the DB. Fetching a million rows takes time. If you need more speed, figure out whether you can have the DB do some of the work, e.g. sum/group things together.
If you're not using all the rows, you should use the LIMIT clause in your SQL to fetch only a certain range of those million rows.
If you really need all the 1 million rows to build your output, there's not much you can do from the database side.
However you may want to cache the result on the application side, so that the next time you'd want to serve the same output, you can return the processed output from your cache.
The realistic answer is no. With no restrictions (i.e. a WHERE clause or a LIMIT) on your query, you're almost guaranteed a full table scan every time.
The only way to decrease the scan time would be to have less data (or perhaps a faster disk). You could possibly rework your data to make your rows more efficient (CHAR instead of VARCHAR in some cases, TINYINT instead of INT, etc.), but you're really not going to see much of a speed difference from that kind of micro-optimization. Indexes are where it's at.
Generally if you're stuck with a case like this where you can't use indexes, but you have large tables, then it's the business logic that requires some re-working. Do you always need to select every record? Can you do some application-side caching? Can you fragment the data into smaller sets or tables, perhaps organized by day or month? Etc.
I need to check if some integer value is already in my database (which is growing all the time). And it should be done several thousand times in one script. I'm considering two alternatives:
Read all those numbers from the MySQL database into a PHP array and, every time I need to check a number, use the in_array function.
Every time I need to check the number, just execute something like SELECT number FROM table WHERE number='#' LIMIT 1
On the one hand, searching an array stored in RAM should be faster than querying MySQL every time (as I have mentioned, these checks are performed about a thousand times during one script execution). On the other hand, the DB is growing, and that array may become quite big, which may slow things down.
The question is: which way is faster, or better in some other respect?
I have to agree that #2 is your best choice. When performing a query with LIMIT 1, MySQL stops the query as soon as it finds the first match. Make sure the columns you intend to search by are indexed.
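For example, reusing the table and column names from the question (the index name is made up):

ALTER TABLE `table` ADD INDEX `idx_number` (`number`);
SELECT `number` FROM `table` WHERE `number` = 42 LIMIT 1;  -- stops at the first match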
It sounds like you are duplicating a Unique Constraint in code...
CREATE TABLE MyTable (
    SomeUniqueValue INT NOT NULL,
    CONSTRAINT MyUniqueKey UNIQUE (SomeUniqueValue)
);
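With the constraint in place, the application-side check becomes unnecessary; a duplicate insert simply fails, or can be silently skipped:

INSERT IGNORE INTO MyTable (SomeUniqueValue) VALUES (42);  -- skipped if 42 already exists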
How does the number of times you need to check compare with the number of values stored in the database? If it's 1:100, then you're probably better off searching the database each time; if it's (some amount) less, then preloading the list will be faster. What happened when you tested it?
However, even if the ratio is low enough for loading the full table to be faster, this will gobble up memory and could, as a result, make everything else run more slowly.
So I would recommend not loading it all into memory. But if you can, then batch the checks up to minimise the number of round trips to the database.
C.
Querying the database is the best option. First, you said the database is growing, which means new values are being added to the table, whereas with in_array you would be reading old values. Secondly, you might exhaust the RAM allotted to PHP with a very large amount of data. Thirdly, MySQL has its own query optimizer and other optimizations, which make it a far better choice compared to PHP.
I have a database with over 10,000,000 rows. Querying it right now can take a few seconds just to find some basic information. This isn't ideal; I know that the best way to optimize is to minimize the number of rows, which is possible, but right now I don't have the time to do this.
What's the easiest way to optimize a MySQL database so that when querying it, the time taken is short?
I don't mind about the size of the database, that doesn't really matter, so any optimizations that increase the size are fine. I'm not very good with optimization; right now I have indexes set up, but I'm not sure how much better I can get from there.
I'll eventually trim down the database properly, but is there a quick temporary solution?
Besides indexing which has already been suggested, you may want to also look into partitioning tables if they are large.
Partitioning in MySQL
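A minimal sketch of range partitioning, assuming a large table with an integer year column to split on (all names here are made up; note that the partitioning column must be part of every unique key on the table):

ALTER TABLE big_table
PARTITION BY RANGE (created_year) (
    PARTITION p2019 VALUES LESS THAN (2020),
    PARTITION p2020 VALUES LESS THAN (2021),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);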
It's tough to be specific here, because we have very limited information, but proper indexing along with partitioning can go a very long way. Indexing properly can be a long subject, but in a very general sense you'll want to index columns you query against.
For example, say you have a table of employees, and you have your usual columns of SSN, FNAME, LNAME. In addition to those columns, we'll say that you have an additional 10 columns in the table as well.
Now you have this query:
SELECT FNAME, LNAME FROM EMPLOYEES WHERE SSN = 'blah';
Ignoring the fact that the SSN could likely be the primary key here and may already have a unique index on it, you would likely see a performance benefit by creating another composite index containing the columns (SSN, FNAME, LNAME). The reason this is beneficial is that the database can satisfy this query by looking only at the composite index, because it contains all the values needed in a sorted and compact space (that is, less I/O). Even though an index on SSN alone is a better access method than doing a full table scan, the database still has to read the data blocks for the index (I/O), find the value(s) containing pointers to the records needed to satisfy the query, and then read different data blocks (read: more random I/O) in order to retrieve the actual values of FNAME and LNAME.
This is obviously very simplified, but using indexes in this way can drastically reduce I/O and increase performance of your database.
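In MySQL, the composite index from the example would be created like this (the index name is made up):

ALTER TABLE EMPLOYEES ADD INDEX IDX_SSN_FNAME_LNAME (SSN, FNAME, LNAME);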
Some other links here you may find helpful:
MySQL indexes - how many are enough?
When should I use a composite index?
MySQL Query Optimization (Particularly the section on "Choosing Indexes")
As far as I can see, you are requesting 40k rows from the database; that amount of data needs time just to be transferred.
Also, never ask "how to improve in general". There is no "general" optimization. Optimization is always the result of profiling and research into your particular case.
Use indexes on columns you search on very often.
In your example, 'WHERE x=y', if y is also a column name, create an index on y as well.
The key with an index is that the number of rows returned by your SELECT should be around 3%~5% of the entire table; then it will be faster.
Archiving tables also helps. I do not know how to do this; it is mostly a DBA task.
For a DBA it is a simple task if they have done it before.
If you're doing ordering or complex queries, you may need to use multi-column indexes. For example, if you're searching WHERE x.name = 'y' OR x.phone = 'z', it might be worth putting an index on (name, phone). That's a simplified example, but if you need to do this you'll need to research it further anyway :)
Are your queries using your indexes? What does running an EXPLAIN on your select queries tell you?
The first (and easiest) step will be making sure your queries are optimized.
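For instance, reusing the EMPLOYEES example from earlier:

EXPLAIN SELECT FNAME, LNAME FROM EMPLOYEES WHERE SSN = 'blah';
-- Check the "key" column for the index actually chosen, and "rows"
-- for how many rows MySQL expects to examine.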