Inserting many table entries at the same time in PHP and MySQL

I just finished programming a platform in PHP that uses MySQL to save the results of a poll. I will be running the poll with a group of 24 users who will submit the form at more or less the same time, so the mysqli_query() function will be executed around 24 times at roughly the same moment.
Is this a problem?
Should I worry about the function blocking access to the db, or is mysqli_query() safe to use without having to worry about blocking?

In general, no, unless you are doing something very interesting in the application logic that requires data from another row. For example, if you are using an ID it should be of the auto-incrementing type, which is pretty standard stuff. You know you are okay if you are only doing an insert with no select beforehand and don't have a crazy stored procedure on the database, which you likely do not.
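A minimal sketch of what that looks like, assuming a hypothetical poll_results table with an AUTO_INCREMENT id; each of the 24 submissions runs this independently and MySQL queues the inserts for you:

<?php
// Hypothetical schema: poll_results (id INT AUTO_INCREMENT PRIMARY KEY, user_id INT, answer VARCHAR(255))
$mysqli = new mysqli('localhost', 'db_user', 'db_pass', 'poll');

$userId = (int) $_POST['user_id'];
$answer = $_POST['answer'];

// A parameterized INSERT with no SELECT beforehand, so concurrent submissions cannot interfere.
$stmt = $mysqli->prepare('INSERT INTO poll_results (user_id, answer) VALUES (?, ?)');
$stmt->bind_param('is', $userId, $answer);
$stmt->execute();

// The AUTO_INCREMENT value generated for this connection's row only.
$newId = $mysqli->insert_id;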

MySQL queries are executed in the order in which they arrive at the server.
You don't need to worry about it.

Related

Could a SELECT harm or interfere with a data insertion script in MySQL?

This seems like a pretty basic question but one I don't know the answer to.
I wrote a script in PHP that loops through some data and then performs an UPDATE on records in our database. There are roughly 150,000 records, so the script certainly takes a while to complete.
Could I potentially harm or interfere with the data insertion if I run a basic SELECT statement?
Say I want to ensure that the script is working properly, so I run a basic SELECT COUNT() to see if the count increases in real time as the script runs. Is this possible, or would it screw something up?
Thank you!
Generally a SELECT call is incapable of "causing harm" provided you're not talking about SQL injection problems.
The InnoDB engine, which you should be using, has what's called Multi-Version Concurrency Control, or MVCC for short. It means that until your UPDATE statement (or the transaction it is part of) is finished, the SELECT will run against the last consistent database state.
If you're using MyISAM, which is a very bad idea in most production environments due to the limitations of that engine and the way the data is stored without a rollback journal, the SELECT call will probably block until the UPDATE is applied since it does not support MVCC.
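If you just want to watch the script's progress, something like the following (the table and column names here are made up for illustration) can be run from a second connection while the script works; under InnoDB it simply reports the last committed state:

-- From a second session (e.g. the mysql command-line client):
SELECT COUNT(*) FROM records WHERE processed = 1;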

MySQL and UNIX_TIMESTAMP insert error

I have a problem with a project I am currently working on, built in PHP & MySQL. The project itself is similar to an online bidding system. Users bid on a project, and they get a chance to win if they follow their bid by clicking and clicking again.
The problem is this: if, for example, 5 users enter the game at the same time, I get an 8-10 second delay in the database - I update the database using UNIX_TIMESTAMP(CURRENT_TIMESTAMP), which makes the whole bidding system useless.
I want to mention too that the project is very database intensive (around 30-40 queries per page) and I was thinking maybe the queries get delayed, but I'm not sure if that's happening. If that's the case though, any suggestions how to avoid this type of problem?
Hope I've been at least clear with this issue. It's the first time it happened to me and I would appreciate your help!
You can decide on:
Optimizing or minimizing the required queries.
Caching queries that do not need to be updated on each visit.
Using summary tables.
Updating the queries only on changes.
You have to do this cleverly. You can follow the MySQLPerformanceBlog for guidance.
I'm not clear on what you're doing, but let me elaborate on what you said. If you're using UNIX_TIMESTAMP(CURRENT_TIMESTAMP()) in your MySQL query, you have a serious problem.
The problem with your approach is that you are using MySQL functions to supply the timestamp record that will be stored in the database. This is an issue, because then you have to wait on MySQL to parse and execute your query before that timestamp is ever generated (and some MySQL engines like MyISAM use table-level locking). Other engines (like InnoDB) have slower writes due to row-level locking granularity. This means the time stored in the row will not necessarily reflect the time the request was generated to insert said row. Additionally, it can also mean that the time you're reading from the database is not necessarily the most current record (assuming you are updating records after they were inserted into the table).
What you need is for the PHP request that generates the SQL query to provide the TIMESTAMP directly in the SQL query. This means the timestamp reflects the time the request is received by PHP and not necessarily the time that the row is inserted/updated into the database.
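As a sketch of that idea (the bids table and its columns are assumptions, not your actual schema), take the timestamp in PHP the moment the request arrives and bind it as a plain integer parameter instead of letting MySQL compute it:

<?php
$mysqli = new mysqli('localhost', 'db_user', 'db_pass', 'bidding');

// Captured when PHP receives the request, not when MySQL finally executes the query.
$bidTime = time();
$bidId   = (int) $_POST['bid_id'];

$stmt = $mysqli->prepare('UPDATE bids SET last_bid_at = ? WHERE id = ?');
$stmt->bind_param('ii', $bidTime, $bidId);
$stmt->execute();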
You also have to be clear about which MySQL engine your table is using. For example, engines like InnoDB use MVCC (Multi-Version Concurrency Control). This means that while a row is being read, it can be written to at the same time. If this happens, the engine keeps the existing value (in its undo log) so the client can still read it while the new value is being written. That way you have row-level locking with faster and more stable reads, but potentially slower writes.

Does refreshing a MySQL query by jQuery AJAX crash the database?

I'm using the jQuery .ajax function to load a PHP page containing a MySQL query that selects data from the database, but my question is: does refreshing the MySQL query via jQuery AJAX crash or tire out the database?
Info: Refreshing every 1 second using setInterval().
Edited: These are the queries that I refresh.
SELECT * FROM table1 ORDER BY id DESC
SELECT * FROM table1 ORDER BY id ASC LIMIT 10
DELETE FROM table1 WHERE id = 'something'
I hope you can answer me.
Thanks in advance
MySQL is capable of serving hundreds of queries per second, even when running on low-end hardware.
However, you could still be "tiring" the database server, especially if you're running a very complex query, or if you do not have the necessary indexes on your tables. You may want to use the EXPLAIN syntax to see how MySQL is executing your query.
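For instance, with the queries from the question, EXPLAIN shows whether MySQL can walk the index on id or has to scan the whole table; the key and rows columns are the ones to look at:

EXPLAIN SELECT * FROM table1 ORDER BY id ASC LIMIT 10;
-- If id is the PRIMARY KEY, the key column should show PRIMARY and rows should stay small.
-- The unlimited ORDER BY id DESC query, by contrast, has to touch far more rows.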
No, it does not crash the database (how could it?). Of course, if you're requesting the script running mysql_query a lot, it might result in a DoS (but normally it takes a very high number of requests to bring down a server).
EDIT: Your update states you're using setInterval() with 1 second. In this case it depends on how many users will have that page open at the same time. E.g. say 1,000 users are on that site - this would result in 60,000 requests (and queries) being fired every minute. If your query is only a simple select it might not be a problem. If you're running a slow query, you may want to check the slow query log to improve it - or alternatively change the behaviour of your script.
What is the mysql query? Is it user specific? If not, it would be much better to cache the value server side once per second and use ajax to fetch this cached file rather than have all of your users simultaneously requesting the same data.
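A minimal sketch of that caching idea, assuming a plain file cache and the table1 query from the question; the AJAX endpoint reads the cached JSON and the real query runs at most about once per second, no matter how many users are polling:

<?php
$cacheFile = '/tmp/table1_latest.json';   // hypothetical cache location

// Refresh the cache only when it is missing or older than one second.
if (!file_exists($cacheFile) || time() - filemtime($cacheFile) >= 1) {
    $mysqli = new mysqli('localhost', 'db_user', 'db_pass', 'mydb');
    $result = $mysqli->query('SELECT * FROM table1 ORDER BY id ASC LIMIT 10');
    $rows   = $result->fetch_all(MYSQLI_ASSOC);
    file_put_contents($cacheFile, json_encode($rows), LOCK_EX);
}

// Every polling request is answered from the file instead of hitting MySQL.
header('Content-Type: application/json');
echo file_get_contents($cacheFile);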

How bad is using SELECT MAX(id) in MYSQL instead of mysql_insert_id() in PHP?

Background: I'm working on a system where the developers seem to be using a function which executes a MYSQL query like "SELECT MAX(id) AS id FROM TABLE" whenever they need to get the id of the LAST inserted row (the table having an auto_increment column).
I know this is a horrible practice (because concurrent requests will mess up the records), and I'm trying to communicate that to the non-tech/management team, to which their response is...
"Oh okay, we'll only face this problem when we have
(a) a lot of users, or
(b) it'll only happen when two people try doing something
at _exactly_ the same time"
I don't disagree with either point, but I think we'll run into this problem much sooner than we plan. However, I'm trying to calculate (or figure out a mechanism to calculate) how many users would need to be using the system before we start seeing messed-up links.
Any mathematical insights into that? Again, I KNOW it's a horrible practice, I just want to understand the variables in this situation...
Update: Thanks for the comments folks - we're moving in the right direction and getting the code fixed!
The point is not whether potential bad situations are likely. The point is whether they are possible. As long as there's a non-trivial probability of the issue occurring, and it's a known issue, it should be avoided.
It's not like we're talking about changing a one line function call into a 5000 line monster to deal with a remotely possible edge case. We're talking about actually shortening the call to a more readable, and more correct usage.
I kind of agree with @Mark Baker that there is some performance consideration, but since id is a primary key, the MAX query will be very quick. Sure, LAST_INSERT_ID() will be faster (since it's just reading from a session variable), but only by a trivial amount.
And you don't need a lot of users for this to occur. All you need is a lot of concurrent requests (not even that many). If the time between the start of the insert and the start of the select is 50 milliseconds (assuming a transaction safe DB engine), then you only need 20 requests per second to start hitting an issue with this consistently. The point is that the window for error is non-trivial. If you say 20 requests per second (which in reality is not a lot), and assuming that the average person visits one page per minute, you're only talking 1200 users. And that's for it to happen regularly. It could happen once with only 2 users.
And right from the MySQL documentation on the subject:
You can generate sequences without calling LAST_INSERT_ID(), but the utility of using the function this way is that the ID value is maintained in the server as the last automatically generated value. It is multi-user safe because multiple clients can issue the UPDATE statement and get their own sequence value with the SELECT statement (or mysql_insert_id()), without affecting or being affected by other clients that generate their own sequence values.
Instead of using SELECT MAX(id), you should do as the documentation says:
Instead, use the internal MySQL SQL function LAST_INSERT_ID() in an SQL query
Even so, neither SELECT MAX(id) nor mysql_insert_id() is "thread-safe" and you could still have a race condition. The best option you have is to lock tables before and after your requests. Or, even better, use transactions.
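A short sketch of the recommended pattern (the orders table here is a placeholder), using mysqli's equivalent of mysql_insert_id(); the value is tracked per connection, so other clients inserting at the same time do not affect it:

<?php
$mysqli = new mysqli('localhost', 'db_user', 'db_pass', 'mydb');

$mysqli->query("INSERT INTO orders (customer_id, total) VALUES (42, 99.50)");

// The AUTO_INCREMENT id generated by this connection's INSERT.
$newId = $mysqli->insert_id;

// The pure-SQL equivalent, on the same connection:
// SELECT LAST_INSERT_ID();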
I don't have the math for it, but I would point out that response (a) is a little silly. Doesn't the company want a lot of users? Isn't that a goal? That response implies that they'd rather solve the problem twice, possibly at great expense the second time, instead of solve it once correctly the first time.
This will happen whenever someone adds something to the table between one insert and that query running. So to answer your question, just two people using the system is enough for things to go wrong.
At least using LAST_INSERT_ID() will get the last ID for a particular connection, so it won't matter how many new entries have been added in between.
In addition to the risk of getting the wrong ID value returned, there's also the additional database query overhead of SELECT MAX(id), and it's more PHP code to actually execute than a simple mysql_insert_id(). Why deliberately code something to be slow?

Is it possible to do count(*) while doing insert...select... query in mysql/php?

Is it possible to do a simple count(*) query in a PHP script while another PHP script is doing insert...select... query?
The situation is that I need to create a table with ~1M or more rows from another table, and while inserting, I do not want the user to feel the page is freezing, so I am trying to keep the count updated. But when I run a SELECT COUNT(*) FROM table while the insert runs in the background, I get only 0 until the insert is completed.
So is there any way to ask MySQL to return a partial result first? Or is there a fast way to do a series of inserts with data fetched from a previous select query while having about the same performance as an insert...select... query?
The environment is php4.3 and MySQL4.1.
Without reducing performance? Not likely. With a little performance loss, maybe...
But why are you regularly creating tables and inserting millions of rows? If you do this only very seldom, can't you just warn the admin (presumably the only one allowed to do such a thing) that this takes a long time? If you're doing this all the time, are you really sure you're not doing it wrong?
I agree with Stein's comment that this is a red flag if you're copying 1 million rows at a time during a PHP request.
I believe that in a majority of cases where people are trying to micro-optimize SQL, they could get much greater performance and throughput by approaching the problem in a different way. SQL shouldn't be your bottleneck.
If you're doing a single INSERT...SELECT, then no, you won't be able to get intermediate results. In fact this would be a Bad Thing, as users should never see a database in an intermediate state showing only a partial result of a statement or transaction. For more information, read up on ACID compliance.
That said, the MyISAM engine may play fast and loose with this. I'm pretty sure I've seen MyISAM commit some but not all of the rows from an INSERT...SELECT when I've aborted it part of the way through. You haven't said which engine your table is using, though.
The other users can't see the insertion until it's committed. That's normally a good thing, since it makes sure they can't see half-done data. However, if you want them to see intermediate data, you could throw in an occasional call to "commit" while you're inserting.
By the way - don't let anybody tell you to turn autocommit on. That's a HUGE time waster. I have a "delete and re-insert" job on my database that takes a third as long when I turn off autocommit.
Just to be clear, MySQL 4 isn't configured by default to use transactions. It uses the MyISAM table type which locks the entire table for each insert, if I remember correctly.
Your best bet would be to use one of the MySQL bulk insertion functions, such as LOAD DATA INFILE, as these are dramatically faster at inserting large amounts of data. As for the counting, well, you could break the inserts into N groups of 1000 (or Y) then divide your progress meter into N sections and just update it on each group's request.
Edit: Another thing to consider is, if this is static data for a template, then you could use a "select into" to create a new table with the same data. Not sure what your application is, or the intended functionality, but that could work as well.
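A rough sketch of the chunked approach (source and destination table names are placeholders; it assumes an InnoDB table, and it is written with mysqli for readability even though the question mentions PHP 4.3): commit every 1000 rows so that a SELECT COUNT(*) from another script can see the progress grow:

<?php
$mysqli = new mysqli('localhost', 'db_user', 'db_pass', 'mydb');
$mysqli->autocommit(false);          // one commit per chunk instead of per row

$chunk  = 1000;
$offset = 0;
do {
    // Copy one slice from the source table into the destination table.
    $mysqli->query("INSERT INTO dest_table (col1, col2)
                    SELECT col1, col2 FROM source_table
                    ORDER BY id LIMIT $chunk OFFSET $offset");
    $copied = $mysqli->affected_rows;

    // Make this chunk visible, so a SELECT COUNT(*) elsewhere reflects the progress.
    $mysqli->commit();
    $offset += $chunk;
} while ($copied === $chunk);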
If you can get to the console, you can ask various status questions that will give you the information you are looking for. There's a command that goes something like SHOW PROCESSLIST.
