Let's say I insert some data into multiple different tables.
Table A:
Name
Address
Location
Table B:
Name
Address
Location
What is the chance of MySQL inserting into one table but not the other, if these were two separate MySQL queries?
In other words, what is the chance of PHP or MySQL not inserting the data if all the data is completely valid?
Can PHP or MySQL mess up in any way and miss a query, especially if I am doing hundreds a second?
If so, how would I combat this?
Use a "database transaction".
A database transaction commits ALL or NONE of the operations you are performing, all at once.
If you have multiple INSERT, UPDATE, and/or DELETE operations that you would like to perform together, then you should:
1. Initiate a transaction.
2. Perform each one of the operations, one after the other.
3. Commit the transaction.
This way, if something fails in between, the transaction can be rolled back and NONE of the operations take effect; nothing becomes permanent until the "commit" is executed.
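The steps above can be sketched with PDO roughly as follows. This is a minimal sketch, not a definitive implementation: the table names table_a and table_b, their column names, and the function name insertIntoBothTables are assumptions standing in for "Table A" and "Table B" from the question. On MySQL the tables would need a transactional engine such as InnoDB for the rollback to have any effect.

```php
<?php
// Sketch only: table_a / table_b and the function name are hypothetical.
function insertIntoBothTables(PDO $pdo, string $name, string $address, string $location): void
{
    // Make PDO throw on any failed statement so the catch block runs.
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $pdo->beginTransaction();                 // step 1: initiate
    try {
        $sql = 'INSERT INTO %s (name, address, location) VALUES (?, ?, ?)';
        foreach (['table_a', 'table_b'] as $table) {
            // step 2: perform each operation, one after the other
            $pdo->prepare(sprintf($sql, $table))->execute([$name, $address, $location]);
        }
        $pdo->commit();                       // step 3: both rows become permanent together
    } catch (Throwable $e) {
        $pdo->rollBack();                     // undo the first insert if the second failed
        throw $e;
    }
}
```

A caller would pass an open connection, for example one created with a `mysql:` DSN, and handle the re-thrown exception (log it, retry, or report the failure).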
Related
I am building a PHP RESTful API for remote "worker" machines to self-assign tasks. The MySQL InnoDB table on the API host holds pending records that the workers can pick up from the API whenever they are ready to work on a record. How do I prevent concurrently requesting worker systems from ever getting the same record?
My initial plan to prevent this is to UPDATE a single record with a uniquely generated ID in a default NULL field, and then poll for the details of the record where the unique ID field matches.
For example:
UPDATE mytable SET status = 'Assigned', uniqueidfield = '3kj29slsad'
WHERE uniqueidfield IS NULL LIMIT 1
And in the same PHP instance, the next query:
SELECT id, status, etc FROM mytable WHERE uniqueidfield = '3kj29slsad'
The resulting record from the SELECT statement above is then given to the worker. Would this prevent simultaneously requesting workers from getting the same records shown to them? I am not exactly sure on how MySQL handles the lookups within an UPDATE query, and if two UPDATES could "find" the same record, and then update it sequentially. If this works, is there a more elegant or standardized way of doing this (not sure if FOR UPDATE would need to be applied to this)? Thanks!
Never mind my previous answer. I believe I understand what you are asking. I'll reword it so that it may be clearer to others.
"If I issue two of the above update statements at the same time, what would happen?"
According to http://dev.mysql.com/doc/refman/5.0/en/lock-tables-restrictions.html, the second statement would not interfere with the first one.
Normally, you do not need to lock tables, because all single UPDATE
statements are atomic; no other session can interfere with any other
currently executing SQL statement.
A more elegant way is probably opinion based, but I don't see anything wrong with what you're doing.
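For what it's worth, the claim-then-read pattern from the question might look like this in PHP. It is only a sketch under assumptions: the function names newClaimToken and claimNextTask are made up for illustration, the selected columns are limited to id and status, and UPDATE ... LIMIT is MySQL-specific syntax. Checking the affected row count tells the API whether any record was actually claimed.

```php
<?php
// Sketch only: mytable, status and uniqueidfield come from the question;
// the function names are hypothetical.
function newClaimToken(): string
{
    // 8 random bytes rendered as 16 hex characters.
    return bin2hex(random_bytes(8));
}

function claimNextTask(PDO $pdo): ?array
{
    $token = newClaimToken();

    // A single UPDATE is atomic, so two workers cannot both mark the same row.
    $update = $pdo->prepare(
        "UPDATE mytable SET status = 'Assigned', uniqueidfield = ?
         WHERE uniqueidfield IS NULL LIMIT 1"
    );
    $update->execute([$token]);

    if ($update->rowCount() === 0) {
        return null; // no unassigned record was available
    }

    // Read back the row that this request just claimed.
    $select = $pdo->prepare('SELECT id, status FROM mytable WHERE uniqueidfield = ?');
    $select->execute([$token]);
    $row = $select->fetch(PDO::FETCH_ASSOC);

    return $row === false ? null : $row;
}
```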
I am not an SQL magician, so I'm venturing to ask for help. I have 4 tables to insert into a 5th one while checking a 6th table to ensure there are no duplicates. For example, no names in the 6th table can be inserted into the 5th one. I can probably try to figure out the best SQL query for the job, but I can't get my head around the right method. The final table size is small for now (5000 contact names), but it will grow every month, so I have to start right. I plan to use a PHP script with a MySQL connection to the database. This script will only run on my server (CentOS 5).
Without seeing the schema: in general, if you're going to prevent rows from entering tables based on other tables, in MySQL you'll need to use foreign keys. All of this should be done in a database transaction, so that whatever logic you create in PHP to insert rows into the various tables either succeeds completely or fails and rolls back to the prior state.
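One caveat: a foreign key can only require that a matching row exists in another table, so it cannot by itself keep out names that already appear in the 6th table. A different technique, filtering inside the INSERT ... SELECT with NOT EXISTS, is one possible way to do that. The sketch below assumes hypothetical table names (source_contacts for one of the four source tables, blocked_contacts for the 6th table, final_contacts for the 5th) and a single name column; the real schema was not shown.

```php
<?php
// Sketch only: all table names, the name column, and the function name
// are hypothetical.
function copyAllowedNames(PDO $pdo): int
{
    // Throw on errors so exec() never returns false.
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // Copy every distinct source name that has no match in the blocked table.
    $sql = 'INSERT INTO final_contacts (name)
            SELECT DISTINCT s.name
            FROM source_contacts s
            WHERE NOT EXISTS (
                SELECT 1 FROM blocked_contacts b WHERE b.name = s.name
            )';

    return $pdo->exec($sql); // number of rows inserted
}
```

Running one such statement per source table inside a single transaction would keep the whole import all-or-nothing.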
Part of my project involves storing and retrieving loads of IPs in my database. I have estimated that my database will have millions of IPs within months of starting the project. That being the case, I would like to know how slow simple queries to a big database can get. What will be the approximate speeds of the following queries:
SELECT * FROM table where ip= '$ip' LIMIT 1
INSERT INTO table(ip, xxx, yyy)VALUES('$ip', '$xxx', '$yyy')
on a table with 265 million rows?
Could I speed query speeds up by having 255^2 tables created that would have names corresponding to all the 1st two numbers of all possible ipv4 ip addresses, then each table would have a maximum of 255^2 rows that would accommodate all possible 2nd parts to the ip. So for example to query the ip address "216.27.61.137" it would be split into 2 parts, "216.27"(p1) and "61.137"(p2). First the script would select the table with the name, p1, then it would check to see if there are any rows called "p2", if so it would then pull the required data from the row. The same process would be used to insert new ips into the database.
If the above plan would not work what would be a good way to speed up queries in a big database?
The answers to both your questions hinge on the use of INDEXES.
If your table is indexed on ip, your first query should execute more or less immediately, regardless of the size of your table: MySQL will use the index. Your second query will be slower, because MySQL has to update the index on each INSERT.
If your table is not indexed then the second query will execute almost immediately as MySQL can just add the row at the end of the table. Your first query may become unusable as MySQL will have to scan the entire table each time.
The problem is balance. Adding an index will speed the first query but slow the second. Exactly what happens will depend on server hardware, which database engine you choose, configuration of MySQL, what else is going on at the time. If performance is likely to be critical, do some tests first.
Before doing anything of that sort, read this question and, more importantly, its answers: How to store an IP in mySQL
It is generally not a good idea to split data among multiple tables. Database indexes are good at what they do, so just make sure you create them accordingly. A binary column to store IPv4 addresses will work rather nicely - it is more a question of query load than of table size.
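As a rough illustration of "one table plus an index", here is a sketch that stores each IPv4 address as an integer in an indexed column instead of a binary one (a swapped-in but common compact representation). The table name ip_log, the note column, and the function names are assumptions made for the example; ip2long() and long2ip() are PHP built-ins. It assumes 64-bit PHP, where ip2long() never returns a negative number.

```php
<?php
// Sketch only: ip_log, note and the function names are hypothetical.
function createIpTable(PDO $pdo): void
{
    // The primary key doubles as the index that keeps lookups fast.
    $pdo->exec('CREATE TABLE ip_log (ip INT UNSIGNED NOT NULL PRIMARY KEY, note VARCHAR(255))');
}

function ipToInt(string $ip): int
{
    $long = ip2long($ip);
    if ($long === false) {
        throw new InvalidArgumentException("Not a valid IPv4 address: $ip");
    }
    return $long;
}

function insertIp(PDO $pdo, string $ip, string $note): void
{
    $stmt = $pdo->prepare('INSERT INTO ip_log (ip, note) VALUES (?, ?)');
    $stmt->bindValue(1, ipToInt($ip), PDO::PARAM_INT);
    $stmt->bindValue(2, $note);
    $stmt->execute();
}

function findIp(PDO $pdo, string $ip): ?array
{
    // An equality lookup on the indexed column; no table scan is needed.
    $stmt = $pdo->prepare('SELECT ip, note FROM ip_log WHERE ip = ? LIMIT 1');
    $stmt->bindValue(1, ipToInt($ip), PDO::PARAM_INT);
    $stmt->execute();
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($row === false) {
        return null;
    }
    $row['ip'] = long2ip((int) $row['ip']); // back to dotted notation
    return $row;
}
```

Using prepared statements here also avoids interpolating '$ip' directly into the SQL string.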
First and foremost, you can't predict how long a query will take, even if we knew everything about the database, the database server, the network performance and thousands of other variables.
Second, if you are using a decent database engine, you don't have to split the data into different tables. It knows how to handle big data. Leave the database functionality to the database itself.
There are several workarounds to deal with large datasets. Using the right data types and creating the right indexes will help a lot.
When you begin to have problems with your database, then search for something specific to the problem you are having.
There are no silver bullets to big data problems.
I have a scraper which visits many sites and finds upcoming events and another script which is actually supposed to put them in the database. Currently the inserting into the database is my bottleneck and I need a faster way to batch the queries than what I have now.
What makes this tricky is that a single event has data across three tables which have keys to each other. To insert a single event I insert the location or get the already existing id of that location, then insert the actual event text and other data or get the event id if it already exists (some are repeating weekly etc.), and finally insert the date with the location and event ids.
I can't use a REPLACE INTO because it will orphan older data with those same keys. I asked about this in Tricky MySQL Batch Query; the TL;DR is that I have to check which keys already exist, preallocate those that don't exist, then make a single insert for each of the tables (i.e. do most of the work in PHP). That's great, but the problem is that if more than one batch was processing at a time, they could both choose to preallocate the same keys and then overwrite each other. Is there any way around this? If so, I could go back to this solution. The batches have to be able to work in parallel.
What I have right now is that I simply turn off the indexing for the duration of the batch and insert each of the events separately but I need something faster. Any ideas would be helpful on this rather tricky problem. (The tables are InnoDB now... could transactions help solve any of this?)
I'd recommend starting with MySQL's LOCK TABLES, which you can use to prevent other sessions from writing to the tables whilst you insert your data.
For example you might do something similar to this
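$db = mysqli_connect("localhost", "root", "password", "EventsDB");
mysqli_query($db, "LOCK TABLES events WRITE");
// mysqli_insert_id() only reports ids generated by this connection's own
// INSERTs, so read the next free key from the table while the lock is held.
// Assumes the key column is named id.
$row = mysqli_fetch_row(mysqli_query($db, "SELECT COALESCE(MAX(id), 0) + 1 FROM events"));
$firstEntryIndex = (int) $row[0];
/* Do stuff */
...
mysqli_query($db, "UNLOCK TABLES");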
The above does two things. First, it locks the table, preventing other sessions from writing to it until you are finished and the unlock statement is run. Second, it sets $firstEntryIndex, which is the first key value to use in any subsequent insert queries.
I am developing a web app using Zend Framework, and the problem is about combining 2 SQL queries to improve efficiency. My table structure is like this:
>table message
id(int auto incr)
body(varchar)
time(datetime)
>table message_map
id(int auto incr)
message_id(foreign key from message table's id column)
sender(int ) comment 'user id of sender'
receiver(int) comment 'user id of receiver'
To get the code working, I first insert the message body and time into the message table and then, using the last inserted id, I insert the message sender and receiver into the message_map table. Now what I want is to do this task in a single query, as using one query will be more efficient. Is there any way to do so?
No, there isn't. You can insert into only one table at a time.
But I can't imagine you need to insert so many messages that performance really becomes an issue. Even with these separate statements, any database can easily insert thousands of records a minute.
bulk inserts
Of course, inserting multiple records into the same table is a different matter. This is indeed possible in MySQL and it will make your query a lot faster. It will give you trouble, though, if you need the insert IDs of all those records.
mysql_insert_id() returns the first id that is inserted in the last insert statement, if it is a bulk insert. So you could query all id's that are >= that id. It should give you all records you just inserted, although the result may contain id's that other people inserted between your insert and the following query for those ids.
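A bulk insert of this kind could be sketched as below, building one multi-row INSERT with placeholders. The function name insertMessages is made up for the example, and the message table with its body and time columns is taken from the question; treat it as an illustration rather than a tuned implementation.

```php
<?php
// Sketch only: insertMessages is a hypothetical helper.
function insertMessages(PDO $pdo, array $bodies): int
{
    if ($bodies === []) {
        return 0; // nothing to insert
    }

    // One "(?, ?)" group per row: VALUES (?, ?), (?, ?), ...
    $placeholders = implode(', ', array_fill(0, count($bodies), '(?, ?)'));
    $stmt = $pdo->prepare("INSERT INTO message (body, time) VALUES $placeholders");

    $now = date('Y-m-d H:i:s');
    $params = [];
    foreach ($bodies as $body) {
        $params[] = $body;
        $params[] = $now;
    }

    $stmt->execute($params);
    return $stmt->rowCount(); // number of rows inserted by this statement
}
```

The returned row count, together with the first generated id, is what you would use to work out the range of ids described above.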
If it's only for these two tables, why don't you create a single table holding all these columns, like this:
>table message
id(int auto incr)
body(varchar)
sender(int ) comment 'user id of sender'
receiver(int) comment 'user id of receiver'
time(datetime)
then it will be like the way you want.
I agree with GolezTrol. Otherwise, if you want optimized performance for your query, you may choose to use stored procedures.
Indeed, combining those two inserts isn't possible. While you can use JOIN in SELECT queries, you can't combine INSERT queries. If you're really worried about performance, isn't there any way to join those two tables together? As far as I can see, there's no point in keeping them separate; they're both about the message.
As stated before, executing a second insert query isn't much of a server load, by the way.
As others pointed out, you cannot really update multiple tables at once. And, you should not really be worried about performance, unless you are inserting thousands of messages in a short period of time.
Now, there is one thing you could worry about. Imagine you first insert the message body, and then try to insert the receiver/sender IDs. Suppose the first insert succeeds, while the second (for whatever reason) fails. That would leave a message without a sender or receiver and corrupt your data a bit. To avoid that, you can use transactions, e.g.
mysql_query("START TRANSACTION", $connection);
//your code
mysql_query("COMMIT", $connection);
That would ensure that either both inserts get into the database, or neither do. If you are using PDO, look into http://www.php.net/manual/en/pdo.begintransaction.php for examples.
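With PDO, the same idea for the message and message_map tables might be sketched as follows. The function name sendMessage is hypothetical; the table and column names come from the question. The point of the sketch is that the id generated by the first insert is read with lastInsertId() and passed to the second insert inside the same transaction.

```php
<?php
// Sketch only: sendMessage is a hypothetical helper.
function sendMessage(PDO $pdo, string $body, int $sender, int $receiver): int
{
    // Make PDO throw on any failed statement so the catch block runs.
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $pdo->beginTransaction();
    try {
        $pdo->prepare('INSERT INTO message (body, time) VALUES (?, ?)')
            ->execute([$body, date('Y-m-d H:i:s')]);

        // Id generated by the insert above, on this connection only.
        $messageId = (int) $pdo->lastInsertId();

        $pdo->prepare('INSERT INTO message_map (message_id, sender, receiver) VALUES (?, ?, ?)')
            ->execute([$messageId, $sender, $receiver]);

        $pdo->commit();
        return $messageId;
    } catch (Throwable $e) {
        $pdo->rollBack(); // the message row is discarded as well
        throw $e;
    }
}
```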