How to insert more than 10000 rows to MSSQL Table - php

I have a PHP project in which I have to insert more than 10,000 rows into a SQL Server table. At the end of every month the data is read from one table, checked against some simple conditions, and inserted into a second table.
How should I do this?
To clarify: I currently transfer the rows in small batches (250 inserts) from a PHP cron job, and it works fine. But I would like to use the most appropriate method.
Which of these would be the most appropriate?
A cron job with PHP, as I currently use
Exporting to a file and importing it with BULK INSERT
Some sort of stored procedure that transfers the data directly
Or anything else.

Use the INSERT SQL statement. :^ )
From the documentation: "Adds one or more rows to a table or a view in SQL Server 2012."
Example using the mssql_* extension:
$server = 'KALLESPC\SQLEXPRESS';
$link = mssql_connect($server, 'sa', 'phpfi');
mssql_query("INSERT INTO STUFF(id, value) VALUES ('".intval($id)."','".intval($value)."')");

Since the data set is large, process it in batches of 500 records.
Check the conditions for one batch of 500; while that batch is being inserted, prepare the next batch of 500, and so on.
This will not put much load on your SQL server.
I process 40k records a day this way.
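The batching described above could be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory SQLite database (via PDO) stands in for the real server so the snippet can run on its own, and the table and column names (tblTarget, id, value) are hypothetical.

```php
<?php
// Sketch only: SQLite in memory stands in for the real database server,
// and tblTarget / id / value are hypothetical names.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE tblTarget (id INTEGER PRIMARY KEY, value INTEGER)');

// Pretend these rows were read from the source table and passed the checks.
$rows = [];
for ($i = 1; $i <= 1200; $i++) {
    $rows[] = ['id' => $i, 'value' => $i * 2];
}

// Prepare the statement once and reuse it for every row.
$insert = $pdo->prepare('INSERT INTO tblTarget (id, value) VALUES (:id, :value)');
$batches = 0;

// array_chunk() splits the rows into groups of at most 500.
foreach (array_chunk($rows, 500) as $batch) {
    $pdo->beginTransaction();   // one commit per batch instead of one per row
    foreach ($batch as $row) {
        $insert->execute($row);
    }
    $pdo->commit();
    $batches++;
}

$inserted = (int) $pdo->query('SELECT COUNT(*) FROM tblTarget')->fetchColumn();
echo "Inserted $inserted rows in $batches batches\n";
```

Reusing one prepared statement and committing once per batch is usually where most of the saving comes from; the batch size of 500 is simply the figure from the answer, not a tuned value.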

Use BULK INSERT - it is designed for exactly what you are asking and significantly increases the speed of inserts.
Also (just in case you really do have no indexes), you may want to consider adding indexes: some indexes (most notably one on the primary key) may improve the performance of inserts.
The actual rate at which you should be able to insert records will depend on the exact data, the table structure and also on the hardware / configuration of the SQL server itself, so I can't really give you any numbers.

SQL Server does not accept more than 1000 rows in the VALUES list of a single INSERT statement, so larger inserts have to be split into separate batches. Here are some alternatives that may help.
Create a stored procedure. In it, create two temporary tables, one for valid data and one for invalid data. Check each of your rules and validations in turn and, based on the result, insert the data into one of these two tables:
if the data is valid, insert it into the valid temp table; otherwise insert it into the invalid temp table.
Next, use a MERGE statement to insert the valid data into your target table as required.
You can transfer any number of records between tables this way, so I hope this works for you.
Thanks.

It's simple: you can do it with a while loop, since 10,000 rows is not a huge amount of data!
$query1 = mssql_query("select top 10000 * from tblSource");
while ($sourcerow = mssql_fetch_object($query1)) {
    // assumes numeric fields; string values would need quoting and escaping
    mssql_query("insert into tblTarget (field1,field2,fieldn) values ($sourcerow->field1,$sourcerow->field2,$sourcerow->fieldn)");
}
This should work fine.
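When the checks are simple enough to express in a WHERE clause, the row-by-row loop can often be replaced by a single INSERT ... SELECT, so the rows never leave the database. The sketch below is an assumption-laden illustration: SQLite in memory stands in for SQL Server, and the condition (field2 = 1) is a made-up placeholder for the real monthly checks.

```php
<?php
// Sketch only: SQLite in memory stands in for SQL Server; the condition
// "field2 = 1" is a hypothetical placeholder for the real checks.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE tblSource (field1 INTEGER, field2 INTEGER)');
$pdo->exec('CREATE TABLE tblTarget (field1 INTEGER, field2 INTEGER)');

// Sample data: ten rows, field2 alternates between 1 and 0.
$seed = $pdo->prepare('INSERT INTO tblSource (field1, field2) VALUES (?, ?)');
for ($i = 1; $i <= 10; $i++) {
    $seed->execute([$i, $i % 2]);
}

// The whole transfer is one statement; PDO::exec() returns the affected row count.
$copied = $pdo->exec(
    'INSERT INTO tblTarget (field1, field2)
     SELECT field1, field2 FROM tblSource WHERE field2 = 1'
);
echo "Copied $copied rows\n";
```

The same statement could live inside a stored procedure, which is roughly the "transfer directly" option from the question.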

Related

Check if exists records to insert or update In MYSQL

Every week I need to load 50K~200K records from a raw CSV file into my system.
Currently my solution is to load the CSV into a temp table (emptied after the process), then run my stored procedure to distribute the data to the relevant tables in my system. If a record already exists, an UPDATE query is run (80% of the records in the CSV are already in my system's tables); if not, the record is inserted.
The problem I am facing now is that the tables have grown to a few million records, approx. 5~6 million in each table.
"Select Exists" seems very slow; I then changed to LEFT JOINs on the tables in batches, which is also slow.
Even when I load just 5K records, the stored procedure may take a few hours to finish.
Are there any good, faster solutions for handling huge numbers of records when comparing tables to decide whether to insert or update?
Thanks!!
Jack
The following process will reduce your time:
First try to update the record and check the number of rows affected; if the number of rows affected is 0, insert the record.
But make sure every update also sets modified_Date (add that column if the table does not have one). Otherwise, when the new and old records are identical, the UPDATE changes nothing and returns 0 affected rows, and the record would be inserted a second time.
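A minimal sketch of this update-then-insert pattern, under stated assumptions: SQLite in memory stands in for MySQL so the snippet is runnable on its own, and the products table, its columns, and the updateOrInsert() helper are all hypothetical names.

```php
<?php
// Sketch only: SQLite in memory stands in for MySQL; the table, columns
// and the updateOrInsert() helper are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE products (sku TEXT PRIMARY KEY, price INTEGER, modified_date TEXT)');
$pdo->exec("INSERT INTO products (sku, price, modified_date) VALUES ('A1', 100, '2000-01-01 00:00:00')");

function updateOrInsert(PDO $pdo, string $sku, int $price): string
{
    $now = date('Y-m-d H:i:s');

    // Try the UPDATE first; modified_date always changes, so an existing
    // row is reported as affected even when the price is unchanged.
    $update = $pdo->prepare('UPDATE products SET price = ?, modified_date = ? WHERE sku = ?');
    $update->execute([$price, $now, $sku]);
    if ($update->rowCount() > 0) {
        return 'updated';
    }

    // No row was affected, so the record does not exist yet.
    $insert = $pdo->prepare('INSERT INTO products (sku, price, modified_date) VALUES (?, ?, ?)');
    $insert->execute([$sku, $price, $now]);
    return 'inserted';
}

$first = updateOrInsert($pdo, 'A1', 100);   // existing record, same price
$second = updateOrInsert($pdo, 'B2', 250);  // new record
echo "$first, $second\n";
```

With the MySQL PDO driver, rowCount() reports changed rows rather than matched rows by default, which is exactly why the modified_Date advice matters there; the PDO::MYSQL_ATTR_FOUND_ROWS connection option is another way to get matched-row counts.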
Slow responses from MySQL are almost always a problem of wrong indexing or incorrect use of indexes.
If you use keys and/or indexes correctly, an INSERT ... ON DUPLICATE KEY UPDATE ... should work.
Try to work only on an existing index/key. Check your statements with EXPLAIN SELECT.
IMHO your temp-table-based preprocessing is OK.

Big mysql database for ips

Part of my project involves storing and retrieving loads of IPs in my database. I estimate that my database will hold millions of IPs within months of starting the project. That being the case, I would like to know how slow simple queries on a big database can get. What would the approximate speeds of the following queries be:
SELECT * FROM table where ip= '$ip' LIMIT 1
INSERT INTO table(ip, xxx, yyy)VALUES('$ip', '$xxx', '$yyy')
on a table with 265 million rows?
Could I speed up queries by creating 255^2 tables named after the first two numbers of every possible IPv4 address, so that each table would have at most 255^2 rows covering all possible second halves of the address? For example, to query the IP address "216.27.61.137", it would be split into two parts, "216.27" (p1) and "61.137" (p2). First the script would select the table named p1, then check whether it has a row called p2, and if so pull the required data from that row. The same process would be used to insert new IPs into the database.
If the above plan would not work, what would be a good way to speed up queries on a big database?
The answers to both your questions hinge on the use of INDEXES.
If your table is indexed on ip your first query should execute more or less immediately, regardless of the size of your table: MySQL will use the index. Your second query will be slower, as MySQL will have to update the index on each INSERT.
If your table is not indexed then the second query will execute almost immediately as MySQL can just add the row at the end of the table. Your first query may become unusable as MySQL will have to scan the entire table each time.
The problem is balance. Adding an index will speed the first query but slow the second. Exactly what happens will depend on server hardware, which database engine you choose, configuration of MySQL, what else is going on at the time. If performance is likely to be critical, do some tests first.
Before doing anything of that sort, read this question and (more importantly) its answers: How to store an IP in mySQL
It is generally not a good idea to split data among multiple tables. Database indexes are good at what they do, so just make sure you create them accordingly. A binary column to store IPv4 addresses will work rather nicely - it is more a question of query load than of table size.
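As a rough illustration of storing an IPv4 address compactly instead of as text, PHP's built-in ip2long() and inet_pton() give an integer and a 4-byte binary form respectively. Which column type to pair them with (for example an unsigned integer or a fixed-length binary column) is an assumption here, not something the answer specifies.

```php
<?php
// Sketch only: converts the example address from the question into two
// compact forms that an indexed column could hold.
$ip = '216.27.61.137';

$asInt = ip2long($ip);       // integer form (non-negative on 64-bit PHP builds)
$asBinary = inet_pton($ip);  // packed binary string, 4 bytes for IPv4

// Both conversions are reversible.
echo long2ip($asInt), "\n";
echo inet_ntop($asBinary), "\n";
echo strlen($asBinary), " bytes\n";
```

Either form keeps the index on the ip column small, which is what lets the single-table lookup stay fast.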
First and foremost, you can't predict how long a query will take, even knowing all the information about the database, the database server, the network performance, and thousands of other variables.
Second, if you are using a decent database engine, you don't have to split the data into different tables. It knows how to handle big data. Leave the database functionality to the database itself.
There are several workarounds to deal with large datasets. Using the right data types and creating the right indexes will help a lot.
When you begin to have problems with your database, then search for something specific to the problem you are having.
There are no silver bullets to big data problems.

Optimized ways to update every record in a table after running some calculations on each row

There is a large table that holds millions of records. phpMyAdmin reports 1.2G size for the table.
There is a calculation that needs to be done for every row. The calculation is not simple (cannot be put in set col= calc format); it uses a stored function to get the values, so currently we run a single UPDATE for each row.
This is extremely slow and we want to optimize it.
Stored function:
https://gist.github.com/a9c2f9275644409dd19d
And this is called by this method for every row:
https://gist.github.com/82adfd97b9e5797feea6
This is performed on an off-live server, and it is usually updated once per week.
What options do we have here?
Why not set up a separate table to hold the computed values, to take the load off your current table? It can have two columns: the primary key of each row in your main table, and a column for the computed value.
Then your process can be:
a) Truncate the computedValues table. This is faster than trying to identify new rows.
b) Compute the values and insert them into the computedValues table.
c) Whenever you need the computed values, join to the computedValues table on the primary key, which is fast; if you need more computations, you just add new columns.
d) You can also update the main table from the computed values if you have to.
Well, the problem doesn't seem to be the UPDATE query, because no calculations are performed in the query itself. It seems the calculations are performed first and the UPDATE query is run afterwards, so the UPDATE should be quick enough.
When you say "this is extremely slow", I assume you are referring not to the UPDATE query but to the complete process. Here are some quick thoughts:
1) As you said, there are millions of records, and updating that many entries is always time-consuming. If there are many columns and indexes defined on the table, that adds to the overhead.
2) I see many REPLACE INTO queries in the function getNumberOfPeople(). These may well be a reason for the slow process. Have you checked how efficient these REPLACE INTO queries are? Can you try removing them to see whether that has any impact on the UPDATE process?
3) There are a couple of SELECT queries in getNumberOfPeople() too. Check whether they might be impacting the process and, if so, try optimizing them.
4) In the procedure updateGPCD(), you may try replacing SELECT COUNT(*) INTO _has_breakdown with SELECT COUNT(1) INTO _has_breakdown. In the same query, the WHERE condition reads _ACCOUNT, but this will fail when _ACCOUNT = 0, no?
5) As another suggestion, if you think it is the UPDATE that is slow because of reason 1, it might make sense to move the updated column gpcd out of usage_bill into another table. The only other column in that table should be the unique ID from usage_bill.
Hope the above makes sense.

How to run multiple sql queries using php without giving load on mysql server?

I have a script that reads an Excel sheet containing a list of products, almost 10,000 of them. The script reads these products, compares them with the products in the MySQL database, and checks:
if the product is not present, then ADD IT (I have an INSERT query for that)
if the product is already present, then UPDATE IT (I have an UPDATE query for that)
Now the problem is that this creates a very heavy load on the MySQL server, and it shows a message like "mysql server gone away..".
Is there a better method to do this Excel sheet work without loading the MySQL server so heavily?
I am not sure if this is the case, but judging from your post, it could be that you initialize a new connection to the MySQL server for every check. If so, you can simply connect once before you start the checks and run all subsequent queries through that connection.
Besides that, a good optimization would be to add indexes in MySQL on the product table columns that you reference most in your PHP search function; this would significantly speed up the product lookup.
You could also increase the MySQL buffer size to something above 256 MB in order to cache most of the results, and use InnoDB so that you do not need to lock the whole table every time you run the check and the input function.
I'm not sure why PHP has come into the mix. Excel can connect directly to a MySQL database, and using Excel VBA you should be able to run a WHERE NOT IN query to add new items and UPDATE statements for the ones that have changed.
http://helpdeskgeek.com/office-tips/excel-to-mysql/
You could try to condense your code somewhat (you might have already done this), but if you think it can be whittled down further, post it and we can have a look.
Cache data you know already exists: if a product's variables don't change regularly, you might not need to check them so often. You can cache the data for quick retrieval/changes later (see Memcached; other caching alternatives are available). You could end up reducing your workload dramatically.
Have you separated your MySQL server? Try running the product checks on a different subsystem, and merge the databases into your main one hourly, daily, or whatever suits.
OK, here is a quick thought.
Instead of running a query after every check for whether the product is present, keep appending to your SQL until you reach the end, and then execute it all at once.
Example
$query = ""; // create a query container
if ($present) {
    $query .= "UPDATE ....;"; // remember the ";" delimiter
} else {
    $query .= "INSERT ....;";
}
// Now, finally run it. mysql_query() sends only one statement per call, so a
// multi-statement string needs mysqli_multi_query(); $link is assumed to be an open mysqli connection.
$result = mysqli_multi_query($link, $query);
Now you send the queries to the server once, at the end.
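Update: approach this another way.
Let the query handle it.
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
If column a is declared UNIQUE and already contains the value 1, the statement above has the same effect as:
UPDATE table SET c=c+1 WHERE a=1;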
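Combining the two ideas in this answer (fewer round trips, and letting the query decide between insert and update), one multi-row INSERT ... ON DUPLICATE KEY UPDATE can cover a whole chunk of products. The sketch below only builds the statement and its parameter list; it does not connect to MySQL. The buildUpsert() helper and the products, sku, name, and price names are hypothetical, and it assumes sku carries a unique key.

```php
<?php
// Sketch only: builds a multi-row MySQL upsert with placeholders; nothing
// is executed here. buildUpsert() and all table/column names are hypothetical.
function buildUpsert(string $table, array $columns, array $updateColumns, int $rowCount): string
{
    // One "(?, ?, ?)" group per row, repeated $rowCount times.
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $allRows = implode(', ', array_fill(0, $rowCount, $rowPlaceholder));

    // VALUES(col) refers to the value that would have been inserted.
    $updates = [];
    foreach ($updateColumns as $col) {
        $updates[] = "$col = VALUES($col)";
    }

    return "INSERT INTO $table (" . implode(', ', $columns) . ") VALUES $allRows"
        . ' ON DUPLICATE KEY UPDATE ' . implode(', ', $updates);
}

// Two sample rows as they might come out of the spreadsheet.
$products = [['A1', 'Widget', 100], ['B2', 'Gadget', 250]];

$sql = buildUpsert('products', ['sku', 'name', 'price'], ['name', 'price'], count($products));
$params = array_merge(...$products);   // flatten the rows into one parameter list
echo $sql, "\n";
```

The resulting string and $params could then be passed to a prepared statement on an existing MySQL connection, one chunk of rows at a time. Newer MySQL versions deprecate VALUES(col) in this position in favour of a row alias, so treat the exact syntax as version-dependent.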

Tricky MySQL Batch Design

I have a scraper which visits many sites and finds upcoming events and another script which is actually supposed to put them in the database. Currently the inserting into the database is my bottleneck and I need a faster way to batch the queries than what I have now.
What makes this tricky is that a single event has data across three tables which have keys to each other. To insert a single event I insert the location or get the already existing id of that location, then insert the actual event text and other data or get the event id if it already exists (some are repeating weekly etc.), and finally insert the date with the location and event ids.
I can't use a REPLACE INTO because it will orphan older data with those same keys. I asked about this in Tricky MySQL Batch Query; the TL;DR of the outcome was that I have to check which keys already exist, preallocate those that don't exist, then make a single insert for each of the tables (i.e. do most of the work in PHP). That's great, but the problem is that if more than one batch were processing at a time, they could both choose to preallocate the same keys and then overwrite each other. Is there any way around this? Then I could go back to that solution. The batches have to be able to work in parallel.
What I have right now is that I simply turn off the indexing for the duration of the batch and insert each of the events separately but I need something faster. Any ideas would be helpful on this rather tricky problem. (The tables are InnoDB now... could transactions help solve any of this?)
I'd recommend starting with MySQL's LOCK TABLES, which you can use to prevent other sessions from writing to the tables while you insert your data.
For example, you might do something similar to this:
mysql_connect("localhost","root","password");
mysql_select_db("EventsDB");
mysql_query("LOCK TABLE events WRITE");
$firstEntryIndex = mysql_insert_id() + 1;
/*Do stuff*/
...
mysql_query("UNLOCK TABLES");
The above does two things. First, it locks the table, preventing other sessions from writing to it until the point where you're finished and the UNLOCK statement is run. Second, it sets $firstEntryIndex, which is the first key value that will be used in any subsequent insert queries.
