PHP/MySQL - Updating 70 million rows every week

I have a database with just over 70 million rows. This data was originally parsed and imported from roughly 70,000 XML files. These files are updated every week, so I need to scan through them (via a cron job on Sundays at 2 AM) and update rows that have changed and insert new rows.
$operatorSQL = "INSERT IGNORE INTO `operator` (`reference`, `national_operator_code`, `operator_code`, `operator_short_name`, `operator_name_on_license`, `trading_name`) VALUES (:reference, :nationalOperatorCode, :operatorCode, :operatorShortName, :operatorNameOnLicense, :tradingName);";
$serviceSQL = "INSERT IGNORE INTO `service` (`service_code`, `private_code`, `date_start`, `date_end`, `mode`, `description`, `origin`, `destination`) VALUES (:serviceCode, :privateCode, :dateStart, :dateEnd, :mode, :description, :origin, :destination);";
$serviceOperatorSQL = "INSERT IGNORE INTO `service_operator` (`service_code`, `operator_reference`) VALUES (:serviceCode, :operatorReference);";
$journeyPatternSQL = "INSERT IGNORE INTO `journey_pattern` (`reference`, `direction`, `destination_display`, `vehicle_type_code`, `vehicle_type_description`) VALUES (:reference, :direction, :destinationDisplay, :vehicleTypeCode, :vehicleTypeDescription);";
$journeyPatternRouteSQL = "INSERT IGNORE INTO `journey_pattern_route` (`journey_pattern_reference`, `route_reference`) VALUES (:reference, :routeReference);";
$journeyPatternSectionLink = "INSERT IGNORE INTO `journey_pattern_section_link` (`journey_pattern_reference`, `journey_pattern_section_reference`) VALUES (:reference, :journeyPatternSectionReference);";
$journeyPatternSectionSQL = "INSERT IGNORE INTO `journey_pattern_section` (`reference`) VALUES (:reference);";
$lineSQL = "INSERT IGNORE INTO `service_line` (`service_code`, `name`) VALUES (:serviceCode, :name);";
$timingLinkSQL = "INSERT IGNORE INTO `journey_pattern_timing_link` (`reference`, `stop_from`, `stop_from_timing`, `stop_from_sequence_number`, `stop_from_activity`, `stop_to`, `stop_to_timing`, `stop_to_sequence`, `stop_to_activity`, `run_time`, `direction`) VALUES (:reference, :stopFrom, :stopFromTiming, :stopFromSequenceNumber, :stopFromActivity, :stopTo, :stopToTiming, :stopToSequenceNumber, :stopToActivity, :runTime, :direction)";
$timingLinkJpsSQL = "INSERT INTO `journey_pattern_timing_link_jps` (`journey_pattern_timing_link`, `journey_pattern_section_reference`) VALUES (:linkReference, :sectionReference);";
$timingLinkRouteLinkRefSQL = "INSERT INTO `journey_pattern_timing_link_rlr` (`journey_pattern_timing_link`, `route_link_reference`) VALUES (:linkReference, :routeLinkReference);";
$routeSQL = "INSERT IGNORE INTO `route` (`reference`, `private_code`, `description`) VALUES (:reference, :privateCode, :description);";
$routeSectionSQL = "INSERT IGNORE INTO `route_section` (`reference`) VALUES (:reference);";
$routeLinkSQL = "INSERT IGNORE INTO `route_link` (`reference`, `stop_from`, `stop_to`, `direction`, `distance`) VALUES (:reference, :stopFrom, :stopTo, :direction, :distance);";
$routeLinkSectionSQL = "INSERT INTO `route_link_section` (`route_link_reference`, `route_section_reference`) VALUES (:routeLinkReference, :routeSectionReference);";
$vehicleJourneySQL = "INSERT IGNORE INTO `vehicle_journey` (`reference`, `private_code`, `departure`) VALUES (:reference, :privateCode, :departure);";
$vehicleJourneyServiceSQL = "INSERT IGNORE INTO `vehicle_journey_service` (`vehicle_journey_reference`, `service_reference`) VALUES (:reference, :serviceRef);";
$vehicleJourneyLineSQL = "INSERT IGNORE INTO `vehicle_journey_line` (`vehicle_journey_reference`, `service_line_reference`) VALUES (:reference, :lineRef);";
$vehicleJourneyJpSQL = "INSERT IGNORE INTO `vehicle_journey_jp` (`vehicle_journey_reference`, `journey_pattern_reference`) VALUES (:reference, :journeyPatternRef);";
Above are all of the SQL queries that are performed. You will notice that the IGNORE clause is used in the INSERT statements. This is just to make sure that duplicate data in any of the files does not stop the script with an error; the duplicate row is skipped and the script moves on.
I don't feel this is the most efficient way of doing it, however: when I run the script again after the initial insert of all the data, it's just as slow as the original inserts. Surely if 99.9% of the rows are the same it should skim through? Any ideas why this is happening?

Query optimisation is normally for SELECT, UPDATE and DELETE queries. Because you are just inserting data into tables, there is little query optimisation to be done; the engine does not have to work out some complicated plan to shove that data into the tables. The speed of the inserts is a function of your CPU, hard-disk speed, I/O and network bandwidth, amongst other factors. With INSERT IGNORE, every row is still sent to the server and checked against the unique index before it is discarded as a duplicate. The data you are inserting is not cached in any sense, so if you run the inserts again they will be done at the same rate.

First, I would determine whether the performance issue is with the XML parsing or the database. What happens if you run the program but simply leave out the calls to the database -- does it take significantly less time than with the database calls in?
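To make that comparison concrete, the parsing step can be timed on its own. This is only a rough sketch: timeIt() and the $parseOnly placeholder are hypothetical names, and the usleep() call merely stands in for the real XML parsing.

```php
<?php
/**
 * Runs $work once and returns the elapsed wall-clock time in seconds.
 * timeIt() is a hypothetical helper introduced for this sketch.
 */
function timeIt(callable $work): float
{
    $start = microtime(true);
    $work();
    return microtime(true) - $start;
}

// Placeholder for "parse the XML files but skip every database call".
$parseOnly = function (): void {
    // In the real script this would walk the XML files without touching MySQL.
    usleep(10000); // stand-in for 10 ms of parsing work
};

$parseSeconds = timeIt($parseOnly);
printf("parse only: %.3f s\n", $parseSeconds);
```

Timing the full run (parsing plus database calls) the same way and comparing the two numbers shows which side dominates.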
If it's the database I would do the following, if reasonably possible given the nature of your data:
If possible, divide the task into multiple passes across the data, each pass representing a single table's worth of INSERTs.
Read and cache sufficient key information for all records in the table in a performance-oriented data structure so that you can perform the "do I have the record" check in memory.
For each record found in the input, only issue an INSERT statement if the cached-memory check indicates you don't have the record already.
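The steps above could look roughly like the following. It is a minimal sketch, not the original script: buildKeySet() and filterNewRecords() are hypothetical helpers, and the sample records are made up. In practice the existing keys would be loaded once per table, for example with a query such as SELECT reference FROM route.

```php
<?php
/**
 * Builds a lookup set from the keys already stored in one table.
 * Hypothetical helper: $existingKeys would normally come from the database.
 */
function buildKeySet(iterable $existingKeys): array
{
    $set = [];
    foreach ($existingKeys as $key) {
        $set[$key] = true; // array keys allow fast isset() lookups
    }
    return $set;
}

/**
 * Returns only the input records whose key is not in the set yet, and
 * records each new key so duplicates inside the input are skipped too.
 */
function filterNewRecords(array $records, array &$keySet, string $keyField): array
{
    $new = [];
    foreach ($records as $record) {
        $key = $record[$keyField];
        if (!isset($keySet[$key])) {
            $keySet[$key] = true;
            $new[] = $record;
        }
    }
    return $new;
}

$keySet = buildKeySet(['R1', 'R2']); // keys already in the table (sample data)
$parsed = [
    ['reference' => 'R2', 'description' => 'unchanged'],
    ['reference' => 'R3', 'description' => 'new route'],
    ['reference' => 'R3', 'description' => 'duplicate in input'],
];
$toInsert = filterNewRecords($parsed, $keySet, 'reference');
// Only the first R3 record is left, so only one INSERT would be issued.
```

Note that this check only finds rows that are missing. Detecting rows whose content changed would need more than the key and is not covered by this sketch.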
This will be possible if your key information is not large (integer key values, for instance) and if you're not running on a memory-constrained box. Otherwise, the cached keys may not fit in memory. If this is the case

Related

How to make a PHP/MySQL INSERT query that only includes the columns that have values?

I have 17 columns in my DB.
I'm inserting values from different sources. Some sources lack certain values, for example company/company_info (I set the relevant PHP variables to FALSE).
So I need some kind of PHP INSERT query that inserts only the non-empty variables, restricted to a given list of columns.
For example, I could do:
$q = "INSERT INTO `$tname` (`phone`,`location`, `pagelang`, `company`, `company_url`, `phone_no_cc`, `phone_type`, `operator`, `pageviews`, `rating`, `comments_number`, `activity_by_days`, `activity_by_hours`) VALUES (
'$main_number', '$number_advanced_info[location]', '$pagelang', '$company[name]', '$company[site]', '$number_advanced_info[number_no_countrycode]', '$number_advanced_info[phone_type]', '$number_advanced_info[operator]', '$searches_comments[searches]', '$rating', '$searches_comments[comments]', '$history_search', '$daily_history'
);";
This inserts 13 columns and their values.
But sometimes I need to insert fewer columns/values and let MySQL set default values for the columns that are not listed. For example, I want to insert only 4 columns:
$q = "INSERT INTO `$tname` (`phone`,`location`, `pageviews`, `rating`) VALUES (
'$main_number', '$number_advanced_info[location]', '$searches_comments[searches]', '$rating'
);";
Is there some class, or any solution like binding values, that will automatically build the query depending on which values are not NULL?
I need some kind of code like:
// add the column and its value only when the variable is not empty
if ($phone) {
$columns .= "`column_name`,";
$values .= "value,";
}
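One possible way to do this is to build the column list and the placeholders from whatever values are present, then bind the values. The sketch below is an illustration under assumptions: buildInsert(), the table name `numbers` and the sample data are hypothetical. Table and column names cannot be bound as parameters, so they must come from a fixed whitelist in your own code, never from user input.

```php
<?php
/**
 * Builds a parameterised INSERT from only the columns that have a value.
 * buildInsert() and the whitelist are hypothetical, introduced for this sketch.
 *
 * @return array{0: string, 1: array} the SQL text and the values to bind
 */
function buildInsert(string $table, array $data, array $allowedColumns): array
{
    $columns = [];
    $params  = [];
    foreach ($allowedColumns as $column) {
        $value = $data[$column] ?? null;
        // Skip values the question treats as "missing": NULL, FALSE and ''.
        if ($value === null || $value === false || $value === '') {
            continue;
        }
        $columns[] = "`$column`";
        $params[]  = $value;
    }
    if ($columns === []) {
        throw new InvalidArgumentException('No values to insert.');
    }
    $placeholders = implode(', ', array_fill(0, count($columns), '?'));
    $sql = "INSERT INTO `$table` (" . implode(', ', $columns) . ") VALUES ($placeholders)";
    return [$sql, $params];
}

[$sql, $params] = buildInsert(
    'numbers', // hypothetical table name
    ['phone' => '15551234567', 'location' => 'Berlin', 'company' => false, 'rating' => 4],
    ['phone', 'location', 'company', 'rating']
);
```

With a PDO connection, the result could then be run as $pdo->prepare($sql)->execute($params); MySQL fills the omitted columns with their defaults.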

Insert value from one table into another whilst doing a double insert

Not sure that this is even possible. I am inserting values into two tables at the same time using multi_query. That works fine. One of the tables has an auto-increment column, and I need to take the last auto-incremented number and insert it into the second table. In other words: insert into table 1, then take the last inserted number from column X and insert it, along with other data, into table 2. I have played around with using SELECT LAST and INSERT INTO, but so far it's just doing my head in. The insert statements look like this:
$sql= "INSERT INTO tbleClients (riskClientId, riskFacility, riskAudDate) VALUES ('$riskclient2', '$facility2', '$riskdate2');";
$sql .="SELECT LAST(riskId) FROM tbleClients;";
$sql .="INSERT INTO tblRiskRegister (riskId) SELECT riskId FROM tbleClients ;";
$sql .= "INSERT INTO tblRiskRegister (riskAudDate, riskClientId, riskSessionId, RiskNewId) VALUES ('$riskdate2', '$riskclient2', '$sessionid', '$risknewid')";
Individually they all produce results, but I need it to happen simultaneously. I did toy with the idea of doing them all separately, but figured that's not very efficient. Any pointers would be appreciated.
$sql= "INSERT INTO tbleClients (riskClientId, riskFacility, riskAudDate) VALUES ('$riskclient2', '$facility2', '$riskdate2');";
After executing the above query, call mysqli_insert_id(), which gives you the last insert id.
So the queries below are unnecessary:
$sql .="SELECT LAST(riskId) FROM tbleClients;";
$sql .="INSERT INTO tblRiskRegister (riskId) SELECT riskId FROM tbleClients ;";
You can insert the value returned by mysqli_insert_id() in the query above instead.
I can't see how the queries above relate to the one below:
$sql .= "INSERT INTO tblRiskRegister (riskAudDate, riskClientId, riskSessionId, RiskNewId) VALUES ('$riskdate2', '$riskclient2', '$sessionid', '$risknewid')";
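Put together, the approach could look something like this. It is a sketch under assumptions: insertClientAndRisk() is a hypothetical name, and writing the new id into the riskId column of the same tblRiskRegister row as the other values is a guess, since the question does not say how the two tblRiskRegister inserts relate.

```php
<?php
/**
 * Inserts the client row, then reuses its auto-increment id for the second table.
 *
 * $db is expected to be a connected mysqli object; the native type is left off
 * only so the sketch can be exercised with a stand-in object.
 * Assumption: the new id belongs in tblRiskRegister.riskId.
 */
function insertClientAndRisk($db, string $client, string $facility, string $date, string $sessionId, string $riskNewId): int
{
    $stmt = $db->prepare(
        'INSERT INTO tbleClients (riskClientId, riskFacility, riskAudDate) VALUES (?, ?, ?)'
    );
    $stmt->bind_param('sss', $client, $facility, $date);
    $stmt->execute();

    // Auto-increment value generated by the INSERT above, on this connection.
    $newId = (int) $db->insert_id;

    $stmt = $db->prepare(
        'INSERT INTO tblRiskRegister (riskId, riskAudDate, riskClientId, riskSessionId, RiskNewId)'
        . ' VALUES (?, ?, ?, ?, ?)'
    );
    $stmt->bind_param('issss', $newId, $date, $client, $sessionId, $riskNewId);
    $stmt->execute();

    return $newId;
}
```

Running the two statements separately like this is not noticeably slower than multi_query, and prepared statements also avoid putting the raw variables into the SQL string.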

Using On Duplicate Key Update with an array

I'm relatively new to MySQL and am having trouble combining ideas I have read about. I have a form generated from a query. I want to be able to insert or update depending on whether there is currently a matching row. I have the following code, which works for inserting, but I'm struggling with the ON DUPLICATE KEY UPDATE part: I keep getting a message saying there is an error in my syntax, or an unexpected ON, depending on how I place the quotes.
require_once("connect_db.php");
$row_data = array();
foreach($_POST['attendancerecordid'] as $row=>$attendancerecordid) {
$attendancerecordid=mysqli_real_escape_string($dbc,$attendancerecordid);
$employeeid=mysqli_real_escape_string($dbc,($_POST['employeeid'][$row]));
$linemanagerid=mysqli_real_escape_string($dbc,($_POST['linemanagerid'][$row]));
$abscencecode=mysqli_real_escape_string($dbc,($_POST['abscencecode'][$row]));
$date=mysqli_real_escape_string($dbc,($_POST['date'][$row]));
$row_data[] = "('$attendancerecordid', '$employeeid', '$linemanagerid', '$abscencecode', '$date')";
}
if (!empty($row_data)) {
$sql = 'INSERT INTO attendance (attendancerecord, employeeid, linemanagerid, abscencecode, date) VALUES '.implode(',', $row_data)
ON DUPLICATE KEY UPDATE abscencecode = $row_data[abscencecode];
echo $sql;
$result = mysqli_query ($dbc, $sql) or die(mysqli_error ($dbc));
}
The various echo statements are showing that the correct data is coming through and my select statement was as expected before I added in the ON DUPLICATE statement.
You need to fix the way the SQL statement is constructed via string concatenation: the ON DUPLICATE KEY UPDATE clause has to be inside the string. When you create an SQL statement, echo it and run it in your favourite MySQL manager app for testing.
$sql = 'INSERT INTO attendance (attendancerecord, employeeid, linemanagerid, abscencecode, date) VALUES '.implode(',', $row_data).' ON DUPLICATE KEY UPDATE abscencecode = 1'; // 1 is a fixed value you choose
UPDATE: Just noticed that your $row_data array does not have named keys; each element is a complete row of values as a string, already wrapped in parentheses. Since you do a bulk insert (multiple rows in one statement), a literal in the ON DUPLICATE KEY UPDATE clause applies to every row. To use each row's own absence code, either reference it as VALUES(abscencecode) in that clause or execute each row as a separate insert in a loop.
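For the bulk case, MySQL's VALUES(col) function inside ON DUPLICATE KEY UPDATE refers to the value the current row tried to insert, so each row can keep its own absence code. The following is a rough sketch with placeholders instead of escaped strings; buildAttendanceUpsert() is a hypothetical helper and the sample rows are made up. Recent MySQL 8.0 releases deprecate VALUES() in this position in favour of a row alias, so treat this as one option rather than the only one.

```php
<?php
/**
 * Builds one multi-row INSERT ... ON DUPLICATE KEY UPDATE statement with
 * placeholders. buildAttendanceUpsert() is a hypothetical helper.
 *
 * @param array $rows list of rows, each holding the five attendance columns
 * @return array{0: string, 1: array} SQL text and the flattened values to bind
 */
function buildAttendanceUpsert(array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('At least one row is required.');
    }
    $columns = ['attendancerecord', 'employeeid', 'linemanagerid', 'abscencecode', 'date'];
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';

    // Flatten the rows into one list of values, in column order.
    $params = [];
    foreach ($rows as $row) {
        foreach ($columns as $column) {
            $params[] = $row[$column];
        }
    }

    // VALUES(abscencecode) picks up the value of the row that hit the duplicate key.
    $sql = 'INSERT INTO attendance (' . implode(', ', $columns) . ') VALUES '
        . implode(', ', array_fill(0, count($rows), $rowPlaceholder))
        . ' ON DUPLICATE KEY UPDATE abscencecode = VALUES(abscencecode)';

    return [$sql, $params];
}

[$sql, $params] = buildAttendanceUpsert([
    ['attendancerecord' => 1, 'employeeid' => 7, 'linemanagerid' => 3, 'abscencecode' => 'S', 'date' => '2015-03-02'],
    ['attendancerecord' => 2, 'employeeid' => 8, 'linemanagerid' => 3, 'abscencecode' => 'H', 'date' => '2015-03-02'],
]);
```

The statement only updates existing rows if attendance has a PRIMARY KEY or UNIQUE index that the incoming rows can collide with.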

Three insert statements into three tables

I have the following insert statements:
$sql ="INSERT INTO `firm`(name, VAT, active) VALUES ('$name', '$VAT', '$active')";
$sql = "INSERT INTO `area`(name, hub_name, fk_hub_id) VALUES ('$areaname',(SELECT `name` from hub WHERE name = '$hub_name'), (SELECT `id` from hub WHERE name = '$hub_name'))";
$sql ="INSERT INTO `contactdetails`
(fk_firm_id,
address_physical_line_1,
address_physical_line_2,
address_physical_line_3,
address_physical_line_4,
address_physical_line_5,
address_physical_line_6,
address_physical_line_7,
address_physical_code,
address_postal_line_1,
address_postal_line_2,
address_postal_line_3,
address_postal_line_4,
address_postal_line_5,
address_postal_line_6,
address_postal_line_7,
address_postal_code,
fax_1,
fax_2,
phone_1,
phone_2,
phone_3,
phone_4)
VALUES ( (SELECT `id`
FROM firm
WHERE name = '$name'),
'$address_physical_line_1',
'$address_physical_line_2',
'$address_physical_line_3',
'$address_physical_line_4',
'$address_physical_line_5',
'$address_physical_line_6',
'$address_physical_line_7',
'$address_physical_code',
'$address_postal_line_1',
'$address_postal_line_2',
'$address_postal_line_3',
'$address_postal_line_4',
'$address_postal_line_5',
'$address_postal_line_6',
'$address_postal_line_7',
'$address_postal_code',
'$fax_1',
'$fax_2',
'$phone_1',
'$phone_2',
'$phone_3',
'$phone_4') ";
Do I have to use a transaction to run these three queries? I have never worked with transactions. Each statement depends on values from the others.
MySQL has autocommit enabled by default. This means that every statement in your script is executed and committed before the next one runs.
This allows you to do something like:
-- Here I assume that the table is empty, with an auto-incremented id.
INSERT INTO test VALUES (NULL, 'First');
INSERT INTO test SELECT NULL, value FROM test WHERE id = 1;
Here, you will insert a first row with id=1, value="First" and then id=2, value="First".
I'm not sure I really understand your question, but if you need to perform several SQL statements with the guarantee that either all of them are applied or none of them is, you have to explicitly start and commit a transaction:
START TRANSACTION;
INSERT ...
INSERT ...
INSERT ...
-- All is ready, apply "all at once"
COMMIT;
http://dev.mysql.com/doc/refman/5.0/en/commit.html
Just to be clear: from inside your transaction, all the SQL statements appear to be executed one by one as usual. But from the outside world (other transactions/connections to your SQL server), no change is visible until you COMMIT your transaction, and then all changes appear at once.
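From PHP, the same pattern might be written with PDO roughly as follows. This is a sketch, not a drop-in solution: runInTransaction() and insertFirmWithContact() are hypothetical helpers, it assumes the connection throws exceptions on errors (PDO::ERRMODE_EXCEPTION), that firm.id is an auto-increment column, and that the tables use a transactional engine such as InnoDB. Only one contact column is shown to keep it short.

```php
<?php
/**
 * Runs $work inside a transaction: commit if it returns, roll back if it throws.
 * runInTransaction() is a hypothetical helper introduced for this sketch.
 */
function runInTransaction(PDO $pdo, callable $work)
{
    $pdo->beginTransaction();
    try {
        $result = $work($pdo);
        $pdo->commit();
        return $result;
    } catch (Throwable $e) {
        $pdo->rollBack(); // none of the statements in $work take effect
        throw $e;
    }
}

/**
 * Inserts a firm and one dependent contactdetails row as a single unit.
 * Assumes firm.id is AUTO_INCREMENT, so lastInsertId() returns the new firm id.
 */
function insertFirmWithContact(PDO $pdo, string $name, string $vat, int $active, string $phone1): int
{
    return runInTransaction($pdo, function (PDO $pdo) use ($name, $vat, $active, $phone1): int {
        $pdo->prepare('INSERT INTO firm (name, VAT, active) VALUES (?, ?, ?)')
            ->execute([$name, $vat, $active]);
        $firmId = (int) $pdo->lastInsertId();

        $pdo->prepare('INSERT INTO contactdetails (fk_firm_id, phone_1) VALUES (?, ?)')
            ->execute([$firmId, $phone1]);

        return $firmId;
    });
}
```

Using lastInsertId() for fk_firm_id also avoids the SELECT id FROM firm WHERE name = ... subquery, which would return more than one row if two firms share a name.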

Cannot insert data into mysql table via php

I'm sure this question has been asked a thousand times, but after an hour of truly trying many examples on the web, I have failed to insert new data into my table. I have tried many methods, as I said; the one I'm about to post is the most recent. If anyone knows why my code is failing it would save so much stress. So far I have only managed to insert data via phpMyAdmin. The database is called "test" and the table is called "getting". Please note that "key" is auto-incremented.
Thank you
$username='****';
$password='****';
$database='test';
$con= mysql_connect("localhost",$username,$password);
mysql_select_db("test",$con);
mysql_query("INSERT INTO getting (Key, Date, amount, tax, Extra)
VALUES ('', 'sept 26 2008', '35653', '46', '454')");
You should try the following:
1. put reserved words between backticks
2. format the date as YYYY-MM-DD
3. don't use quotes for numbers
4. use NULL for auto-increment keys (you could also leave the column out of the INSERT)
5. perform error checking
Try this query
$res = mysql_query(
"INSERT INTO getting (`Key`, `Date`, amount, tax, Extra)
VALUES (NULL, '2008-09-26', 35653, 46, 454)");
if (!$res) {
die('Invalid query: ' . mysql_error());
} else {
// Do here what you need
}
mysql_query("INSERT INTO getting (Key, Date, amount, tax, Extra)
VALUES ('', 'sept 26 2008', '35653', '46', '454')") or die(mysql_error());
What does it say after execution? If there is an error in the query, you will see it.
My guess would also have been marco's "4. use NULL for auto-increment keys".
I believe that if the 'key' field is auto-incremented, you don't even need to mention it in your insert statement.
Something like this:
INSERT INTO getting (Date, amount, tax, Extra)
VALUES ('sept 26 2008', '35653', '46', '454')
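The mysql_* functions used in this thread were removed in PHP 7.0, so on a current PHP version the same insert would have to go through mysqli or PDO. Below is a rough PDO sketch that combines the suggestions above (date converted to YYYY-MM-DD, numbers passed as integers, auto-increment column left out). toMysqlDate() and insertGetting() are hypothetical helper names, and the connection details in the usage comment are placeholders.

```php
<?php
/**
 * Converts a free-form date such as 'sept 26 2008' to MySQL's YYYY-MM-DD form.
 * Throws if PHP cannot parse the text as a date.
 */
function toMysqlDate(string $text): string
{
    return (new DateTimeImmutable($text))->format('Y-m-d');
}

/**
 * Inserts one row into `getting`, leaving the auto-increment `Key` column out.
 * Assumes $pdo throws on errors (PDO::ERRMODE_EXCEPTION, the default since PHP 8.0).
 */
function insertGetting(PDO $pdo, string $date, int $amount, int $tax, int $extra): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO getting (`Date`, amount, tax, Extra) VALUES (?, ?, ?, ?)'
    );
    $stmt->execute([toMysqlDate($date), $amount, $tax, $extra]);
}

// Usage, assuming a reachable MySQL server and real credentials:
// $pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', $username, $password);
// insertGetting($pdo, 'sept 26 2008', 35653, 46, 454);
```

With exceptions enabled, a failing query raises a PDOException carrying the MySQL error message, which covers the error-checking step as well.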
