Parse Large XML File in PHP Efficiently to Generate SQL

I am trying to parse a large XML file and load it into MySQL. I have used SimpleXML to parse it, and it works perfectly, but it's way too slow for this large XML file. Now I am trying to use XMLReader.
Here is a sample of the XML:
<?xml version="1.0" encoding="UTF-8"?>
<drug type="biotech" created="2005-06-13" updated="2015-02-23">
<drugbank-id primary="true">DB00001</drugbank-id>
<drugbank-id>BIOD00024</drugbank-id>
<drugbank-id>BTD00024</drugbank-id>
<name>Lepirudin</name>
<description>Lepirudin is identical </description>
<cas-number>120993-53-5</cas-number>
<groups>
<group>approved</group>
</groups>
<pathways>
<pathway>
<smpdb-id>SMP00278</smpdb-id>
<name>Lepirudin Action Pathway</name>
<drugs>
<drug>
<drugbank-id>DB00001</drugbank-id>
<name>Lepirudin</name>
</drug>
<drug>
<drugbank-id>DB01373</drugbank-id>
<name>Calcium</name>
</drug>
</drugs>
...
</drug>
<drug type="biotech" created="2005-06-15" updated="2015-02-25">
...
</drug>
Here is my approach using simplexml:
<?php
$xml = simplexml_load_file('drugbank.xml');
$servername = "localhost"; // Example : localhost
$username = "root";
$password = "pass";
$dbname = "dbname";
// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
die("Connection failed: " . $conn->connect_error);
}
$xmlObject_count = $xml->drug->count();
for ($i=0; $i < $xmlObject_count; $i++) {
$name = $xml->drug[$i]->name;
$description = $xml->drug[$i]->description;
$casnumber = $xml->drug[$i]->{'cas-number'};
// ...
$created = $xml->drug[$i]['created'];
$updated = $xml->drug[$i]['updated'];
$type = $xml->drug[$i]['type'];
$sql = "INSERT INTO `drug` (name, description,cas_number,created,updated,type)
VALUES ('$name', '$description','$casnumber','$created','$updated','$type')";
if ($conn->query($sql) === TRUE) {
$last_id = $conn->insert_id;
} else {
echo "outer else Error: " . $sql . "<br>" . $conn->error. "<br>" ;
}
}
$conn->close();
It works and gives me 7,789 rows, but I want to use XMLReader to parse the file instead. The problem is that with XMLReader I get more than 35,000 rows.
If you look at the XML, you can see that the <drug /> nodes also contain nested <drugs><drug> child nodes. How can I overcome this?
Here is my procedure with XMLReader:
<?php
$servername = "localhost"; // Example : localhost
$username = "root";
$password = "pass";
$dbname = "dbname";
// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
die("Connection failed: " . $conn->connect_error);
}
$reader = new XMLReader();
$reader->open('drugbank.xml');
while ($reader->read())
{
if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'drug')
{
$doc = new DOMDocument('1.0', 'UTF-8');
$xml = simplexml_import_dom($doc->importNode($reader->expand(),true));
$name = $xml->name;
$description = $xml->description;
$casnumber = $xml->{'cas-number'};
// ...
$sql = "INSERT INTO `drug` (name, description,cas_number,created,updated,type)
VALUES ('$name', '$description','$casnumber','$created','$updated','$type')";
if ($conn->query($sql) === TRUE) {
$last_id = $conn->insert_id;
} else {
echo "outer else Error: " . $sql . "<br>" . $conn->error. "<br>" ;
}
}
}
$conn->close();
With this example, I get more than 35,000 rows.

Alright, I have a working example for you with much improved execution speed, memory usage, and database load:
<?php
define('INSERT_BATCH_SIZE', 500);
define('DRUG_XML_FILE', 'drugbank.xml');
$servername = "localhost"; // Example : localhost
$username = "root";
$password = "pass";
$dbname = "dbname";
function parseXml($mysql)
{
$drugs = array();
$xmlReader = new XMLReader();
$xmlReader->open(DRUG_XML_FILE);
// Move our pointer to the first <drug /> element.
while ($xmlReader->read() && $xmlReader->name !== 'drug') ;
$drugCount = 0;
$totalDrugs = 0;
// Iterate over the outer <drug /> elements.
while ($xmlReader->name == 'drug')
{
// Convert the node into a SimpleXMLElement for ease of use.
$item = new SimpleXMLElement($xmlReader->readOuterXML());
$name = $item->name;
$description = $item->description;
$casNumber = $item->{'cas-number'};
$created = $item['created'];
$updated = $item['updated'];
$type = $item['type'];
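// Escape each value so that quotes in the data cannot break the batched INSERT.
$values = array();
foreach (array($name, $description, $casNumber, $created, $updated, $type) as $value)
{
$values[] = "'" . $mysql->real_escape_string((string) $value) . "'";
}
$drugs[] = '(' . implode(', ', $values) . ')';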
$drugCount++;
$totalDrugs++;
// Once we've reached the desired batch size, insert the batch and reset the counter.
if ($drugCount >= INSERT_BATCH_SIZE)
{
batchInsertDrugs($mysql, $drugs);
$drugCount = 0;
}
// Go to next <drug />.
$xmlReader->next('drug');
}
$xmlReader->close();
// Insert the leftovers from the last batch.
batchInsertDrugs($mysql, $drugs);
echo "Inserted $totalDrugs total drugs.";
}
function batchInsertDrugs($mysql, &$drugs)
{
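// Nothing to insert, e.g. when the total is an exact multiple of the batch size.
if (count($drugs) === 0)
{
return;
}
// Generate a batched INSERT statement.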
$statement = "INSERT INTO `drug` (name, description, cas_number, created, updated, type) VALUES";
$statement = $statement . ' ' . implode(",\n", $drugs);
echo $statement, "\n";
// Run the batch INSERT.
if ($mysql->query($statement))
{
echo "Inserted " . count($drugs) . " drugs.";
}
else
{
echo "INSERT Error: " . $statement . "<br>" . $mysql->error. "<br>" ;
}
// Clear the buffer.
$drugs = array();
}
// Create MySQL connection.
$mysql = new mysqli($servername, $username, $password, $dbname);
if ($mysql->connect_error)
{
die("Connection failed: " . $mysql->connect_error);
}
parseXml($mysql);
I tested this example using the same dataset.
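Using SimpleXML the way you are means parsing the entire document into memory, which is slow and memory-intensive. This approach uses XMLReader, a fast pull parser, and converts only one <drug /> element at a time into a SimpleXMLElement. Because $xmlReader->next('drug') skips the current element's subtree, the nested <drugs><drug> elements are never treated as top-level records, which is why the extra rows disappear. You can probably make this faster still using PHP's SAX-style XML Parser, but that is a more complex pattern, and the above example will be noticeably better than what you started with.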
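To make the difference between read() and next('drug') concrete, here is a small sketch. The inline XML, its <drugbank> root element, and the two function names are made up for this illustration and are not taken from the real dataset; it assumes the xmlreader and simplexml extensions are enabled.

```php
<?php
// Hypothetical miniature of the DrugBank layout (the <drugbank> root is an assumption).
$sampleXml = <<<'XML'
<drugbank>
  <drug type="biotech">
    <name>Lepirudin</name>
    <pathways>
      <pathway>
        <drugs>
          <drug><name>Lepirudin</name></drug>
          <drug><name>Calcium</name></drug>
        </drugs>
      </pathway>
    </pathways>
  </drug>
  <drug type="small molecule">
    <name>Calcium</name>
  </drug>
</drugbank>
XML;

// Counts every <drug> start tag, nested or not (the approach from the question).
function countEveryDrugElement($xml)
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $count = 0;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'drug') {
            $count++;
        }
    }
    $reader->close();
    return $count;
}

// Visits only the outer <drug> elements: next('drug') skips the current subtree.
function topLevelDrugNames($xml)
{
    $reader = new XMLReader();
    $reader->XML($xml);
    // Move to the first <drug> element.
    while ($reader->read() && $reader->name !== 'drug');
    $names = array();
    while ($reader->name === 'drug') {
        $item = new SimpleXMLElement($reader->readOuterXml());
        $names[] = (string) $item->name;
        $reader->next('drug');
    }
    $reader->close();
    return $names;
}

echo countEveryDrugElement($sampleXml), " <drug> elements in total\n";
echo implode(', ', topLevelDrugNames($sampleXml)), " at the top level\n";
```

The first function counts the nested entries as well, which is the same effect that inflated the row count in the question; the second only ever sees the outer records.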
The other significant change in my example is that we're using MySQL batched inserts, so we only hit the database once for every 500 (configurable) items we process. You can tweak this number for better performance. Past a certain point the query becomes too large for MySQL to accept (the limit is governed by the server's max_allowed_packet setting), but you may be able to do a lot more than 500 at a time.
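If you would rather not build the VALUES list from raw strings, the same batching idea can be combined with placeholders. The following is only a sketch: buildInsertBatches() is a hypothetical helper, the three-column list is shortened for brevity, and the rows are assumed to be plain lists that all have the same number of values.

```php
<?php
// Hypothetical helper: splits rows into batches and builds one multi-row
// INSERT with "?" placeholders per batch, plus the flattened parameter list.
function buildInsertBatches(array $rows, $batchSize)
{
    $batches = array();
    foreach (array_chunk($rows, $batchSize) as $chunk) {
        // One "(?, ?, ?)" group per row in this chunk.
        $group = '(' . implode(', ', array_fill(0, count($chunk[0]), '?')) . ')';
        $sql = 'INSERT INTO `drug` (name, description, cas_number) VALUES '
             . implode(', ', array_fill(0, count($chunk), $group));
        // Flatten the rows into a single list, in placeholder order.
        $params = array_merge(...$chunk);
        $batches[] = array('sql' => $sql, 'params' => $params);
    }
    return $batches;
}

$rows = array(
    array('Lepirudin', 'Lepirudin is identical', '120993-53-5'),
    array('Drug B', 'Example description', '000-00-0'),
    array('Drug C', 'Another description', '111-11-1'),
);

foreach (buildInsertBatches($rows, 2) as $batch) {
    echo $batch['sql'], ' [', count($batch['params']), " params]\n";
}
```

Each batch could then be passed to $mysql->prepare() and bound with bind_param(); that part is left out here because it needs a live connection.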
If you'd like me to explain any part of this further, or if you have any problems with it, just let me know in the comments! :)

Related

Importing CSV into MYSQL using PHP not working

I am trying to import a CSV file into MySQL, but I keep getting an error on line 4. I know this has to be a simple problem, but I have never done this before...
Here is the code I have.
<?php
$file = fopen("http://*******.com/*****/********.csv", "r");
$servername = "*******";
$username = "*************";
$password = "**************";
$dbname = "*********";
// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error){
die("Connection failed: " . $conn->connect_error);
}
while (($line = fgetcsv($file, 0, ",")) !== FALSE){
echo $sql = "INSERT INTO inventory_two (
`Add_City_State_Zip`,
`CommentsInstalledOptions`,
`TodaysDate`,
`total_Acquired_Service`,
`total_Acquired_Service_PakFee`,
`total_Service`,
`year_make_model_trim`,
`AcquiredDate`,
`AcquiredPrice`,
`ACV`,
`AdditionalNotes`,
`AltVehicleLocation`,
`ASIS`,
`AskDown`,
`Askprice`,
`AskPrice_Low`,
`AskTerm`,
`AutoBoingURL`,
`BodyStyle`,
`BodyStyle_Ebay`,
`ClearTitle`,
`CommentOptionsFuelStereo`,
`Comments`,
`Condition_ID`,
`ConditionDesc`,
`DaysInInventory`,
`DriveType`,
`Engine`,
`ESN`,
`ExtColor`,
`ExtTrim`,
`Flags`,
`FuelType`,
`Images`,
`Images_Primary`,
`ImageUpdate_Epoc`,
`Inspected`,
`Inst_Add1`,
`Inst_City`,
`Inst_DBA`,
`Inst_Email`,
`Inst_ID`,
`Inst_IDLot_ID`,
`Inst_Name`,
`Inst_Phone1`,
`Inst_State`,
`Inst_Website`,
`Inst_Zip`,
`InstalledOptions`,
`IntColor`,
`Inventory_ID`,
`InvType_ID`,
`InvTypeDesc`,
`Lot_Add1`,
`Lot_Add2`,
`Lot_City`,
`Lot_Email`,
`Lot_ID`,
`Lot_Phone1`,
`Lot_Phone2`,
`Lot_State`,
`Lot_Zip`,
`LotLegalName`,
`LotLocation`,
`Make`,
`Mileage`,
`MileageStatus_ID`,
`Model`,
`NewFlag`,
`PakFee`,
`Provider_ID`,
`ProviderCode`,
`SellerNotes`,
`Status`,
`Stereo`,
`StockNumber`,
`TitleLocation`,
`TitleStatus_ID`,
`TotalExpenses`,
`Transmission`,
`Transmission_Common`,
`UserDefined1`,
`VIN`,
`WarrantyTerms`,
`Weight`,
`WholeSalePrice`,
`Year`
)
VALUES ($line[0]','$line[1]','$line[2]','$line[3]','$line[4]','$line[5]','$line[6]','$line[7]','$line[8]','$line[9]','$line[10]','$line[11]','$line[12]','$line[13]','$line[14]','$line[15]','$line[16]','$line[17]','$line[18]','$line[19]','$line[20]','$line[21]','$line[22]','$line[23]','$line[24]','$line[25]','$line[26]','$line[27]','$line[28]','$line[29]','$line[30]','$line[31]','$line[32]','$line[33]','$line[34]','$line[35]','$line[36]','$line[37]','$line[38]','$line[39]','$line[40]','$line[41]','$line[42]','$line[43]','$line[44]','$line[45]','$line[46]','$line[47]','$line[48]','$line[49]','$line[50]','$line[51]','$line[52]','$line[53]','$line[54]','$line[55]','$line[56]','$line[57]','$line[58]','$line[59]','$line[60]','$line[61]','$line[62]','$line[63]','$line[64]','$line[65]','$line[66]','$line[67]','$line[68]','$line[69]','$line[70]','$line[71]','$line[72]','$line[73]','$line[74]','$line[75]','$line[76]','$line[77]','$line[78]','$line[79]','$line[80]','$line[81]','$line[82]','$line[83]','$line[84]','$line[85]','$line[86])";
if ($conn->query($sql) === TRUE){
echo "New record created successfully";
}
else{
echo "Error: " . $sql . "<br />" . $conn->error;
}
}
fclose($file);
?>
If I echo the $line[] items in the while loop, everything echoes correctly, so I don't understand why this does not work.

Why is my json data not being INSERTed into mySQL database?

I created a PHP handler to receive a JSON payload from a POST request and then insert it into a database in phpMyAdmin. I'm not sure why this is not working.
JSON:
payload = {
"version":"1.0",
"event":"video_recorded",
"data":{
"videoName":"vs1457013120534_862",
"audioCodec":"NellyMoser ASAO",
"videoCodec":"H.264",
"type":"FLV",
"orientation":"landscape",
"id":"0",
"dateTime":"2016-03-03 15:51:44",
"timeZone":"Europe/Bucharest",
"payload":"111.111.111.11",
"httpReferer":"http://site_from_where_video_was_recorded.com"
}
}
I got the PHP code from an online tutorial. The tutorial was from 2017, so I'm assuming everything is up to date, yet it still does not work:
<?php
/* db variables */
$dbhost = 'localhost';
$dbname = 'name_db';
$dbuser = 'user_db';
$dbpass = 'pass_db';
/* grab the json */
$data = $_POST['payload'];
/* put json into php associative array */
$data_array = json_decode($data);
/* store in PHP variables */
$ip_address = $data_array['data']['payload'];
$vid_name = $data_array['data']['videoName'];
$date_time = $data_array['data']['dateTime'];
$time_zone = $data_array['data']['timeZone'];
/* connect to mysql db */
$con = mysql_connect($dbuser, $dbpass, $dbhost) or die('Could not connect: ' . mysql_error());
/* select the specific db */
mysql_select_db($dbname, $con);
/* insert the values into the db */
$sql = "INSERT INTO ip_and_videos(IpAddress, VideoName, DateTime, Timezone) VALUES('$ip_address','$vid_name','$date_time','$time_zone')";
if(!mysql_query($sql,$con))
{
die('Error : ' . mysql_error());
}
?>
I have the primary key set to an int and have it on auto increment. If I understand correctly I don't need to insert anything into that column because it will assign a number each time. Or do I still need to pass it when I INSERT the other variables?
The array part works and gives the correct values, so your code is not bad, but you should check for all errors (as Bhavin said in a comment). I'm pretty sure you have a typo: $vid_name = $data_array['data']['videName']; is NOT the same as $vid_name = $data_array['data']['videoName']; Therefore, error_reporting will be very helpful. After that, check the query for other errors (and use prepared statements ^^).
<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);
$payload = '{
"version":"1.0",
"event":"video_recorded",
"data": {
"videoName":"vs1457013120534_862",
"audioCodec":"NellyMoser ASAO",
"videoCodec":"H.264",
"type":"FLV",
"orientation":"landscape",
"id":"0",
"dateTime":"2016-03-03 15:51:44",
"timeZone":"Europe/Bucharest",
"payload":"111.111.111.11",
"httpReferer":"http://site_from_where_video_was_recorded.com"
}
}';
$data_array = json_decode($payload, true);
/* store in PHP variables */
$ip_address = $data_array['data']['payload'];
$vid_name = $data_array['data']['videoName'];
$date_time = $data_array['data']['dateTime'];
$time_zone = $data_array['data']['timeZone'];
echo"[ $ip_address / $vid_name / $date_time / $time_zone ]";
// EDIT : added query
include"config.inc.php";
// connect to DB
$mysqli = mysqli_connect("$host", "$user", "$mdp", "$db");
if (mysqli_connect_errno()) { die("Error connecting : " . mysqli_connect_error()); }
$query = " INSERT INTO ip_and_videos (`IpAddress`, `VideoName`, `DateTime`, `Timezone`) VALUES (?,?,?,?) ";
$stmt = $mysqli->prepare($query);
print_r($stmt->error_list);
$stmt->bind_param("ssss", $ip_address, $vid_name, $date_time, $time_zone );
if (!$stmt->execute()) { echo $stmt->error; } else { echo"true"; }
?>
Not sure if you had the same issue but this does not work for me...
$vid_name = $data_array['data']['videName'];
I had to use
$vid_name = $data_array->data->videoName;
Without true as its second argument, json_decode() returns a stdClass object, and you can't use a stdClass as an array.
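A short sketch of the two access styles, using a trimmed-down copy of the payload from the question (the variable names are just for illustration):

```php
<?php
// Trimmed-down copy of the payload from the question.
$payload = '{"data":{"videoName":"vs1457013120534_862","payload":"111.111.111.11"}}';

// Default call: JSON objects become stdClass instances, so use ->.
$asObject = json_decode($payload);
$nameFromObject = $asObject->data->videoName;

// Second argument true: JSON objects become associative arrays, so use [].
$asArray = json_decode($payload, true);
$nameFromArray = $asArray['data']['videoName'];

echo $nameFromObject, "\n", $nameFromArray, "\n";
```

Either form works, as long as the access syntax matches the way the JSON was decoded.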
Should be better this way
<?php
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(E_ALL);
/* db variables */
$dbhost = 'localhost';
$dbname = 'name_db';
$dbuser = 'user_db';
$dbpass = 'pass_db';
/* grab the json */
$data = $_POST['payload'];
/* put json into php associative array */
$alldata = json_decode($data,true);
print_r($alldata);
/* connect to mysql db */
$con = mysql_connect($dbhost, $dbuser, $dbpass) or die('Could not connect: ' . mysql_error());
/* select the specific db */
mysql_select_db($dbname, $con);
/* insert the values into the db */
foreach($alldata as $data_array) {
$ip_address = $data_array['data']['payload'];
$vid_name = $data_array['data']['videoName'];
$date_time = $data_array['data']['dateTime'];
$time_zone = $data_array['data']['timeZone'];
$sql = "INSERT INTO ip_and_videos(IpAddress, VideoName, DateTime, Timezone) VALUES('".$ip_address."','".$vid_name."','".$date_time."','".$time_zone."')";
mysql_query($sql);
echo mysql_errno($con) . ": " . mysql_error($con) . "\n";
}
?>
Hope this helps

PHP > Invalid Argument supplied for foreach()

In short, I am trying to figure out what is wrong with my foreach statement. I have been trying to find the error for over a day now and I'm running out of time. This program is supposed to parse a JSON array and insert it into a MySQL database using mysqli.
<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);
$a = print_r(var_dump($GLOBALS),1);
echo htmlspecialchars($a);
$servername = "#";
$username = "#";
$password = "#";
$dbname = "#";
// Create connection
$conn = mysqli_connect($servername, $username, $password, $dbname);
echo "Connection Successful : ";
// Check connection
if (!$conn) {
die("Connection failed: " . mysqli_connect_error());
}
// Read JSON file
$jsondata = file_get_contents('scripts/AUDIT_DIR/report.json');
echo "JSON File Read : ";
// Convert and Loop
$item = json_decode($jsondata, true);
echo "JSON File Decoded : ";
foreach($item as $arr)
{
$id = $arr["id"];
$hostname = $arr["hostname"];
$ip = $arr["ip"];
$package = $arr["package"];
$publisher = $arr["publisher"];
$origin = $arr["origin"];
$version = $arr["version"];
$size = $arr["size"];
$sql = "INSERT INTO testtable(id, hostname, ip, package, publisher, origin, version, size)
VALUES ('10', '$hostname', '$ip', '$package', '$publisher', '$origin', '$version', '$size')";
if (mysqli_query($conn, $sql))
{
echo "New record created successfully : ";
}
else
{
echo "Error: " . $sql . "<br>" . mysqli_error($conn);
}
}
?>
Your json_decode() call is most likely returning an invalid result. You can check this with a var_dump($item); right after json_decode().
In PHP, json_decode() returns NULL if the JSON cannot be decoded or if the encoded data is nested deeper than the nesting limit. http://php.net/manual/en/function.json-decode.php
You need to guard against the case where $item === null instead of assuming you will always get a valid value to pass to foreach().
Example showing your error happens when $item = null
https://3v4l.org/oNr8P
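One way to write that guard is sketched below. decodeJsonRows() is a hypothetical helper name, and throwing an exception is only one possible way to handle the failure:

```php
<?php
// Hypothetical helper: decode JSON and refuse anything that foreach cannot iterate.
function decodeJsonRows($json)
{
    $decoded = json_decode($json, true);
    if (!is_array($decoded)) {
        // json_last_error_msg() describes why the last decode failed.
        throw new InvalidArgumentException('Could not decode JSON: ' . json_last_error_msg());
    }
    return $decoded;
}

try {
    foreach (decodeJsonRows('[{"hostname":"web01","ip":"10.0.0.1"}]') as $row) {
        echo $row['hostname'], ' ', $row['ip'], "\n";
    }
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
}
```

With a check like this, a broken or missing report file produces a readable message instead of the "Invalid argument supplied for foreach()" warning.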

Why isn't this sending to my mysql server?

So, every time I go to http://localhost/api/calls.php?gamename=test&gameowner=hi&gameownerid=1&placeid=2&serverjobid=hi&serverid=jaja&serverplayers=1&sendername=bob&senderid=3&senderage=14&senderwarnings=0&calltype=non&reportinfo=hi&suspect=none
it shows absolutely nothing and doesn't send the data to my mysql database.
Here is my code. I removed my mysql info just to be safe.
<?php
$servername = "";
$username = "";
$password = "";
$database = "";
// Establish MySQL Connection
$conn = new mysqli($servername, $username, $password, $database);
// Check connection
if ($conn->connect_error) {
die("MySafeServer Database Connection Failed: " . $conn->connect_error);
}
if (array_key_exists('param',$_GET)) {
$gamename = $_GET['param'];
$gameowner = $_GET['param'];
$gameownerid = $_GET['param'];
$placeid = $_GET['param'];
$serverjobid = $_GET['param'];
$serverid = $_GET['param'];
$serverplayers = $_GET['param'];
$sendername = $_GET['param'];
$senderid = $_GET['param'];
$senderage = $_GET['param'];
$senderwarnings = $_GET['param'];
$calltype = $_GET['param'];
$reportinfo = $_GET['param'];
$suspect = $_GET['suspect'];
mysql_query("INSERT INTO mss_calls3 (gamename, gameowner, gameownerid, placeid, serverjobid, serverid, serverplayers, sendername, senderid, senderage, senderwarnings, calltype, reportinfo, suspect) VALUES ($gamename, $gameowner, $gameownerid, $placeid, $serverjobid, $serverid, $serverplayers, $sendername, $senderid, $senderage, $senderwarnings, $calltype, $reportinfo, $suspect)");
};
?>
@Mark is right, you should stick to using the mysqli functions only.
As @andrewsi says, since you're not querying data, nothing in your code prints anything when the insert succeeds, only when it fails, so I added a "success!" echo. You will still want to query the database to see whether the values were inserted.
@Matt and @Mark's points about preparing statements are crucial to sanitizing your input. This is security 101, and you should do some googling on it.
But ultimately, I think @CodeGodie hit on the biggest obstacle to just getting it working. You assign all your variables the same value, $_GET['param'], except for "suspect" at the very end. And judging by the link you posted in the question, there is no "param" in your query string. I'm not entirely sure what you were going for, but I'm assuming you wanted to match the parameter name with the variable name. I don't think it works that way, but the following untested code should get you going:
<?php
$params = array(
"gamename",
"gameowner",
"gameownerid",
"placeid",
"serverjobid",
"serverid",
"serverplayers",
"sendername",
"senderid",
"senderage",
"senderwarnings",
"calltype",
"reportinfo",
"suspect"
);
$cols = "";
$vals = "";
$binding_type = "";
$get_params = array();
// first pass to build the query,
// and validate inputs exist
foreach ($params as $param) {
if ( isset($_GET["$param"]) ) {
$cols .= "$param,";
$vals .= "?,";
$get_params []= $_GET["$param"];
// determine the binding type as either integer or string
if (is_numeric($_GET["$param"]))
$binding_type .= "i";
else
$binding_type .= "s";
} else die("$param is not set");
}
// trim trailing commas
$cols = rtrim($cols, ",");
$vals = rtrim($vals, ",");
$sql = "INSERT INTO mss_calls3 ($cols) VALUES ($vals);";
$servername = "";
$username = "";
$password = "";
$database = "";
// Establish MySQL Connection
$conn = new mysqli($servername, $username, $password, $database);
// Check connection
if ($conn->connect_error) {
die("MySafeServer Database Connection Failed: " . $conn->connect_error);
}
// prepare statement
$stmt = $conn->prepare($sql) or die($conn->error);
// bind parameters
// bind_param() is variadic, so the collected values can be unpacked directly (PHP 5.6+).
// For the older call_user_func_array approach, see:
// http://stackoverflow.com/questions/627763/php-and-mysqli-bind-parameters-using-loop-and-store-in-array
// http://no2.php.net/manual/en/mysqli-stmt.bind-param.php#89171
$stmt->bind_param($binding_type, ...$get_params);
// execute
if( $stmt->execute() )
echo "success!";
else
echo $stmt->error;
$stmt->close();
$conn->close();
?>

PHP randomly printing code within php tags

I wrote some PHP/sql with the intention of storing session variables into a SQL table. (I left out some html that tells the user that the php worked.)
<?php
session_start();
$name = $_REQUEST["name"];
$type = $_REQUEST["type"];
$lengthnum = $_REQUEST["lengthnum"];
$rewardnum = $_REQUEST["rewardnum"];
$itemreward = $_REQUEST["itemreward"];
$dsn = "mysql:host=localhost;dbname=xxxxxx";
$username = "xxxxxxxxx";
$pw = "xxxxxxxx";
$options = array(PDO ::ATTR_ERRMODE=>PDO ::ERRMODE_EXCEPTION);
try
{
$my_pdo = new PDO ($dsn, $username, $pw, $options);
$sql_stmt = "INSERT INTO xxxxxx (Name, Type, Length, Reward, Item)
VALUES ($name, $type, $lengthnum, $rewardnum, $itemreward)";
$my_pdo->query($sql_stmt);
}
catch(Exception $a)
{
echo "<p>Error..." . $a->getMessage() . "</p>";
}
?>
For some reason this code "breaks" out of the php tags after "$options = array(" and this is output to the html file.
PDO ::ERRMODE_EXCEPTION); try { $my_pdo = new PDO ($dsn, $username, $pw, $options); $sql_stmt = "INSERT INTO simpleWFA (Name, Type, Length, Reward, Item) VALUES ($name, $type, $lengthnum, $rewardnum, $itemreward)"; $my_pdo->query($sql_stmt); } catch(Exception $a) { echo "
Error..." . $a->getMessage() . "
"; } ?>
Thanks!
Just a guess but I'd say your array definition is invalid.
$options = array(PDO ::ATTR_ERRMODE=>PDO ::ERRMODE_EXCEPTION);
should be something like
$options = array("PDO::ATTR_ERRMODE"=>"PDO::ERRMODE_EXCEPTION");
