I wrote an import snippet to populate my Neo4j DB with nodes for towns and the countries they belong to. The code looks like this:
<?php
function readCSV($csvFile) {
    $file_handle = fopen($csvFile, 'r');
    $lineCount = 0;
    while (!feof($file_handle)) {
        $line_of_text[] = fgetcsv($file_handle, 1024, ';', '"');
        $lineCount++;
    }
    fclose($file_handle);
    return array($line_of_text, $lineCount);
}
// Create an Index for Town and for Country
$queryString = '
CREATE INDEX ON :Country (name)
';
$query = new Everyman\Neo4j\Cypher\Query($client, $queryString);
$result = $query->getResultSet();
$queryString = '
CREATE INDEX ON :Town (name)
';
$query = new Everyman\Neo4j\Cypher\Query($client, $queryString);
$result = $query->getResultSet();
// Set path to CSV file
$importFile = 'files/import_city_country.csv';
$completeResult = readCSV($importFile);
$dataFile = $completeResult[0];
$maxLines = $completeResult[1];
for ($row = 1; $row < $maxLines; ++$row) {
    $countryData = array();
    if (!is_null($dataFile[$row][0])) {
        // Define parameters for the queries
        $params = array(
            "nameCountry" => trim($dataFile[$row][0]),
            "nameTown"    => trim($dataFile[$row][1]),
            "uuid"        => uniqid(),
        );
        // Now check if we know that country already, to avoid double entries
        $queryString = '
            MATCH (c:Country {name: {nameCountry}})
            RETURN c
        ';
        $query = new Everyman\Neo4j\Cypher\Query($client, $queryString, $params);
        $result = $query->getResultSet();
        if (count($result) == 0) { // Country doesn't exist yet
            $queryString = '
                MERGE (c:Country {name: {nameCountry}})
                SET c.uuid = {uuid},
                    c.created = timestamp()
                RETURN c
            ';
            $query = new Everyman\Neo4j\Cypher\Query($client, $queryString, $params);
            $result = $query->getResultSet();
        }
        // Now check if we know that town already
        $queryString = '
            MATCH (t:Town {name: {nameTown}})
            RETURN t
        ';
        $query = new Everyman\Neo4j\Cypher\Query($client, $queryString, $params);
        $result = $query->getResultSet();
        if (count($result) == 0) { // Town doesn't exist yet
            $queryString = '
                MERGE (t:Town {name: {nameTown}})
                SET t.created = timestamp()
                RETURN t
            ';
            $query = new Everyman\Neo4j\Cypher\Query($client, $queryString, $params);
            $result = $query->getResultSet();
            // Relate town to country
            $queryString = '
                MATCH (c:Country {name: {nameCountry}}), (t:Town {name: {nameTown}})
                MERGE (t)-[:BELONGS_TO]->(c);
            ';
            $query = new Everyman\Neo4j\Cypher\Query($client, $queryString, $params);
            $result = $query->getResultSet();
        }
    } // Last line of the Excel export may be null - skip it
} // Next row
?>
A typical CSV line looks like this (semicolon-separated, matching the delimiter used in the code):
Country;City
Albania;Tirana
This all works fine, but it takes more than 30 minutes on a PC to import 9,000 lines. I know the system needs to check each record for existence and also create a relation between town and country, but that still seems quite long for this number of CSV lines.
Do you maybe have suggestions for how to improve the import code?
Thanks,
Balael
BTW: Any chance to insert code here without editing every row and adding 4 spaces? Kinda boring for longer code...
Use LOAD CSV inside the neo4j-shell if at all possible, and don't write your own code to process the CSV.
What this will do for you primarily is to allow you to batch many items into a single transaction USING PERIODIC COMMIT.
If you want to use the REST API remotely (as I assume you're doing here) then look into your language binding's support for batch operations. Your code as written is going to spend a lot of time going back and forth to the server, probably turning each line of the CSV into a request/response, which will be slow. Better to batch up many at a time and run them as one operation, which will help minimize how much protocol overhead you have.
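As a sketch, the whole PHP loop above can collapse into a single statement. This assumes the file is reachable by the server, that the header row yields `line.Country` and `line.City`, and that the field terminator is ';' as in the question's code; the `uuid` property is left out because Cypher has no equivalent of PHP's uniqid(), so it would need separate handling:

```cypher
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///import_city_country.csv' AS line FIELDTERMINATOR ';'
MERGE (c:Country {name: trim(line.Country)})
  ON CREATE SET c.created = timestamp()
MERGE (t:Town {name: trim(line.City)})
  ON CREATE SET t.created = timestamp()
MERGE (t)-[:BELONGS_TO]->(c);
```

Note that MERGE already gives you the exists-check for free, so the separate MATCH round-trips in the PHP version disappear entirely.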
Related
I need to read and process (for example, add a "2" at the end of IDs, plus some other things...) and re-insert at least 2000 records from MySQL with PHP.
It's not possible to do it with a SQL query alone...
The problem is that when we click the Go button, it processes almost 500 records and then we get an "Internal Server Error 500"! On localhost we don't have any problem...
Is there a way to do this with the limited resources on our customers' websites?
Another question: what causes this problem? Which resource needs to be increased? RAM? CPU? ...?
Here is the code:
(We should read all courses of a semester and all course selections and copy them to the new semester)
foreach ($courseList as $courseInfo)
{
    $ccode = $courseInfo['ccode'];
    $lid = $courseInfo['lid'];
    $pid = $courseInfo['pid'];
    $clid = $courseInfo['clid'];
    $cgender = $courseInfo['cgender'];
    $descriptive_score = $courseInfo['descriptive_score'];
    $final_capacity = $courseInfo['final_capacity'];
    $days_times = $courseInfo['days_times'];
    $exam_date = $courseInfo['exam_date']; // NOTE: read but never passed to AddCourse below
    $exam_place = $courseInfo['exam_place'];
    $ccourse_start_date = date("Y-m-d H:i:s");
    $ccomment = $courseInfo['ccomment'];
    $cid = $courseInfo['cid'];
    $majors = $course->GetCourseMajorsList($cid);
    $scList = $course->GetCourseScoreColumnsList($cid);
    $courseScoreColums = array();
    foreach ($scList as $scProp)
    {
        $courseScoreColums[$scProp['scid']] = $scProp['scid'] . ',' . $scProp['cscfactor'];
    }
    $tid = $term->LastTermID(); // NOTE: invariant; could be fetched once before the loop
    $counts = $course->AddCourse($ccode.'2', $tid, $lid, $pid, $clid, $majors, $courseScoreColums, $cgender, $descriptive_score, $final_capacity, $days_times, NULL, $exam_place, $ccourse_start_date, $ccomment, $aid);
    if ($counts == 1)
    {
        $new_cid = $course->LastCourseID();
        $cs = new ManageCourseStudents();
        $query = " WHERE `" . $table_prefix . "courses`.`cid`=$cid ";
        $courseStudentList = $cs->GetCourseStudentList($query, '');
        foreach ($courseStudentList as $csInfo)
            $cs->AddCourseStudent($csInfo['uid'], $new_cid, $csInfo['lvid'], $aid);
    }
}
$str = "SELECT * FROM tableName"; // replacing * with explicit column names is good practice
$result = $conn->query($str);
while ($arr = $result->fetch_array(MYSQLI_ASSOC))
{
    $strr1 = "INSERT INTO tableName (`id`, `title`, `des`) " . // add more columns as you require
             "VALUES ('" . $arr['id'] . "2', '" . $arr['title'] . "', '" . $arr['des'] . "')";
    if ($conn->query($strr1) === false) {
        trigger_error('Wrong SQL: ' . $strr1 . ' Error: ' . $conn->error, E_USER_ERROR);
    }
}
Note that this will duplicate every row in your table.
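As an aside, concatenating row data straight into the INSERT string is injection-prone. With mysqli you can prepare a parameterised statement once and execute it per row; a minimal sketch of building the statement text (table and column names taken from the answer above, the helper name is mine):

```php
<?php
// Build a parameterised INSERT suitable for mysqli::prepare(),
// instead of concatenating values into the SQL string.
function build_insert($table, array $columns) {
    $placeholders = implode(', ', array_fill(0, count($columns), '?'));
    return 'INSERT INTO `' . $table . '` (`' . implode('`, `', $columns) . '`)'
         . ' VALUES (' . $placeholders . ')';
}

$sql = build_insert('tableName', array('id', 'title', 'des'));
echo $sql;
// → INSERT INTO `tableName` (`id`, `title`, `des`) VALUES (?, ?, ?)
```

You would then call $conn->prepare($sql) once and bind_param()/execute() inside the loop, which is both safer and usually faster than re-parsing a fresh statement per row.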
I am new to using PHP and MySQL. I have two MySQL tables:
Concreteness: a table that contains concreteness scores for 80K words
Brian: a table with 1 million rows, each containing one or two words.
I have a small PHP script that takes each row in "Brian", parses it, looks up the scores in "Concreteness", and records them back in "Brian".
I have been running this script with several other tables that had 300-400K rows, each with hundreds of words. "Brian" is different because it has 1 million rows with one or two words per row. For some reason, my script is SUPER slow with Brian.
Here is the actual script:
<?php
include "functions.php";
set_time_limit(0); // NOTE: no time limit
if (!$conn)
die('Not connected : ' . mysql_error());
$remove = array('{J}','{/J}','{N}','{/N}','{V}','{/V}','{RB}','{/RB}'); // tags to remove
$db = 'LCM';
mysql_select_db($db);
$resultconcreteness = mysql_query('SELECT `word`, `score` FROM `concreteness`') or die(mysql_error());
$array = array(); // NOTE: init score cache
while($row = mysql_fetch_assoc($resultconcreteness))
$array[strtolower($row['word'])] = $row['score']; // NOTE: php array as hashmap
mysql_free_result($resultconcreteness);
$data = mysql_query('SELECT `key`, `tagged` FROM `brian`') or die(mysql_error()); // NOTE: single query instead of multiple
while ($row = mysql_fetch_assoc($data)) {
    $key = $row['key'];
    $tagged = $row['tagged'];
    $weight = $count = 0;
    $speech = explode(' ', $tagged);
    foreach ($speech as $word) {
        if (preg_match('/({V}|{J}|{N}|{RB})/', $word, $matches)) {
            $cleaned = strtolower(str_replace($remove, '', $word));
            $weight += $array[$cleaned]; // NOTE: quick access to word's score
            if (!empty($array[$cleaned])) {
                $count++;
            }
        }
    }
    mysql_query('UPDATE `brian` SET `weight`=' . $weight . ', `count`=' . $count . ' WHERE `key`=' . $key, $conn) or die(mysql_error());
    // Print out the contents of the entry
    print "<b>Key:</b> " . $row['key'] . " <br>"; // NOTE: was $info['key'], an undefined variable
}
mysql_free_result($data);
?>
I guess the real problem is the 1 million MySQL UPDATE statements you fire at the database. Consider bundling the update statements (and also remove the print):
$i = 0;
$sql = "";
while ($row = mysql_fetch_assoc($data)) {
    // ... left out the obvious part
    $sql .= 'UPDATE `brian` SET `weight`=' . $weight . ', `count`=' . $count . ' WHERE `key`=' . $key . ';';
    $i++;
    if ($i % 1000 == 0) {
        mysql_query($sql) or die(mysql_error());
        $i = 0;
        $sql = "";
    }
}
// remember to run the last few updates
mysql_query($sql) or die(mysql_error());
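One caveat with the batching idea above: mysql_query() executes only a single statement per call, so a semicolon-joined string of UPDATEs will generally be rejected. A single-statement batch that still updates many rows at once is INSERT ... ON DUPLICATE KEY UPDATE; this sketch assumes `key` is a primary or unique key on `brian` (which the WHERE clause in the question suggests), and the sample rows are made up for illustration:

```php
<?php
// Collect per-row values, then emit ONE statement for the whole batch.
// Assumes `key` is a unique key on `brian`, so each INSERT row turns
// into an UPDATE of the existing row.
$rows = array(
    array('key' => 1, 'weight' => 2.5, 'count' => 3),
    array('key' => 2, 'weight' => 1.0, 'count' => 1),
);

$values = array();
foreach ($rows as $r) {
    // Cast to numeric types so no quoting/escaping is needed here.
    $values[] = '(' . (int)$r['key'] . ',' . (float)$r['weight'] . ',' . (int)$r['count'] . ')';
}
$sql = 'INSERT INTO `brian` (`key`, `weight`, `count`) VALUES '
     . implode(',', $values)
     . ' ON DUPLICATE KEY UPDATE `weight`=VALUES(`weight`), `count`=VALUES(`count`)';

echo $sql;
```

One such statement per 1000 rows keeps the round-trip count low without relying on multi-statement support in the driver.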
I'm looking at an odd issue with Internet Explorer 9. The code below successfully generates a populated CSV file when run in Safari, IE8, IE10, IE11, Chrome, and Firefox, but generates an empty CSV file (headers only) in IE9. It does not appear to be a data issue, considering the bug only appears in IE9. I've been able to test and confirm this on 3 different machines running IE9.
It's a legacy script I inherited, with SQLSRV instead of MSSQL PDO, though I took the liberty of scooping out mysql, replacing it with PDO, wrapping it in a function, and thoroughly commenting it for my own sanity.
# Create a CSV sheet with details we need
# Generate file name based on timestamp
$file = "$_SESSION[user_name]_" . "tutorincomplete_" . date("Y-m-d_H-i-sa");
# 1: Open CSV File
$myFile = ".\csv\\" . "$file.csv";
$fh = fopen($myFile, 'w') or die("can't open file");
# Get list of active classes from MSSQL DB
$sql = "SELECT DISTINCT
static_title,
LEFT(CONVERT(VARCHAR, course_end_date, 120), 150) AS course_end_date,
RTRIM(static_code) as static_code,
RTRIM(session_code) as session_code,
staff_name
FROM
snip";
# Append to query where a school is present
if (ISSET($_SESSION['user_school']) AND $_SESSION['user_school'] != "None")
{
$sql .= "
WHERE
school = ?
ORDER BY staff_name ASC";
$params1 = array($_SESSION['user_school']);
$query = sqlsrv_query($conn, $sql, $params1);
}
else
{
if (isset($_POST['school']) AND $_POST['school'] != "All") {
$sql .= "
WHERE
school = ?
ORDER BY staff_name ASC";
$params1 = array($_POST['school']);
$query = sqlsrv_query($conn, $sql, $params1);
} else {
$sql .= "
ORDER BY staff_name ASC";
$query = sqlsrv_query($conn, $sql);
}
}
$today = date('U');
# Set headers for CSV file
$head = array("Static title", "Course end date", "Static code", "Session code", "Staff name");
fputcsv($fh, $head);
$count = 0;
# Loop through MSSQL results for comparison
while ($obj = sqlsrv_fetch_array($query, SQLSRV_FETCH_ASSOC))
{
$finish = $obj['course_end_date'];
# Strip the time (keep ISO date format intact)
$finishSub = substr($finish, 0, 10);
# Create timestamp from date format
$finishTs = strtotime($finishSub);
# Compare and include the row if valid (604800 seconds = 7 days)
if ($today > ($finishTs - 604800))
{
# Fetch information on course; decide if course is complete or incomplete
# To do this, first check for course entries in the MySQL database
# If they are present, we will use MySQL instead of MSSQL (To reference completed courses)
$sql3 = "SELECT * FROM snip WHERE static = :stc AND session = :sec ORDER BY static ASC";
$query3 = $pdo->prepare($sql3);
$query3->bindParam(':stc', $obj['static_code']);
$query3->bindParam(':sec', $obj['session_code']);
$query3->execute();
$rowCount = $query3->rowCount();
# Check for result count in MySQL
if ($rowCount > 0)
{
$result = $query3->fetch(PDO::FETCH_OBJ);
if ($result->complete == 0) {
$incomplete = 1;
} else {
$incomplete = 0;
$count++;
}
} else {
# No MySQL entry; Mark as incomplete
$incomplete = 1;
}
if ($incomplete > 0)
{
$arr = array($obj['static_title'], $obj['course_end_date'], $obj['static_code'], $obj['session_code'], $obj['staff_name']);
fputcsv($fh, $arr);
}
}
}
echo "
<h1>CSV file generation complete</h1>
<p>Skipped $count classes</p>
<p>File generation complete. File name $file.csv Click here to open</p>";
}
In anything but IE9, it generates "Skipped X classes" and the CSV file has X entries.
But in IE9 it's "Skipped 0 classes" and the CSV file contains only the column headers.
EDIT: Through some experimentation I found that SQLSRV is passing back an empty array ONLY in IE9; no other browser is affected. As such I was able to impress upon the powers that be that I need to switch the system to PDO for MSSQL. Guess I can call this fixed but unresolved.
I have a problem.
I am trying to create an array from a MySQL table, but I don't know how to format the data coming out of MySQL into an array in PHP.
Here is what I have done so far...
//Generate Org Data
$result_org = mysql_query("SELECT emp_no,sup_empno,Name,Title FROM employees");
// Initializes a container array
$orgArray = array();
while($row = mysql_fetch_array($result_org, MYSQL_ASSOC))
{
$currempno = $row['emp_no'];
$currsupervisor = $row['sup_empno'];
$currtitle = $row['Name']. '\n ' .$row['Title'];
// This is where I haven't a clue to get it the right format...??
// Stores each database record to an array
$buildorg = array("$currempno","$currsupervisor","$currtitle");
// Adds each array into the container array
array_push($orgArray, $buildorg);
}
// show the data to verify
echo ($orgArray);
// the data needs to be exactly like this below
o.addNode(003, 002, '', 'Jane Doe\nAsst Manager');
where 003 is $currempno, 002 is $currsupervisor, and Jane Doe\nAsst Manager is $currtitle.
Getting the o.addNode( along with the commas and quotes and the closing ); around this has me perplexed.
Any help would be appreciated…
K Driscoll
If I understand correctly, you are actually trying to create the line
o.addNode(003, 002, '', 'Jane Doe\nAsst Manager');
so just put the values into the string and format it the way you want. There is no need to create another array; you can create the final strings and push those into the array:
$currempno = $row['emp_no'];
$currsupervisor = $row['sup_empno'];
$currtitle = $row['Name']. '\n ' .$row['Title'];
$output = sprintf("o.addNode(%03d, %03d, '', '%s');", $currempno, $currsupervisor, $currtitle);
array_push($orgArray, $output);
I guess this is what you are looking for, more or less:
$finalArray = array();
$result_org = mysql_query("SELECT emp_no,sup_empno,Name,Title FROM employees");
while( $row = mysql_fetch_array($result_org, MYSQL_ASSOC) ) {
$finalArray[] = array(
$row['emp_no'],
$row['sup_empno'],
$row['Name'].'\n'.$row['Title']
);
}
print_r($finalArray);
You used the array_push() function; the construction $someArray[] = $newElement; does the same thing and is preferable. Example below.
This code will work:
//Generate Org Data
$result_org = mysql_query("SELECT emp_no,sup_empno,Name,Title FROM employees");
// Initializes a container array
$orgArray = array();
while ($row = mysql_fetch_array($result_org, MYSQL_ASSOC))
{
    $currempno = $row['emp_no'];
    $currsupervisor = $row['sup_empno'];
    $currtitle = $row['Name'] . '\n ' . $row['Title'];
    $orgArray[] = array($currempno, $currsupervisor, $currtitle);
}
var_dump($orgArray);
Is there a way to get which fields were modified after an UPDATE query?
I want to keep track of which field user XXX modified... any way of doing this using Active Record?
I needed this exact functionality so I wrote this code. It returns the number of fields that were affected.
FUNCTION STARTS:
function mysql_affected_fields($sql)
{
    // Parse the SQL update statement: table name ...
    $piece1 = explode("UPDATE ", $sql);
    $piece2 = explode("SET", $piece1[1]);
    $sql_parts['table'] = trim($piece2[0]);
    // ... the SET clause, split into field => value pairs ...
    $piece1 = explode("SET ", $sql);
    $piece2 = explode("WHERE", $piece1[1]);
    $sql_parts['set'] = trim($piece2[0]);
    $fields = explode(",", $sql_parts['set']);
    foreach ($fields as $field)
    {
        $field_parts = explode("=", $field);
        $field_name  = trim($field_parts[0]);
        $field_value = trim($field_parts[1]);
        $field_value = str_replace("'", "", $field_value);
        $sql_parts['field'][$field_name] = $field_value;
    }
    // ... and the WHERE clause
    $piece1 = explode("WHERE ", $sql);
    $piece2 = explode(";", $piece1[1]);
    $sql_parts['where'] = trim($piece2[0]);
    // Get original field values and count the ones that will change
    $different = 0; // NOTE: must be initialized before the ++ below
    $select = "SELECT * FROM " . $sql_parts['table'] . " WHERE " . $sql_parts['where'];
    $result_latest = mysql_query($select) or trigger_error(mysql_error());
    while ($row = mysql_fetch_array($result_latest, MYSQL_ASSOC))
    {
        foreach ($row as $k => $v)
        {
            // Only compare columns that actually appear in the SET clause
            if (isset($sql_parts['field'][$k]) && $sql_parts['field'][$k] != $v)
            {
                $different++;
            }
        }
    }
    return $different;
}
There is no way using active record to get this easily, but if you are only supporting one specific database type (let's say MySQL) you could always use Triggers?
Or, Adam is about right: if you have a WHERE criterion for your UPDATE, you can SELECT the row before you do the UPDATE, then loop through the old and new versions comparing them.
This is exactly the sort of work Triggers were created for, but of course that puts too much reliance on the DB which makes this less portable yada yada yada.
Solution
Instructions:
SELECT the row that the user wants to modify
UPDATE it
Compute the differences between the selected row and the updated one
Store the differences somewhere (or mail them, show them, whatever)
Simple.
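The diff step in the instructions above can be sketched with array_diff_assoc(), which keeps only the key/value pairs of its first argument that differ from the second. The row arrays here are hypothetical; in practice the first comes from re-reading the row (or from the new values) and the second from the SELECT done before the UPDATE:

```php
<?php
// Compute which fields of a row changed between two snapshots.
// array_diff_assoc() compares by key AND value, so unchanged
// fields drop out and only the modified ones remain.
function changed_fields(array $after, array $before) {
    return array_diff_assoc($after, $before);
}

// Hypothetical before/after rows for illustration:
$before = array('name' => 'Jane', 'title' => 'Manager',  'room' => '12');
$after  = array('name' => 'Jane', 'title' => 'Director', 'room' => '14');

print_r(changed_fields($after, $before));
```

The returned array (here: the new title and room) is exactly what you would log, mail, or show to track what user XXX modified.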