So I have a script that reads a text file, organizes it into an array, and then uses this code to loop through the data and insert it into the proper columns/rows in a MySQL server:
$size = sizeof($str)/14;
$x=0;
$a=0; $b=1; $c=2; $d=3; $e=4; $f=5; $g=6; $h=7; $i=8; $j=9; $k=10; $l=11; $m=12; $n=13;
mysql_query('TRUNCATE TABLE scores');
do {
$query = "INSERT INTO scores (serverid,resetid, rank,number,countryname,land,networth,tag,gov,gdi,protection,vacation,alive,deleted)
VALUES ('$str[$a]','$str[$b]','$str[$c]','$str[$d]','$str[$e]','$str[$f]','$str[$g]','$str[$h]',
'$str[$i]','$str[$j]','$str[$k]','$str[$l]','$str[$m]','$str[$n]')";
mysql_query($query,$conn);
$a=$a+14; $b=$b+14; $c=$c+14; $d=$d+14; $e=$e+14; $f=$f+14; $g=$g+14; $h=$h+14; $i=$i+14; $j=$j+14; $k=$k+14; $l=$l+14; $m=$m+14; $n=$n+14;
$x++;
} while ($x != $size);
mysql_close($conn);
This code works out how many rows the file contains and loops through all 14 columns of each row until it reaches the last row of the text file. Each time it is run, it clears the table and loads the new data (as intended).
My question is: is this a good way of doing it, or is there a faster, cleaner way to do the same thing as my code above?
Could I use LOAD DATA LOCAL INFILE '$myFile' INTO TABLE ranksfeed_temp FIELDS TERMINATED BY ',' to do the same job in a more efficient manner? What are your thoughts? I'm trying to make my code faster and more efficient.
LOAD DATA would be faster and more efficient for importing a character-separated file such as CSV. LOAD DATA is optimized for importing large files into your MySQL table, whereas you are running one query per row from your text file, which is incredibly slow to execute.
Please note that the LOCAL option is only for files located on the client side of your MySQL client-server connection. If you can, load the file directly from the machine that acts as the MySQL server.
Disabling the keys on your table before inserting can give you extra speed during the import. Benchmark the import with keys disabled and with them enabled to compare the results.
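A minimal sketch of that approach for the scores table from the question, assuming the text file is comma-separated with the 14 fields in the same order as the column list above and that LOCAL INFILE is enabled for the connection ($myFile and $conn are the question's own variables):
// Clear out the old data, then bulk-load the whole file in one statement.
mysql_query('TRUNCATE TABLE scores', $conn);
mysql_query('ALTER TABLE scores DISABLE KEYS', $conn);

$load = "LOAD DATA LOCAL INFILE '$myFile'
         INTO TABLE scores
         FIELDS TERMINATED BY ','
         LINES TERMINATED BY '\\n'
         (serverid, resetid, rank, number, countryname, land, networth,
          tag, gov, gdi, protection, vacation, alive, deleted)";
mysql_query($load, $conn) or die(mysql_error());

mysql_query('ALTER TABLE scores ENABLE KEYS', $conn);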
Related
How can I import UTF-8 data from MovieLens into MySQL?
I get the data from http://grouplens.org/datasets/movielens/ and, for the purposes of my recommender-system thesis, I only want the 100K and Tag Genome data.
I've been searching on Google and in this forum and I haven't found anything about importing these files into MySQL. I'm currently using phpMyAdmin to manage MySQL, so if anybody knows how to easily import those files, please share.
I'm fine if you recommend iterating over them one by one using PHP, but please explain the code to me.
You'll need to write some custom code to import all of their data into MySQL. Dumbest answer on Stack Overflow ever, right?
So they provide a set of flat files, each described in the README.
README
allbut.pl
mku.sh
u.data
u.genre
u.info
u.item
u.occupation
u.user
u1.base
u1.test
u2.base
u2.test
u3.base
u3.test
u4.base
u4.test
u5.base
u5.test
ua.base
ua.test
ub.base
ub.test
In a nutshell:
Make your own database and tables in MySQL.
Programmatically open a file and parse each line into SQL.
Import the SQL into MySQL.
???
Profit!
Yeah, I know I still haven't really told you anything; let's do one, and you can hopefully do the others.
I'll do u.genre, because I'm lazy and it is easy.
Make a new table, I'll assume you know how to make tables and such.
u.genre has two things: a genre and an id.
unknown|0
Action|1
...etc...
So your table should have two fields.
You'll use two data types: https://dev.mysql.com/doc/refman/5.7/en/data-types.html
id - unsigned TINYINT
TINYINT unsigned is 0 to 255
genre - VARCHAR(20)
VARCHAR(20) holds up to 20 characters; their longest genre is "Documentary", so that gives you a bit of extra room if they add a new one.
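A minimal CREATE TABLE sketch based on those two choices (the table and column names match the INSERT used further down; tweak them as you like):
CREATE TABLE genre (
  id    TINYINT UNSIGNED NOT NULL PRIMARY KEY, -- ids in u.genre run from 0 upwards
  genre VARCHAR(20)      NOT NULL              -- longest current value is "Documentary"
);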
Open the file and get its contents: https://secure.php.net/manual/en/function.file-get-contents.php
$filecontents = file_get_contents("u.genre");
Now let's split up the file by line: https://secure.php.net/manual/en/function.explode.php
$genres = explode("\n", $filecontents);
Now we'll loop through the $genres using foreach and explode again: https://secure.php.net/manual/en/control-structures.foreach.php
foreach ($genres as &$row) {
list($genre,$id) = explode("|",$row);
# more here later
}
Now let's just output SQL, skipping the row if either field is empty.
if ($genre !== "" && $id !== "") {
print "INSERT INTO genre (genre,id) VALUES ('$genre',$id);\n";
}
Put it all together...
<?php
$filecontents = file_get_contents("u.genre");
$genres = explode("\n", $filecontents);
foreach ($genres as &$row) {
list($genre,$id) = explode("|",$row);
if ($genre !== "" && $id !== "") {
$sql = "INSERT INTO genre (genre,id) VALUES ('$genre',$id);\n";
print $sql;
# Insert each into your DB here.
}
}
?>
Save it and run it from the commandline or put it in a browser for no good reason.
There are too many resources out there showing how to insert data into MySQL, so I'll leave it at this. Everyone's database setup is a bit different, so writing it up for my particular setup won't help you.
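If you would rather push the rows straight into MySQL instead of printing SQL, a minimal mysqli sketch could look like the following (the connection credentials and the movielens database name are placeholders, not something the dataset prescribes):
<?php
$db   = new mysqli('localhost', 'user', 'password', 'movielens'); // placeholder credentials
$stmt = $db->prepare('INSERT INTO genre (genre, id) VALUES (?, ?)');

$genres = explode("\n", file_get_contents("u.genre"));
foreach ($genres as $row) {
    list($genre, $id) = array_pad(explode("|", $row), 2, "");
    if ($genre !== "" && $id !== "") {
        $stmt->bind_param('si', $genre, $id); // 's' = string, 'i' = integer
        $stmt->execute();
    }
}
$stmt->close();
?>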
I need to insert data from a plain text file, exploding each line into 2 parts and then inserting them into the database. I'm doing it this way, but can this program be optimized for speed?
The file has around 27,000 lines of entries.
DB structure [unique key (ext,info)]
ext [varchar]
info [varchar]
code:
$string = file_get_contents('list.txt');
$file_list=explode("\n",$string);
$entry=0;
$db = new mysqli('localhost', 'root', '', 'file_type');
$sql = $db->prepare('INSERT INTO info (ext,info) VALUES(?, ?)');
$j=count($file_list);
for($i=0;$i<$j;$i++)
{
$data=explode(' ',$file_list[$i],2);
$sql->bind_param('ss', $data[0], $data[1]);
$sql->execute();
$entry++;
}
$sql->close();
echo $entry.' entry inserted !<hr>';
If you are sure that the file contains unique pairs of ext/info, you can try disabling keys for the import:
ALTER TABLE `info` DISABLE KEYS;
And after import:
ALTER TABLE `info` ENABLE KEYS;
This way the unique index will be rebuilt once for all records, not every time something is inserted.
To increase speed even more, you could change the format of this file to be CSV-compatible and use MySQL's LOAD DATA to avoid parsing every line in PHP.
When there are multiple items to be inserted, you usually put all the data in a CSV file, create a temporary table with columns matching the CSV, do a LOAD DATA [LOCAL] INFILE, and then move that data into the destination table. But as far as I can see you don't need much additional processing, so you can even treat your input file as a CSV without any extra trouble.
$db->query('CREATE TEMPORARY TABLE _tmp_info (ext VARCHAR(255), info VARCHAR(255))'); // mysqli uses query(), not exec()
$db->query("LOAD DATA LOCAL INFILE '{$filename}' INTO TABLE _tmp_info
FIELDS TERMINATED BY ' '
LINES TERMINATED BY '\n'"); // $filename = 'list.txt' in your case
$db->query('INSERT INTO info (ext, info) SELECT t.ext, t.info FROM _tmp_info t');
You can run a COUNT(*) on temp table after that to show how many records were there.
If you have a large file that you want to read in, I would not use file_get_contents. Using it forces the interpreter to store the entire contents in memory at once, which is a bit wasteful.
The following is a snippet taken from here:
$file_handle = fopen("myfile", "r");
while (!feof($file_handle)) {
$line = fgets($file_handle);
echo $line;
}
fclose($file_handle);
The difference is that at any single instant you only keep one line of the file in memory (not the entire contents of the file), which will probably lower the run-time memory footprint of your script. You can then use the same loop to perform your INSERT operation.
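Here is a minimal sketch of combining that line-by-line loop with the prepared statement from the question (assuming each line is "ext info" separated by a single space, as the original explode implies):
$db  = new mysqli('localhost', 'root', '', 'file_type');
$sql = $db->prepare('INSERT INTO info (ext, info) VALUES (?, ?)');

$file_handle = fopen('list.txt', 'r');
while (($line = fgets($file_handle)) !== false) {
    $data = explode(' ', rtrim($line, "\r\n"), 2); // split into ext + info
    if (count($data) === 2) {                      // skip blank or malformed lines
        $sql->bind_param('ss', $data[0], $data[1]);
        $sql->execute();
    }
}
fclose($file_handle);
$sql->close();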
If you can, use something like Talend. It's an ETL program, simple and free (it also has a paid version).
Here is the magic solution [3 seconds vs. 240 seconds]:
$db->query('ALTER TABLE info DISABLE KEYS');
$db->autocommit(FALSE);
// ...run the prepared INSERT loop from the question here...
$db->commit();
$db->query('ALTER TABLE info ENABLE KEYS');
I need to come up with a way to make a large task faster to beat the timeout.
I have very limited access to the server due to the restrictions of the hosting company.
I have a system set up where a cron job visits a PHP file that grabs a CSV containing data on some products. The CSV does not contain all of the fields that a product would have, just a handful of essential ones.
I've read a fair number of articles on timeouts and handling CSVs, and currently (in an attempt to shave time) I have made a table (let's call it csv_data) to hold the CSV data. I have a script that truncates the csv_data table and then inserts data from the CSV, so each night the latest recordset from the CSV is in that table (the CSV file gets updated nightly). So far, no timeout problems: that task only takes about 4-5 seconds.
The timeouts occur when I have to sift through the data to make updates to the products table. The steps it runs right now are like this:
1. Get the sku from csv_data table (that holds thousands of records)
2. Select * from Products where products.sku = csv.sku (products table also holds thousands of records to loop through)
3. Get numrows.
If numrows==0{no record in products, so skip}.
If numrows>1{duplicate entries, don't change anything, but later on report the sku}
If numrows==1{Update selected fields in the products table with csv data}
4. Go to the next record in csv_data all over again
(I figured outlining the process is shorter and easier than dropping in the code.)
I looked into MySQL views and stored procedures, but I am not skilled enough with them to know whether they can handle the 'if' statement portion.
Is there anything I can do to make this faster to avoid the timeouts?
edit:
I should mention that set_time_limit(0); isn't doing it. And if it helps, the server uses IIS7 and FastCGI.
Thanks for your help.
Update after using suggestions from Jakob and Shawn:
I'm doing something wrong. It is definitely faster and the CSV SKU is incrementing,
but when I tried to implement Shawn's solution, the query gives me a PHP warning: mysql_result() expects parameter 1 to be resource, boolean given.
Can you help me spot what I am doing wrong?
Here is the section of code:
$csvdata="SELECT * FROM csv_update";
$csvdata_result=mysql_query($csvdata);
mysql_query($csvdata);
$csvdata_num = mysql_num_rows($csvdata_result);
$i=0;
while($i<$csvdata_num){
$csv_code=#mysql_result($csvdata_result,$i,"skucode");
$datacheck=NULL;
$datacheck=substr($csv_code,0,1);
if($datacheck>='0' && $datacheck<='9'){
$csv_price=#mysql_result($csvdata_result,$i,"price");
$csv_retail=#mysql_result($csvdata_result,$i,"retail");
$csv_stock=#mysql_result($csvdata_result,$i,"stock");
$csv_weight=#mysql_result($csvdata_result,$i,"weight");
$csv_manufacturer=#mysql_result($csvdata_result,$i,"manufacturer");
$csv_misc1=#mysql_result($csvdata_result,$i,"misc1");
$csv_misc2=#mysql_result($csvdata_result,$i,"misc2");
$csv_selectlist=#mysql_result($csvdata_result,$i,"selectlist");
$csv_level5=#mysql_result($csvdata_result,$i,"level5");
$csv_frontpage=#mysql_result($csvdata_result,$i,"frontpage");
$csv_level3=#mysql_result($csvdata_result,$i,"level3");
$csv_minquantity=#mysql_result($csvdata_result,$i,"minquantity");
$csv_quantity1=#mysql_result($csvdata_result,$i,"quantity1");
$csv_discount1=#mysql_result($csvdata_result,$i,"discount1");
$csv_quantity2=#mysql_result($csvdata_result,$i,"quantity2");
$csv_discount2=#mysql_result($csvdata_result,$i,"discount2");
$csv_quantity3=#mysql_result($csvdata_result,$i,"quantity3");
$csv_discount3=#mysql_result($csvdata_result,$i,"discount3");
$count_check="SELECT COUNT(*) AS totalCount FROM products WHERE skucode = '$csv_code'";
$count_result=mysql_query($count_check);
mysql_query($count_check);
$totalCount=#mysql_result($count_result,0,'totalCount');
$loopCount = ceil($totalCount / 25);
for($j = 0; $j < $loopCount; $j++){
$prod_check="SELECT skucode FROM products WHERE skucode = '$csv_code' LIMIT ($loopCount*25), 25;";
$prodresult=mysql_query($prod_check);
mysql_query($prod_check);
$prodnum =#mysql_num_rows($prodresult);
$prod_id=#mysql_result($prodresult,0,"catalogid");
if($prodnum<1){
echo "NOT FOUND:$csv_code<br>";
$count_sku_not_found=$count_sku_not_found+1;
$list_sku_not_found=$list_sku_not_found." $csv_code";}
if($prodnum>1){
echo "DUPLICATE:$csv_ccode<br>";
$count_duplicate_skus=$count_duplicate_skus+1;
$list_duplicate_skus=$list_duplicate_skus." $csv_code";}
if ($prodnum==1){
///This prevents an overwrite from happening if the csv file doesn't produce properly
if ($csv_price!="" OR $csv_price!=NULL)
{$sql_price='price="'.$csv_price.'"';}
if ($csv_retail!="" OR $csv_retail!=NULL)
{$sql_retail=',retail="'.$csv_retail.'"';}
if ($csv_stock!="" OR $csv_stock!=NULL)
{$sql_stock=',stock="'.$csv_stock.'"';}
if ($csv_weight!="" OR $csv_weight!=NULL)
{$sql_weight=',weight="'.$csv_weight.'"';}
if ($csv_manufacturer!="" OR $csv_manufacturer!=NULL)
{$sql_manufacturer=',manufacturer="'.$csv_manufacturer.'"';}
if ($csv_misc1!="" OR $csv_misc1!=NULL)
{$sql_misc1=',misc1="'.$csv_misc1.'"';}
if ($csv_misc2!="" OR $csv_misc2!=NULL)
{$sql_pother2=',pother2="'.$csv_misc2.'"';}
if ($csv_selectlist!="" OR $csv_selectlist!=NULL)
{$sql_selectlist=',selectlist="'.$csv_selectlist.'"';}
if ($csv_level5!="" OR $csv_level5!=NULL)
{$sql_level5=',level5="'.$csv_level5.'"';}
if ($csv_frontpage!="" OR $csv_frontpage!=NULL)
{$sql_frontpage=',frontpage="'.$csv_frontpage.'"';}
$import="UPDATE products SET $sql_price $sql_retail $sql_stock $sql_weight $sql_manufacturer $sql_misc1 $sql_misc2 $sql_selectlist $sql_level5 $sql_frontpage $sql_in_stock WHERE skucode='$csv_code'";
mysql_query($import) or die(mysql_error("error updating in products table"));
echo "Update ".$csv_code." successful ($i)<br>";
$count_success_update_skus=$count_success_update_skus+1;
$list_success_update_skus=$list_success_update_skus." $csv_code";
//empty out variables
$sql_price='';
$sql_retail='';
$sql_stock='';
$sql_weight='';
$sql_manufacturer='';
$sql_misc1='';
$sql_misc2='';
$sql_selectlist='';
$sql_level5='';
$sql_frontpage='';
$sql_in_stock='';
$prodnum=0;
}
}
$i++;
}
Is it timing out before the first row is returned, or between rows during the read? One good practice would be to handle your query in chunks: do a count first to see how many records you are dealing with for the SKU, then loop through smaller chunks (the size of these chunks would depend on how much you have to do with each row). Your updated workflow would look more like this:
Get next SKU from CSV
Get a total count: SELECT COUNT(*) AS totalCount FROM products WHERE products.sku = csv.sku
Determine chunk size (using 25 for this demo)
loopCount = ceil(totalCount / 25)
Loop through all results using a loop like this: for($i = 0; $i < loopCount; $i++)
Inside your loop you should be running a query like this: SELECT * FROM products WHERE products.sku = csv.sku LIMIT (i*25), 25 (use the loop counter for the offset and interpolate it as a plain number, since MySQL's LIMIT does not accept expressions)
You will want to use a constant order for your SELECT chunks; your unique ID would probably be best.
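A rough sketch of that chunked loop, reusing the table and column names from the question's code (the mysql_* calls match the original; the LIMIT offset is computed in PHP, and the query result is checked so a boolean never reaches mysql_result):
$count_result = mysql_query("SELECT COUNT(*) AS totalCount FROM products WHERE skucode = '$csv_code'")
    or die(mysql_error());
$totalCount = mysql_result($count_result, 0, 'totalCount');
$loopCount  = ceil($totalCount / 25);

for ($j = 0; $j < $loopCount; $j++) {
    $offset = $j * 25;                          // offset for this chunk
    $chunk  = mysql_query("SELECT skucode, catalogid FROM products
                           WHERE skucode = '$csv_code'
                           ORDER BY catalogid
                           LIMIT $offset, 25")
        or die(mysql_error());                  // fail loudly instead of passing FALSE around
    while ($product = mysql_fetch_assoc($chunk)) {
        // ...compare the CSV values and update this product row here...
    }
}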
I think you can solve this problem with cron: http://en.wikipedia.org/wiki/Cron. It has no timeout.
I have an 800MB text file with 18,990,870 lines in it (each line is a record) from which I need to pick out certain records and, if there is a match, write them into a database.
It is taking an age to work through them, so I wondered if there was a way to do it any quicker?
My PHP is reading a line at a time as follows:
$fp2 = fopen('download/pricing20100714/application_price','r');
if (!$fp2) {echo 'ERROR: Unable to open file.'; exit;}
while (!feof($fp2)) {
$line = stream_get_line($fp2,128,$eoldelimiter); //use 2048 if very long lines
if ($line[0] === '#') continue; //Skip lines that start with #
$field = explode ($delimiter, $line);
list($export_date, $application_id, $retail_price, $currency_code, $storefront_id ) = explode($delimiter, $line);
if ($currency_code == 'USD' and $storefront_id == '143441'){
// does application_id exist?
$application_id = mysql_real_escape_string($application_id);
$query = "SELECT * FROM jos_mt_links WHERE link_id='$application_id';";
$res = mysql_query($query);
if (mysql_num_rows($res) > 0 ) {
echo $application_id . "application id has price of " . $retail_price . "with currency of " . $currency_code. "\n";
} // end if exists in SQL
} else
{
// no, application_id doesn't exist
} // end check for currency and storefront
} // end while statement
fclose($fp2);
At a guess, the performance issue is that you issue a separate query for every application_id that matches USD and your storefront.
If space and IO aren't an issue, you might just blindly write all 19M records into a new staging DB table, add indices and then do the matching with a filter?
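A rough sketch of that staging approach; the table name, column types, and the tab delimiter are assumptions based on the fields parsed in the question, so adjust them to the real feed format:
-- one-off staging table for the full feed
CREATE TABLE price_staging (
    export_date    VARCHAR(32),
    application_id INT,
    retail_price   DECIMAL(10,2),
    currency_code  CHAR(3),
    storefront_id  INT,
    KEY idx_app (application_id),
    KEY idx_filter (currency_code, storefront_id)
);

LOAD DATA LOCAL INFILE 'download/pricing20100714/application_price'
INTO TABLE price_staging
FIELDS TERMINATED BY '\t'      -- assumption: the feed is tab-separated
IGNORE 1 LINES;                -- assumption: skip a header/comment line

-- then match in one set-based query instead of millions of single-row lookups
SELECT s.application_id, s.retail_price, s.currency_code
FROM price_staging s
JOIN jos_mt_links l ON l.link_id = s.application_id
WHERE s.currency_code = 'USD' AND s.storefront_id = 143441;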
Don't try to reinvent the wheel; this has been done before. Use a database to search through the file's content. You can load that file into a staging table in your database and query your data using indexes for fast access where they add value. Most, if not all, databases have import/loading tools to get a file into the database relatively quickly.
19M rows in a database will be slow if the database is not designed properly. You can still use text files if they are partitioned properly: recreating multiple smaller files split on certain parameters and stored in a properly sorted way might work.
Anyway, PHP is not the best language for file I/O and processing; it is much slower than Java for this task, while plain old C would be one of the fastest for the job. PHP should be restricted to generating dynamic web output, while core processing should be done in Java/C. Ideally it would be a Java/C service that generates the output, with PHP using that feed to generate the HTML output.
You are parsing the input line twice by doing two explodes in a row. I would start by removing the first line:
$field = explode ($delimiter, $line);
list($export_date, ...., $storefront_id ) = explode($delimiter, $line);
Also, if you are only using the query to test for a match based on your condition, don't use SELECT *; use something like this:
"SELECT 1 FROM jos_mt_links WHERE link_id='$application_id';"
You could also, as Brandon Horsley suggested, buffer a set of application_id values in an array and modify your select statement to use the IN clause thereby reducing the number of queries you are performing.
Have you tried profiling the code to see where it's spending most of its time? That should always be your first step when trying to diagnose performance problems.
Preprocess with sed and/or awk ?
Databases are built and designed to cope with large amounts of data, PHP isn't. You need to re-evaluate how you are storing the data.
I would dump all the records into a database, then delete the records you don't need. Once you have done that, you can copy those records wherever you want.
As others have mentioned, the expense is likely in your database query. It might be faster to load a batch of records from the file (instead of one at a time) and perform one query to check multiple records.
For example, load 1000 records that match the USD currency and storefront at a time into an array and execute a query like:
'select link_id from jos_mt_links where link_id in (' . implode(',', $application_id_array) . ')'
This will return a list of the records that are in the database. Alternatively, you could change the SQL to use NOT IN to get a list of the records that are not in the database.
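A minimal sketch of that batching idea inside the question's read loop (the batch size of 1000 is illustrative, and the mysql_* calls match the original code):
$application_ids = array();

// inside the while loop, after the USD/storefront check, collect ids instead of querying:
$application_ids[] = (int) $application_id;

if (count($application_ids) >= 1000) {
    $in  = implode(',', $application_ids);     // ids are numeric, so no quoting is needed
    $res = mysql_query("SELECT link_id FROM jos_mt_links WHERE link_id IN ($in)");
    while ($row = mysql_fetch_assoc($res)) {
        echo $row['link_id'] . " found in jos_mt_links\n";
    }
    $application_ids = array();                // start the next batch
}

// after the file loop ends, flush any remaining ids the same way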
I have a large database table, approximately 5GB, and I now want to get a current snapshot of the database using "SELECT * FROM MyTableName". I am using PDO in PHP to interact with the database, so I prepare the query and then execute it:
// Execute the prepared query
$result->execute();
$resultCollection = $result->fetchAll(PDO::FETCH_ASSOC);
This is not an efficient way, as a lot of memory is used to store the data in the associative array, which is approximately 5GB.
My final goal is to collect the data returned by the SELECT query into a CSV file and put that CSV file at an FTP location from where the client can get it.
The other option I thought of was:
SELECT * INTO OUTFILE "c:/mydata.csv"
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY "\n"
FROM my_table;
But I am not sure if this would work, as I have a cron job that initiates the complete process and we do not have a CSV file yet. So basically, for this approach, the PHP script will have to:
Create a CSV file.
Do a SELECT query on the database.
Store the SELECT query result in the CSV file.
What would be the best or most efficient way to do this kind of task?
Any suggestions?
You can use the PHP function fputcsv (see the PHP Manual) to write single lines of CSV into a file. To avoid the memory problem, instead of fetching the whole result set at once, just execute the query and iterate over the result:
$fp = fopen('file.csv', 'w');
$result->execute();
while ($row = $result->fetch(PDO::FETCH_ASSOC)) {
// and here you can simply export every row to a file:
fputcsv($fp, $row);
}
fclose($fp);
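One caveat worth adding: by default the PDO MySQL driver buffers the entire result set on the client even when you fetch row by row, so for a 5GB table you may also want an unbuffered query. A minimal sketch, assuming the pdo_mysql driver and that $pdo is the connection object (name assumed):
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false); // stream rows instead of buffering them all
$result = $pdo->prepare('SELECT * FROM MyTableName');
$result->execute();

$fp = fopen('file.csv', 'w');
while ($row = $result->fetch(PDO::FETCH_ASSOC)) {
    fputcsv($fp, $row); // one row at a time keeps memory usage flat
}
fclose($fp);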