Select, edit and then insert data into new database - php

What is the best way to SELECT everything from MySQL DATABASE_1, edit almost every item (split, merge, validate email format, validate URL format, etc.) using PHP, and then INSERT it into the new DATABASE_2?
The old table has almost 3 million rows.
If I loop over the rows one by one with foreach (which is not a good idea :) ), it takes more than an hour for 2 thousand rows.
If I insert multiple rows in one query:
INSERT INTO tbl (col1, col2, ...) VALUES (item1, item2, ...), (item3, item4, ...)
it is almost just as slow.
Any ideas on how to do it professionally and fast?

You can do it by paging the query and using several PHP processes. For example, each process retrieves and edits 100,000 elements and then inserts them into the new DB.
To implement it, you can create a PHP parent process that distributes the workload across child processes, something like this:
$total_rows = 0; // Get from your MySQL table, e.g. SELECT COUNT(*)
$offsets = [];
$limit = 10000;

// Create offsets to process in chunks
for ($i = 0; $i * $limit < $total_rows; $i++) {
    $offsets[] = $i * $limit;
}

// Process rows
$processed_rows = 0;
while ($processed_rows < $total_rows) {
    $pids = [];
    // Create up to 10 child processes
    for ($i = 0; $i < 10 && !empty($offsets); $i++) {
        // Get next offset to process
        $offset = array_shift($offsets);
        $pid = pcntl_fork();
        if ($pid) {
            // Parent: remember the child's pid
            $pids[] = $pid;
        } else {
            // Child: gets, edits and inserts items, then exits
            process_and_insert_row($offset, $limit);
            exit(0);
        }
    }
    // Wait for children to finish
    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);
        $processed_rows += $limit;
    }
}
Warning: The code is not tested, but I think it demonstrates the idea behind the explanation...
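The process_and_insert_row() helper referenced above is left undefined; a minimal sketch of what it could do, assuming PDO connections to both databases and hypothetical table/column names (old_table, new_table, id, email, url, name), might be:

function process_and_insert_row($offset, $limit)
{
    // Each child opens its own connections (DSNs/credentials are placeholders)
    $src = new PDO('mysql:host=localhost;dbname=database_1;charset=utf8mb4', 'user', 'pass');
    $dst = new PDO('mysql:host=localhost;dbname=database_2;charset=utf8mb4', 'user', 'pass');

    // Read one chunk of the old table (values cast to int, so interpolation is safe)
    $sql = 'SELECT id, email, url, name FROM old_table ORDER BY id LIMIT ' . (int)$offset . ', ' . (int)$limit;
    $rows = $src->query($sql)->fetchAll(PDO::FETCH_ASSOC);

    $insert = $dst->prepare('INSERT INTO new_table (email, url, name) VALUES (?, ?, ?)');

    // One transaction per chunk keeps the inserts fast
    $dst->beginTransaction();
    foreach ($rows as $row) {
        // Edit/validate each item before inserting
        $email = filter_var($row['email'], FILTER_VALIDATE_EMAIL) ? $row['email'] : null;
        $url   = filter_var($row['url'], FILTER_VALIDATE_URL) ? $row['url'] : null;
        $insert->execute([$email, $url, trim($row['name'])]);
    }
    $dst->commit();
}

Whether this beats the multi-row INSERT you already tried depends mostly on the indexes on the target table and on wrapping each chunk in a single transaction.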

Related

Recursive call for cron job

I have three PHP scripts, two of which run as separate daily cron jobs on the server.
The first script (get_products.php) makes a cURL call to an external server to get all product data and stores that data in a database. Note that there are around 10,000 products and the number increases every day.
The second script (find_related.php) selects the products stored by the first script, performs some operations, and stores the resulting data in another database. Each product produces 10 rows, so that database has around 100,000 rows. This script runs as a cron job. Sometimes the script does not execute fully, and that's why the expected results are not stored in the database. I included this line of code in the script: ini_set('max_execution_time', '3600');
But it does not work.
Here is the process done in this script:
The task is to find 10 related products based on tags. I have around 10,300 products stored in my DB. For each product, the query takes its tags and tries to randomly find one product tagged with the same tag as the main product, then stores the related product data in another DB for the third script. Only one product per tag is allowed. If it does not find 10 related products in total, it randomly picks products from another table named bestseller_products.
Here is my code:
$get_all_products = mysql_query('SELECT * FROM store_products');
while($get_products_sql_res = mysql_fetch_array($get_all_products)){
$related_products = array();
$tags = explode(",",$get_products_sql_res['product_tags']);
$product_id = $get_products_sql_res['product_id'];
$product_handle = $get_products_sql_res['product_handle'];
$get_products_sql = mysql_query('SELECT * FROM related_products WHERE product_handle="'.$product_handle.'"');
if (mysql_num_rows($get_products_sql)==0)
{
$count = 0;
foreach($tags as $t){
$get_related_products_sql = mysql_query("SELECT product_handle, product_title, product_image FROM store_products WHERE product_tags like '%".$t."%' AND product_id != '".$product_id."' ORDER BY RAND()");
if(!$get_related_products_sql){
continue;
}
while($get_related_products = mysql_fetch_array($get_related_products_sql) ){
$related_product_title = mysql_real_escape_string($get_related_products['product_title']);
$found = false;
foreach($related_products as $r){
if($r['handle'] == $get_related_products['product_handle']){
$found = true;
break;
}
}
if($found == false){
$related_products[$count]['handle'] = $get_related_products['product_handle'];
mysql_query("INSERT INTO related_products (product_handle, product_id, related_product_title, related_product_image, related_product_handle) VALUES ('$product_handle','$product_id','$related_product_title', '$get_related_products[2]', '$get_related_products[0]')");
$count = $count + 1;
break;
}
}
}
if($count < 10){
$bestseller_products = mysql_query("SELECT product_handle, product_title, product_image FROM bestseller_products WHERE product_id != '".$product_id."' ORDER BY RAND() LIMIT 10");
while($bestseller_products_sql_res = mysql_fetch_array($bestseller_products)){
if($count < 10){
$found = false;
$related_product_title = mysql_real_escape_string($bestseller_products_sql_res['product_title']);
$related_product_handle = $bestseller_products_sql_res['product_handle'];
foreach($related_products as $r){
if($r['handle'] == $related_product_handle){
$found = true;
break;
}
}
if($found == false){
$related_product_image = $bestseller_products_sql_res['product_image'];
mysql_query("INSERT INTO related_products (product_handle, product_id, related_product_title, related_product_image, related_product_handle) VALUES ('$product_handle','$product_id','$related_product_title', '$related_product_image', '$related_product_handle')");
$count = $count + 1;
}
}
}
}
}
}
The third script (create_metafields.php) creates metafields on the external server using the data created by the second script. The same problem arises as in the second script.
So I want to execute the second script in parts. I mean, not process all 10,000 products in one call, but run it in chunks (1-500, 501-1000, 1001-1500, ...). But I don't want to create separate cron jobs. Please suggest a solution if you have one; I really need to figure this out.
Thanks in advance!
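One way to run the second script in chunks without extra cron entries is to persist how far the last run got and continue from there on the next invocation. A rough sketch, assuming a small helper table cron_progress(last_id INT) and a PDO connection $db (both hypothetical; the original code uses the old mysql_* API):

$chunk = 500;

// Where did the previous run stop?
$lastId = (int)$db->query('SELECT last_id FROM cron_progress LIMIT 1')->fetchColumn();

// Take the next chunk of products by primary key
$stmt = $db->query('SELECT * FROM store_products WHERE product_id > ' . $lastId .
                   ' ORDER BY product_id LIMIT ' . $chunk);

$processed = 0;
$maxId = $lastId;
while ($product = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // ... the existing find_related logic for one product goes here ...
    $maxId = max($maxId, (int)$product['product_id']);
    $processed++;
}

if ($processed === 0) {
    // Reached the end of the table; start over on the next run
    $db->exec('UPDATE cron_progress SET last_id = 0');
} else {
    // Remember the progress; the next cron run picks up after this id
    $db->exec('UPDATE cron_progress SET last_id = ' . $maxId);
}

Note that with 500 products per run, the single cron entry has to fire often enough (e.g. every few minutes) to cover all ~10,300 products within a day.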

MySQL + PHP millions of rows, sorting the data from multiple queries. Memory Consumption

So I have 2 tables. One table has all the actual campaign_id information.
The second table has the impression/statistic information on the campaign_id's
I have a table on the page (I use AJAX, but that's beside the point). I want to sort by a column, but all the rows are generated from the Campaigns table: I first run all of the statistics for every campaign and then link them up to each row. Only after all the data is assembled does it get sorted. This uses a massive amount of memory and resources. Is this efficient at all? Is there a better solution for sorting huge amounts of data?
// I have to increase memory because the sorting takes a lot of resource
ini_set('memory_limit','1028M');
// the column I want to sort
$sortcolumn = $this->input->post('sortcolumn');
// direction of the sort ASC/DESC
$sortby = $this->input->post('sortby');
// millions of impression data that is linked with campaign_id
$cdata= array();
$s = "SELECT report_campaign_id,";
$s .= "SUM(report_imps) AS imps ";
$s .= "FROM Campaign_Impressions";
$s .= "GROUP BY report_campaign_id ";
$r = $this->Qcache->result($s,0,'campaignsql');
foreach($r as $c) {
$cdata[$c->report_campaign_id]['imps'] = ($c->imps) ? $c->imps : 0;
}
// 500,000+ campaigns
// I draw my table from these campaigns
$rows = array();
$s = "SELECT * FROM Campaigns ";
$r = $this->db->query($s)->result();
foreach($r as $c)
{
$row= array();
$row['campaign_id'] = $c->campaign_id;
// other campaign info here...
// campaign statistics here...
$row['campaign_imps'] = $cdata[$c->campaign_id]['imps'];
// table row
$rows[] = $row;
}
// prepare the columns i want to sort
$sortc = array();
foreach($rows as $sortarray) {
if (!isset($sortarray[ $sortcolumn ])) continue;
$sortc[] = str_replace(array('$',','),'',$sortarray[ $sortcolumn ]);
}
// sort columns and direction
array_multisort($sortc,(($sortby==='asc')?SORT_ASC:SORT_DESC),SORT_NATURAL,$rows);
As you can see, the Campaign_Impressions query aggregates data for every campaign, which doesn't seem very efficient, but it is more effective than running a query per row to get the data.
(I don't display all the campaigns, but I need to process every one of them to determine the overall sort order.)
You should let MySQL do the job by using ORDER BY.
If this still takes a lot of time on the MySQL side, consider adding indexes on the columns you sort by.
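For example, the aggregation, join and sort can all happen in one query, so PHP never has to hold or sort the full result set. A sketch using the table and column names from the question (the LIMIT is a placeholder for whatever the pagination needs):

$s  = "SELECT c.campaign_id, COALESCE(SUM(i.report_imps), 0) AS imps ";
$s .= "FROM Campaigns c ";
$s .= "LEFT JOIN Campaign_Impressions i ON i.report_campaign_id = c.campaign_id ";
$s .= "GROUP BY c.campaign_id ";
$s .= "ORDER BY imps " . (($sortby === 'asc') ? 'ASC' : 'DESC') . " ";
$s .= "LIMIT 0, 50";
$r = $this->db->query($s)->result();

Only the requested page of rows ever reaches PHP, so the memory_limit increase becomes unnecessary.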

Using Inner Join and mysql_num_rows() is Always returning 1 less row

I checked through the existing topics. I have a fix for my problem, but I know it's not the right fix, and I'm more interested in making this work correctly than in creating a workaround for it.
I have a project where I have 3 tables, diagnosis, visits, and treatments. People come in for a visit, they get a treatment, and the treatment is for a diagnosis.
For displaying this information on the page, I want to show the patient's diagnosis, then show the times they came in for visits; each visit can then be clicked to show the treatment info.
To do this I made this function in PHP:
<?
function returnTandV($dxid){
include("db.info.php");
$query = sprintf("SELECT treatments.*,visits.* FROM treatments LEFT JOIN visits ON
treatments.tid = visits.tid WHERE treatments.dxid = '%s' ORDER BY visits.dos DESC",
mysql_real_escape_string($dxid));
$result = mysql_query($query) or die("Failed because: ".mysql_error());
$num = mysql_num_rows($result);
for($i = 0; $i <= $num; ++$i) {
$v[$i] = mysql_fetch_array($result, MYSQL_ASSOC);
++$i;
}
return $v;
}
?>
The function works and displays what I want, which is all of the rows from both treatments and visits as one large associative array. The problem is that it always returns one less row than is actually in the database, and I'm not sure why. There are 3 rows total, but mysql_num_rows() only shows 2. My workaround has been to just add 1 ($num = mysql_num_rows($result)+1;), but I would rather have it be correct.
This section looks suspicious to me:
for($i = 0; $i <= $num; ++$i) {
$v[$i] = mysql_fetch_array($result, MYSQL_ASSOC);
++$i;
}
You're incrementing $i twice (once in the for header and once in the body).
You're going to $i <= $num when you most likely want $i < $num
This combination may be why you're getting unexpected results. Basically, you have three rows, but you're only asking for rows 0 and 2 (skipping row 1).
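With both of those fixed, the loop would read:

$num = mysql_num_rows($result);
for ($i = 0; $i < $num; ++$i) {
    $v[$i] = mysql_fetch_array($result, MYSQL_ASSOC);
}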
Programmers always count from 0, so you are starting your loop at 0. If you end at 2, you have covered 3 rows:
Row0, Row1, Row2.
If $i = 0 and you increment it BEFORE adding something to the array, you skip the first row. Increment $i only AFTER the loop body runs, so you start at 0 (the first key).
For loops are not good for this: rather do:
$query=mysql_query(' --mysql --- ');
while ($row=mysql_fetch_array($query)){
$v[]=$row["dbcolumn"];
}
Then return $v from your function: compact and neat. You can also create an associative array, as long as the key name is unique (like primary ids): $v["$priid"] = $row[1];

Count results according to level

I have an adjacency list model structure like the one below, and I want to count the titles under each parent by level, e.g. Food = (2,4,3) and Fruit = (3,3).
(tree table structure image)
From that table I then build a tree (image).
With the following code I am getting the right overall totals, e.g. Food = 9 and Fruit = 6:
function display_children($parent, $level)
{
$result = mysql_query('SELECT title FROM tree '.'WHERE parent="'.$parent.'"');
$count = 0;
while ($row = mysql_fetch_array($result))
{
$data= str_repeat(' ',$level).$row['title']."\n";
echo $data;
$count += 1 + $this->display_children($row['title'], $level+1);
}
return $count;
}
Calling the function:
display_children('Food', 0);
Result: 9 // but I want to get a result like 2,4,3
I want the counts broken down by level, e.g. Food = 2,4,3 and Fruit = 3,3.
Please guide me on how to get the totals per level.
function display_children($parent, $level)
{
$result = mysql_query('SELECT title FROM tree '.'WHERE parent="'.$parent.'"');
$count = "";
while ($row = mysql_fetch_array($result))
{
$data= str_repeat(' ',$level).$row['title']."\n";
echo $data;
if($count!="")
$count .= ", ".(1 + $this->display_children($row['title'], $level+1));
else
$count = (1 + $this->display_children($row['title'], $level+1));
}
return $count;
}
Let's try this once...
If you want to get amounts by level, then make the function return them by level.
function display_children($parent, $level)
{
$result = mysql_query('SELECT title FROM tree WHERE parent="'.$parent.'"');
$count = array(0=>0);
while ($row = mysql_fetch_array($result))
{
$data= str_repeat(' ',$level).$row['title']."\n";
echo $data;
$count[0]++;
$children= $this->display_children($row['title'], $level+1);
$index=1;
foreach ($children as $child)
{
if ($child==0)
continue;
if (isset($count[$index]))
$count[$index] += $child;
else
$count[$index] = $child;
$index++;
}
}
return $count;
}
Note that it's hard for me to debug the code as I don't have your table. If there is any error, let me know and I will fix it.
Anyway, the result will be an array containing the amounts per level, indexed by level:
$result=display_children("Food", 0) ;
var_export($result);//For exact info on all levels
echo $result[0];//First level, will output 2
echo $result[1];//Second level, will output 4
echo $result[2];//Third level, will output 3
And by the way, there is a typo in your database: id 10 (Beef) should have the parent "Meat" instead of "Beat", I guess.
If you want to see the testing page, it's here.
This article has all you need to create a tree with MySQL, and shows how to count items by level.
If you don't mind changing your schema I have an alternative solution which is much simpler.
You put your data in a table like this...
item         | id
-------------+------
Food         | 1
Fruit        | 1.1
Meat         | 1.2
Red Fruit    | 1.1.1
Green Fruit  | 1.1.2
Yellow Fruit | 1.1.3
Pork         | 1.2.1
Queries are now much simpler, because they're just simple string manipulations. This works fine on smallish lists, of a few hundred to a few thousand entries - it may not scale brilliantly - I've not tried that.
But to count how many things there are at the 2nd level you can just do a regexp search.
select count(*) from items
where id regexp '^[0-9]+.[0-9]+$'
Third level is just
select count(*) from items
where id regexp '^[0-9]+.[0-9]+.[0-9]+$'
If you just want one sub-branch at level 2
select count(*) from items
where id regexp '^[0-9]+.[0-9]+$'
and id like "1.%"
It has the advantage that you don't need to run as many queries on the database, and as a bonus it's much easier to read the data in the tables and see what's going on.
I have a nagging feeling this might not be considered "good form", but it does work very effectively. I'd be very interested in any critiques of this method; do DB people think this is a good solution? If the table were very large, doing table scans and regexps all the time would get very inefficient. Your approach would make better use of any indexes, which is why I say this probably doesn't scale very well, but given you don't need to run as many queries, it may be a trade-off worth taking.
A solution using a PHP class:
<?php
class LevelDepCount{
private $level_count=array();
/**
* Display all children of an element.
* @return int Count of elements
*/
public function display_children($parent, $level, $isStarted=true)
{
if($isStarted)
$this->level_count=array(); // Reset for new ask
$result = mysql_query('SELECT title FROM tree '.'WHERE parent="'.$parent.'"');
$count = 0; // For the level in the section
while ($row = mysql_fetch_array($result))
{
$data= str_repeat(' ',$level).$row['title']."\n";
echo $data;
$count += 1 + $this->display_children($row['title'], $level+1,false);
}
if(array_key_exists($level, $this->level_count))
$this->level_count[$level]+=$count;
else
$this->level_count[$level]=$count;
return $count;
}
/** Return the count by level.*/
public function getCountByLevel(){
return $this->level_count;
}
}
$counter=new LevelDepCount();
$counter->display_children("Food",0);
var_dump($counter->getCountByLevel());
?>
If you modify your query you can get all the data in one swoop and without that many calculations (code untested):
/* Get all the data in one swoop and arrange it for easy mangling later */
function populate_data() {
    $result = mysql_query('SELECT parent, COUNT(*) AS amount, GROUP_CONCAT(title) AS children FROM tree GROUP BY parent');
    $data = array();
    while ($row = mysql_fetch_assoc($result)) {
        /* Each node stores the amount of children and their names */
        $data[$row['parent']] = array((int)$row['amount'], $row['children']);
    }
    return $data;
}
/* The function that does the whole work */
function get_children_per_level($data, $root) {
    $current_children = array($root);
    $next_children = array();
    $ret = array();
    while (!empty($current_children)) {
        $count = 0;
        foreach ($current_children as $node) {
            if (!isset($data[$node]))
                continue; /* leaf node, nothing below it */
            $count += $data[$node][0]; /* add the amount */
            /* and queue its children for the next level */
            $next_children = array_merge($next_children, explode(',', $data[$node][1]));
        }
        if ($count == 0)
            break; /* no more levels below */
        $ret[] = $count;
        $current_children = $next_children;
        $next_children = array();
    }
    return $ret;
}
$data = populate_data();
get_children_per_level($data, 'Food');
It shouldn't be difficult to modify the function to make a call per invocation or one call per level to populate the data structure without bringing the whole table into memory. I'd suggest against that if you have deep trees with just a few children as it is a lot more efficient to get all the data in one swoop and calculate it. If you have shallow trees with a lot of children, then it may be worth changing.
It would also be possible to put everything together in a single function, but I'd avoid re-calculating data for repeated calls when they are not needed. A possible solution for this would be to make this a class, use the populate_data function as the constructor that stores it as an internal private property and a single method that is the same as get_children_per_level without the first parameter as it would get the data off its internal private property.
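For illustration, that refactor could look roughly like this, simply reusing the two functions above (class and method names are made up):

class TreeLevelCounter {
    private $data;

    public function __construct() {
        // Fetch and arrange the whole tree once
        $this->data = populate_data();
    }

    public function childrenPerLevel($root) {
        // Same as get_children_per_level(), but reads the cached data
        return get_children_per_level($this->data, $root);
    }
}

$counter = new TreeLevelCounter();
$levels = $counter->childrenPerLevel('Food'); // e.g. array(2, 4, 3)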
In any case, I'd also suggest you use the ID column as the "parent" reference instead of other columns. To start with, my code will break if any of the names contains a comma :P. Besides, you may have two different elements with the same name. For example, you could have Vegetables -> Red -> Pepper, and that Red would get lumped together with the Fruit's Red.
Another thing to note is that my code will enter an infinite loop if your DB data is not a tree. If there is any cycle in the graph, it will never finish. That bug could easily be solved by keeping a $visited array with all the nodes that have already been visited and not pushing them into the $next_children array within the loop (probably using array_diff(explode(',', $data[$node][1]), $visited)).

Repeating rows in an array and then paginating them

Let's say I have a table with 20 rows and a pagination script.
The pagination script is set to display 10 rows per page, so of course it will display two pages.
The problem is that sometimes my table will have fewer than 20 rows, let's say 3 - so the script will display just one page with 3 entries.
I need a way to repeat those 3 rows until the number reaches 20, store them in an array, and then use the pagination script as normal.
Any ideas, can it be done? Can someone put this into code?
For those who wonder why :) This is a problem because each row has a post from my blog assigned to it, and I have 20 posts which I want displayed. If, for example, the table (which auto-updates via cron jobs) has 17 rows, I'll have just 17 posts associated with them. This is why I need to repeat them until 20, so all my 20 posts are displayed no matter how many rows the table has :)
$query = "SELECT * FROM `db_table`";//PUT HERE A PROPER QUERY.
$result = mysql_query($query);
// mysql_num_rows($result) >10 WE ARE CHECKING HOW MANY LINES DO WE HAVE.
if(mysql_num_rows($result) >10){
/*...YOUR EXISTING CODES HERE...*/
}
else{
while($rows = mysql_fetch_assoc($result)){
$arrayOfRows[] = $rows; // HERE WE PULL ROWS FROM DB AND PUT IN AN ARRAY.
}
}
// NOW YOUR DB ROWS ARE IN THE ARRAY NAMED $arrayOfRows IF YOUR DB TABLE HAS LESS THAN 10 ROWS
$countRowsOfArray = count($arrayOfRows);
$rows = 20;
$dbRow=0;
for($n=0;$n<$rows;$n++){
if($dbRow >= $countRowsOfArray) $dbRow = 0;
$newArrayOfRows[$n] = $arrayOfRows[$dbRow];
$dbRow++;
}
//NOW YOU HAVE $newArrayOfRows WHICH YOUR ROWS REPEATED UNTIL 20 LINES.
print_r($newArrayOfRows); //SEE IF THEY ARE THERE.
A simple solution would be (note: this code is optimized for readability, not speed):
$result = array();
for($i = 0; $i < 20; $i++){
$result[] = $array[$i % sizeof($array)];
}
This will fill the $result array with the contents of $array, repeating if necessary. You can also replace the loop condition by:
$i < 20 || $i < sizeof($array)
This will copy the whole array and, if necessary (the array has less than 20 entries), add copies.
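Put together, that variant would look like this (a small sketch; $array is assumed to hold the rows fetched from the table):

$result = array();
for ($i = 0; $i < 20 || $i < sizeof($array); $i++) {
    // The modulo wraps around, so short tables repeat until at least 20 entries exist
    $result[] = $array[$i % sizeof($array)];
}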
