What's the optimal chunk size for Laravel - PHP?

I've been working on a web app that reads some data from a remote server on its own server side. I'm using Laravel, and initially thought it would be easier to develop my own PHP file with methods to connect to the remote DB. What this PHP file does: fetch the data from the remote server (PostgreSQL) and insert it into Laravel using Eloquent. Here are some code snippets.
try {
    $dal = connect();
    // ... some validations not relevant to the question
    $result = pg_query($query) or die('Query failed: ' . pg_last_error());
    // Fetch every row of the result set into one array
    $data = array_values(pg_fetch_all($result));
    // Split the rows into batches of 1000 before inserting
    $chunkOfData = array_chunk($data, 1000);
    foreach ($chunkOfData as $chunk) {
        insertChunkToDB($chunk);
    }
    closeDB($dal);
} catch (\Exception $e) {
    Log::error('Error syncing both databases, more details: ' . $e);
    exit(1);
}
My question focuses on the array_chunk.
I had to do this because the PHP script crashed with an "out of memory" error. I used the insertChunkToDB function so the garbage collector would clean up the data that has already been inserted. Note that this code is fully functional (as far as I know).
But... if pg_fetch_all already retrieved the data... isn't it already in memory? Why didn't the program crash then? As a side question, how fast can Laravel insert its data? Would using smaller chunks (like 100) slow the program down due to jumping between iterations/garbage collecting? What chunk size would make things fastest?
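One way to settle the chunk-size question is empirically: time a few candidate sizes against a test copy of the table, using the insertChunkToDB function shown below. A rough sketch:
// Rough benchmark sketch: run against a test table and truncate it between
// runs, since each pass inserts the same data again.
foreach (array(100, 500, 1000, 5000) as $size) {
    $start = microtime(true);
    foreach (array_chunk($data, $size) as $chunk) {
        insertChunkToDB($chunk);
    }
    printf("chunk size %d: %.2fs, peak memory %.1f MB\n",
        $size, microtime(true) - $start, memory_get_peak_usage(true) / 1048576);
}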
Oh, by the way, this is the function:
function insertChunkToDB($chunk) {
    foreach ($chunk as $element) {
        // Convert the associative array row into a stdClass object
        $object = json_decode(json_encode($element), FALSE);
        insertObjectToDB($object);
    }
}
The encode/decode is done so I can do this
function insertObjectToDB($element) {
    $LaravelModel = new LaravelModel(); // whichever Eloquent model is being synced
    $LaravelModel->id = $element->id;
    $LaravelModel->name = $element->name; // and so on...
    $LaravelModel->save();
}
When recording foreign keys, I do a quick check to see if I already have the corresponding value; if not, I issue an extra query to the remote server and record that data in the corresponding table.
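For what it's worth, a bulk-insert variant of the same loop is sketched here (RemoteRecord is a hypothetical Eloquent model standing in for the real one, the fetched rows are assumed to match its column names, and the foreign-key checks are left out). Eloquent's insert() writes a whole chunk in one query instead of issuing one save() per row, and it skips the json_encode/json_decode round trip entirely:
use App\Models\RemoteRecord; // hypothetical model; substitute the real one

$data = array_values(pg_fetch_all($result)); // each row is already an associative array

foreach (array_chunk($data, 1000) as $chunk) {
    // One multi-row INSERT per chunk instead of one save() per row
    RemoteRecord::insert($chunk);
}
Note that insert() goes straight through the query builder, so model events and created_at/updated_at timestamps are skipped; it only fits if none of that matters for the sync.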

Related

How to execute a code that doesn't stop the rest if broken with Laravel

I'm building a full system with both Laravel and VueJS and at some point, I'm generating pictures from a list of pictures with Image Intervention.
But this process can break; there are many issues that I have faced and resolved, and others may appear in the future.
What would you recommend so that broken code does not stop the rest? I was thinking of some service that would be called independently and asynchronously.
Can Laravel cover that? I have read about events in both Laravel and Symfony, but that is something I have never understood.
Greetings
Well, I ran into a similar problem some days ago, although my problem was related to inserting data from a CSV into the database. There was a chance that some rows had a different datatype that would throw an error and halt the whole remaining process, so I used try/catch inside my job. I will show you a reference; you can modify it as you wish:
$error_arr = array();
$error_row_numbers = array();
try {
    // Write your code here that might throw an error
    $row = Model::updateOrCreate(
        ['id' => $id],
        $Arr
    );
} catch (\Exception $e) {
    // Optionally you can store the error message
    // and the image/row number which failed here
    $error_arr[] = $e->getMessage();
    $error_row_numbers[] = $row_no; // this row_no is a separate counter
                                    // and should be incremented in the loop
                                    // to identify the exact image
}
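To get the "independent, asynchronous" behaviour asked about, the same try/catch can live inside a queued job, so each picture is processed on its own and one failure never blocks the rest. A minimal sketch (GeneratePicture and the $sourcePath parameter are hypothetical names; the Intervention Image work is left as a placeholder):
namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;

class GeneratePicture implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    protected $sourcePath;

    public function __construct($sourcePath)
    {
        $this->sourcePath = $sourcePath;
    }

    public function handle()
    {
        try {
            // ... the Intervention Image work for this single picture ...
        } catch (\Exception $e) {
            // Log and move on; the other queued pictures are unaffected
            Log::error('Failed to generate ' . $this->sourcePath . ': ' . $e->getMessage());
        }
    }
}

// Somewhere in a controller or service, one job per picture:
// foreach ($pictures as $path) {
//     GeneratePicture::dispatch($path);
// }
With a queue worker running (php artisan queue:work), dispatch() returns immediately, and failed jobs can be retried or inspected without touching the rest of the batch.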

PHP DB caching, without including files

I've been searching for a suitable PHP caching method for MSSQL results.
Most of the examples I can find suggest storing the results in an array, which would then get included in the page. This seems great unless a request for the content is made at the same time as it is being updated/rebuilt.
I was hoping to find something similar to ASP's application-level variables, but as far as I'm aware, PHP doesn't offer this functionality?
The problem I'm facing is that I need to perform 6 queries per page to populate dropdown boxes, and this happens on the vast majority of pages. It's also not an option to combine the queries. The cached data will also need to be rebuilt sporadically, when the system changes; this could be once a day, once a week or once a month. Any advice will be greatly appreciated, thanks!
You can use a Redis server and the phpredis PHP extension to cache results fetched from the database:
$redis = new Redis();
$redis->connect('/tmp/redis.sock');
$sql = "SELECT something FROM sometable WHERE condition";
$sql_hash = md5($sql);
$redis_key = "dbcache:{$sql_hash}";
$ttl = 3600; // values expire in 1 hour
if ($result = $redis->get($redis_key)) {
    $result = json_decode($result, true);
} else {
    $result = Db::fetchArray($sql); // placeholder for however you query MSSQL
    $redis->setex($redis_key, $ttl, json_encode($result));
}
(Error checks skipped for clarity)
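If the same pattern is needed for the six dropdown queries mentioned in the question, it can be wrapped in a small helper (a sketch only; Db::fetchArray stands in for whatever MSSQL access layer is in use, as in the snippet above):
function cachedQuery(Redis $redis, $sql, $ttl = 3600) {
    $key = "dbcache:" . md5($sql);
    if ($cached = $redis->get($key)) {
        return json_decode($cached, true);
    }
    $result = Db::fetchArray($sql); // placeholder DB call, as above
    $redis->setex($key, $ttl, json_encode($result));
    return $result;
}

// Usage for one of the dropdowns:
// $countries = cachedQuery($redis, "SELECT id, name FROM countries", 86400);
When the underlying data changes, deleting the affected keys (or lowering the TTL) forces a rebuild on the next request, which covers the sporadic-rebuild requirement.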

How to dump MySQL table to a file then read it and use it in place of the DB itself?

Because a provider I use has quite unreliable MySQL servers, which are down at least once per week :-/ impacting one of the sites I made, I want to work around the outages in the following way:
dump the MySQL table to a file; in case the connection with the SQL
server fails,
then read the file instead of the server, till the server is back.
This will avoid outages from the user experience point of view.
In fact things are not as easy as they seem, and I ask for your help, please.
What I did was save the data in JSON file format.
But this had issues, because a lot of the data in the DB is stored "in the clear", including escaped complex URLs with long argument strings, which cause problems during the decode process from JSON.
CSV and TSV are also not working correctly:
CSV is delimited by commas or semicolons, and those are present in the original content taken from the DB.
The TSV format leaves double quotes that cannot be removed without going in and stripping them from the records' fields.
Then I tried to serialize each record read from the DB, store it, and retrieve it by unserializing it.
But the result is a bit catastrophic, because although all the records are stored in the file,
when I retrieve them, only one is returned, and then something blocks the functioning of the program (code below, please):
require_once('variables.php');
require_once("database.php");
$file = "database.dmp";
$myfile = fopen($file, "w") or die("Unable to open file!");
$sql = mysql_query("SELECT * FROM song ORDER BY ID ASC");
// output data of each row
while ($row = mysql_fetch_assoc($sql)) {
    // store the record into the file
    fwrite($myfile, serialize($row));
}
fclose($myfile);
mysql_close();
// Retrieving section
$myfile = fopen($file, "r") or die("Unable to open file!");
// Till the file is not ended, continue to check it
while ( !feof($myfile) ) {
    $record = fgets($myfile); // get the record
    $row = unserialize($record); // unserialize it
    print_r($row); // show if the variable has something on it
}
fclose($myfile);
I also tried uuencode and base64_encode, but they were worse choices.
Is there any way to achieve my goal?
Thank you very much in advance for your help
If you have your data layer well decoupled, you can consider using SQLite as fallback storage.
It's just a matter of adding one more abstraction, with the same code accessing the storage and switching the storage target when the primary one is unavailable.
-----EDIT-----
You could also think about a caching strategy (JSON/HTML file?!) that returns stale data in case of a MySQL outage.
-----EDIT 2-----
If it's not too much effort, please consider playing with PDO. I'm quite sure you'll never look back, and believe me, it will help you structure your DB calls so that switching between storages causes little pain.
Please take the following only as an example; there are much better ways to design this architectural part of the code.
Just some small, basic code to demonstrate what I mean:
class StoragePersister
{
    private $driver = 'mysql';

    public function setDriver($driver)
    {
        $this->driver = $driver;
    }

    public function persist($data)
    {
        switch ($this->driver) {
            case 'mysql':
                $this->persistToMysql($data);
                break;
            case 'sqlite':
                $this->persistToSqlite($data);
                break;
        }
    }

    public function persistToMysql($data)
    {
        // query to MySQL
    }

    public function persistToSqlite($data)
    {
        // query to SQLite
    }
}
$storage = new StoragePersister;
$storage->setDriver('sqlite'); // switch to sqlite when needed (e.g. when MySQL is unavailable)
$storage->persist($somedata); // this will use the strategy to call the function based on the storage driver you've selected.
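To tie this to the PDO suggestion above: the two persist methods could share almost all their code if each driver is just a different PDO DSN. A sketch under assumed names (the song table comes from the question; the DSNs, credentials and the title column are examples only):
// Sketch: the same PDO code works for both backends, only the DSN changes.
function makeConnection($driver)
{
    if ($driver === 'sqlite') {
        return new PDO('sqlite:' . __DIR__ . '/fallback.sqlite');
    }
    // assumed credentials/host, replace with the real ones
    return new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
}

function persistSong(PDO $pdo, array $row)
{
    // column names beyond ID are assumed; adjust to the real schema
    $stmt = $pdo->prepare('INSERT INTO song (ID, title) VALUES (:id, :title)');
    $stmt->execute([':id' => $row['ID'], ':title' => $row['title']]);
}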
-----EDIT 3-----
Please have a look at the "strategy" design pattern; I guess it can help you better understand what I mean.
After the SELECT... you need to build a correct structure for re-inserting the data; then you can serialize it or do whatever you want.
For example:
For each row you could do this - $sqls[] = "INSERT INTO song (field1, field2, ... fieldN) VALUES ('field1_value', 'field2_value', ... 'fieldN_value');";
Then you could serialize this $sqls array, write it to a file, and when you need it, read it back, unserialize it and run the queries.
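A minimal sketch of that idea (the mysql_* calls mirror the question's code, and the song columns besides ID are assumed):
// Build the INSERT statements and serialize them to a file
$sqls = array();
$result = mysql_query("SELECT * FROM song ORDER BY ID ASC");
while ($row = mysql_fetch_assoc($result)) {
    $sqls[] = sprintf(
        "INSERT INTO song (ID, title) VALUES ('%s', '%s');", // columns assumed
        mysql_real_escape_string($row['ID']),
        mysql_real_escape_string($row['title'])
    );
}
file_put_contents('song_backup.ser', serialize($sqls));

// Later, read the file back and replay the statements
$sqls = unserialize(file_get_contents('song_backup.ser'));
foreach ($sqls as $sql) {
    mysql_query($sql);
}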
Have you thought about caching your queries in a cache like APC? Also, you may want to use mysqli or PDO instead of mysql (the mysql extension is deprecated in the latest versions of PHP).
To answer your question, this is one way of doing it.
var_export will export the variable as valid PHP code.
require will load the content of the array into the $rows variable (because of the return statement).
Here is the code:
$file = "database.dmp"; // same file name used in the question
$sql = mysql_query("SELECT * FROM song ORDER BY ID ASC");
$content = array();
// collect each row, keyed by its ID
while ($row = mysql_fetch_assoc($sql)) {
    $content[$row['ID']] = $row;
}
mysql_close();
$data = '<?php return ' . var_export($content, true) . ';';
file_put_contents($file, $data);
// Retrieving section
$rows = require $file;
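And a sketch of how the fallback could look at read time (a hypothetical helper; the mysql_* calls and the file mirror the code above, the credentials are assumed):
// Try the live database first; fall back to the exported file on failure.
function loadSongs($file)
{
    $link = @mysql_connect('localhost', 'user', 'pass'); // assumed credentials
    if ($link && ($result = @mysql_query("SELECT * FROM song ORDER BY ID ASC"))) {
        $rows = array();
        while ($row = mysql_fetch_assoc($result)) {
            $rows[$row['ID']] = $row;
        }
        return $rows;
    }
    // Server down: serve the last exported copy instead
    return require $file;
}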

Ajax call to php for csv file manipulation hangs

Okay so I have a button. When pressed it does this:
Javascript
$("#csv_dedupe").live("click", function(e) {
file_name = 'C:\\server\\xampp\\htdocs\\Gene\\IMEXporter\\include\\files\\' + $("#IMEXp_import_var-uploadFile-file").val();
$.post($_CFG_PROCESSORFILE, {"task": "csv_dupe", "file_name": file_name}, function(data) {
alert(data);
}, "json")
});
This ajax call gets sent out to this:
PHP
class ColumnCompare {
    public $column;

    function __construct($column) {
        $this->column = $column;
    }

    function compare($a, $b) {
        if ($a[$this->column] == $b[$this->column]) {
            return 0;
        }
        return ($a[$this->column] < $b[$this->column]) ? -1 : 1;
    }
}

if ($task == "csv_dupe") {
    $file_name = $_REQUEST["file_name"];
    // Hard-coded input
    $array_var = array();
    $sort_by_col = 9999;
    // Open csv file and dump contents
    if (($handler = fopen($file_name, "r")) !== FALSE) {
        while (($csv_handler = fgetcsv($handler, 0, ",")) !== FALSE) {
            $array_var[] = $csv_handler;
        }
    }
    fclose($handler);
    // Copy original csv data array to be compared later
    $array_var2 = $array_var;
    // Find email column
    $new = array();
    $new = $array_var[0];
    $findme = 'email';
    $counter = 0;
    foreach ($new as $key) {
        $pos = strpos($key, $findme);
        if ($pos === false) {
            $counter++;
        } else {
            $sort_by_col = $counter;
        }
    }
    if ($sort_by_col === 9999) {
        echo 'COULD NOT FIND EMAIL COLUMN';
        return;
    }
    // Temporarily remove headers from array
    $headers = array_shift($array_var);
    // Create object for sorting by a particular column
    $obj = new ColumnCompare($sort_by_col);
    usort($array_var, array($obj, 'compare'));
    // Remove duplicates from a column
    array_unshift($array_var, $headers);
    $newArr = array();
    foreach ($array_var as $val) {
        $newArr[$val[$sort_by_col]] = $val;
    }
    $array_var = array_values($newArr);
    // Write the CSV back to the same file
    $sout = fopen($file_name, 'w');
    foreach ($array_var as $fields) {
        fputcsv($sout, $fields);
    }
    fclose($sout);
    // How many dupes were there?
    $number = count($array_var2) - count($array_var);
    echo json_encode($number);
}
This PHP gets all the data from a CSV file, columns and rows, and using the fgetcsv function it assigns all the data to an array. I have code in there that also dedupes (finds and removes a copy of a duplicate) the CSV file by a single column, keeping the row and column structure of the entire array intact.
The only problem is, even though it works with the small files of 10 or so rows that I tested, it does not work for files with 25,000 rows.
Now before you say it, I have gone into my php.ini file and changed max_input, file size, max execution time, etc. to astronomical values to ensure PHP can accept file sizes of up to 999999999999999MB and run its script for a few hundred years.
I used a file with 25,000 records and executed the script. It's been two hours and Fiddler still shows that an HTTP response has not yet been sent back. Can someone please give me some ways to optimize my server and my code?
I got that code from a user who helped me in another question I posted on how to do this in the first place. My concern now is: even though I tested it and it works, I want to know how to make it work in less than a minute. Excel can dedupe a column of a million records in a few seconds; why can't PHP do this?
Sophie, I assume that you are not experienced at writing this type of application because IMO this isn't the way to approach this. So I'll pitch this accordingly.
When you have a performance problem like this, you really need to binary-chop the problem to understand what is going on. So step 1 is to decouple the PHP timing problem from AJAX and get a simple understanding of why your approach is so unresponsive. Do this using a locally installed php-cgi, or even use your web install: issue a header('Content-Type: text/plain') and dump out micro-timing of each step. How long does the CSV read take, ditto the sort, then the dedup, then the write? Do this for a range of CSV file sizes, going up by 10x in row count each time.
Also do a memory_get_usage() at each step to see how you are chomping through memory, because your approach is a real memory hog and you are probably erroring out by hitting the configured memory limits -- a phpinfo() will tell you what these are.
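A rough sketch of that kind of instrumentation (the step comments are placeholders for the existing code in the csv_dupe branch):
header('Content-Type: text/plain');

$t0 = microtime(true);
// ... CSV read ...
printf("read:  %.3fs  %.1f MB\n", microtime(true) - $t0, memory_get_usage(true) / 1048576);

$t0 = microtime(true);
// ... sort ...
printf("sort:  %.3fs  %.1f MB\n", microtime(true) - $t0, memory_get_usage(true) / 1048576);

$t0 = microtime(true);
// ... dedup ...
printf("dedup: %.3fs  %.1f MB\n", microtime(true) - $t0, memory_get_usage(true) / 1048576);

$t0 = microtime(true);
// ... write ...
printf("write: %.3fs  %.1f MB\n", microtime(true) - $t0, memory_get_usage(true) / 1048576);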
The read, dedup and write are all O(N), but the sort is O(N log N) at best and O(N²) at worst. Your sort is also calling a PHP method per comparison, so it will be slow.
What I don't understand is why you are even doing the sort, since your dedup algorithm does not make use of the fact that the rows are sorted.
(BTW, the sort will also sort the header row in with the data, so you need to shift it off before you do the sort if you still want to do it.)
There are other issues that you need to think about, such as:
Using a raw parameter as a filename makes you vulnerable to attack. It is better to fix the path relative to, say, DOCROOT/Gene/IMEXporter/include and enforce some grammar on the file names.
You need to think about the atomicity of reading and rewriting large files as a response to a web request -- what happens if two clients make the request at the same time?
Lastly, you compare this to Excel: well, loading and saving Excel files can take time, and Excel doesn't have to scale to respond to tens or hundreds of users at the same time. In a transactional system you typically use a DB backend for this sort of thing, and if you are using a web interface for compute-heavy tasks, you need to accept the Apache (or equivalent server) hard memory and timing constraints and chop your algorithms and approach accordingly.

Duplicate keys in memcached

I'm having some trouble with memcache on my php site. Occasionally I'll get a report that the site is misbehaving and when I look at memcache I find that a few keys exist on both servers in the cluster. The data is not the same between the two entries (one is older).
My understanding of memcached was that this shouldn't happen...the client should hash the key and then always pick the same server. So either my understanding is wrong or my code is. Can anyone explain why this might be happening?
FWIW the servers are hosted on Amazon EC2.
All my connections to memcache are opened through this function:
$mem_servers = array(
    array('ec2-000-000-000-20.compute-1.amazonaws.com', 11211, 50),
    array('ec2-000-000-000-21.compute-1.amazonaws.com', 11211, 50)
);

function ConnectMemcache()
{
    global $mem_servers;
    static $memcon = null; // keep the connection across calls
    if ($memcon === null) {
        $memcon = new Memcache();
        foreach ($mem_servers as $server) {
            $memcon->addServer($server[0], $server[1], true);
        }
    }
    return $memcon;
}
and values are stored through this:
function SetData($key, $data)
{
    global $mem_global_key;
    if (MEMCACHE_ON_OFF)
    {
        $key = $mem_global_key . $key;
        $memcache = ConnectMemcache();
        $memcache->set($key, $data);
        return true;
    }
    else
    {
        return false;
    }
}
I think this blog post touches on the problems you're having.
http://www.caiapps.com/duplicate-key-problem-in-memcache-php/
From the article it sounds like the following happens:
- a memcache server that originally holds the key drops out
- the key is recreated on the 2nd server with updated data
- the 1st server comes back online and rejoins the cluster with the old data
- now you have the key saved on 2 servers with different data
It sounds like you may need to use Memcache::flush to clear out the memcache cluster before you write, to help minimize how long duplicates can exist in your cluster.
