memory leak while processing large CSV

memory leak while processing large CSV - php

I have a script that downloads a large product CSV file, processes the information therein (downloading images and resizing and preparing other data for database insertion), then creates another txt file of all the processed items. The problem is that it seems to be hemmoraging memory somewhere. I get an error 500 returned, but the log shows too much memory usage. I've unset as much as I can, and I'm using SPL iterators which are supposed to be less memory intensive, but i still can get to script to complete execution and enter all of the information. Can anyone point out something in the script that would help prevent the memory leakage?
<?php
define('IN_PHPBB', true);
define('IN_SHOP', true);
$phpbb_root_path = './../forum/';
$root_path = './../';
$phpEx = substr(strrchr(__FILE__, '.'), 1);
include($phpbb_root_path.'common.'.$phpEx);
// Start session management
$user->session_begin();
$auth->acl($user->data);
$user->setup();
set_time_limit(172800);
define('THUMBNAIL_IMAGE_MAX_WIDTH', 150);
define('THUMBNAIL_IMAGE_MAX_HEIGHT', 150);
function generate_thumb($source_image_path, $thumbnail_image_path)
{
list($source_image_width, $source_image_height, $source_image_type) = getimagesize($source_image_path);
switch ($source_image_type) {
case IMAGETYPE_GIF:
$source_gd_image = imagecreatefromgif($source_image_path);
break;
case IMAGETYPE_JPEG:
$source_gd_image = imagecreatefromjpeg($source_image_path);
break;
case IMAGETYPE_PNG:
$source_gd_image = imagecreatefrompng($source_image_path);
break;
}
if ($source_gd_image === false) {
return false;
}
$source_aspect_ratio = $source_image_width / $source_image_height;
$thumbnail_aspect_ratio = THUMBNAIL_IMAGE_MAX_WIDTH / THUMBNAIL_IMAGE_MAX_HEIGHT;
if ($source_image_width <= THUMBNAIL_IMAGE_MAX_WIDTH && $source_image_height <= THUMBNAIL_IMAGE_MAX_HEIGHT) {
$thumbnail_image_width = $source_image_width;
$thumbnail_image_height = $source_image_height;
} elseif ($thumbnail_aspect_ratio > $source_aspect_ratio) {
$thumbnail_image_width = (int) (THUMBNAIL_IMAGE_MAX_HEIGHT * $source_aspect_ratio);
$thumbnail_image_height = THUMBNAIL_IMAGE_MAX_HEIGHT;
} else {
$thumbnail_image_width = THUMBNAIL_IMAGE_MAX_WIDTH;
$thumbnail_image_height = (int) (THUMBNAIL_IMAGE_MAX_WIDTH / $source_aspect_ratio);
}
$thumbnail_gd_image = imagecreatetruecolor($thumbnail_image_width, $thumbnail_image_height);
imagecopyresampled($thumbnail_gd_image, $source_gd_image, 0, 0, 0, 0, $thumbnail_image_width, $thumbnail_image_height, $source_image_width, $source_image_height);
imagejpeg($thumbnail_gd_image, $thumbnail_image_path, 90);
imagedestroy($source_gd_image);
imagedestroy($thumbnail_gd_image);
unset($source_image_width, $source_image_height, $source_image_type, $source_gd_image, $source_aspect_ratio, $thumbnail_aspect_ratio, $thumbnail_image_width, $thumbnail_image_height, $thumbnail_gd_image);
return true;
}
$regex = <<<'END'
/
(
(?: [\x00-\x7F] # single-byte sequences 0xxxxxxx
| [\xC0-\xDF][\x80-\xBF] # double-byte sequences 110xxxxx 10xxxxxx
| [\xE0-\xEF][\x80-\xBF]{2} # triple-byte sequences 1110xxxx 10xxxxxx * 2
| [\xF0-\xF7][\x80-\xBF]{3} # quadruple-byte sequence 11110xxx 10xxxxxx * 3
)+ # ...one or more times
)
| ( [\x80-\xBF] ) # invalid byte in range 10000000 - 10111111
| ( [\xC0-\xFF] ) # invalid byte in range 11000000 - 11111111
/x
END;
function utf8replacer($captures) {
if ($captures[1] != "") {
// Valid byte sequence. Return unmodified.
return $captures[1];
}
elseif ($captures[2] != "") {
// Invalid byte of the form 10xxxxxx.
// Encode as 11000010 10xxxxxx.
return "\xC2".$captures[2];
}
else {
// Invalid byte of the form 11xxxxxx.
// Encode as 11000011 10xxxxxx.
return "\xC3".chr(ord($captures[3])-64);
}
}
/* download file from source */
function getDataCSV(){
$thefile = 'http://feeds.cnv.com/xxxxxxxxxxxxxx/Bronze/ELD-B01.csv';
$file = 'ELD-B01.csv';
$fh = fopen($file, "w");
$rows = file($thefile);
foreach($rows as $num => $row){
if($num != 0){
fwrite($fh, $row);
}
}
fclose($fh);
include("DataSource.php");
$csv = new File_CSV_DataSource;
if ($csv->load($file)) {
$items = $csv->getHeaders();
$csv->getColumn($items[2]);
if ($csv->isSymmetric()) {
$items = $csv->connect();
} else {
$items = $csv->getAsymmetricRows();
}
$items = $csv->getrawArray();
}
unset($csv);
return $items;
}
$iter = new ArrayIterator(getDataCSV());
$google_list = array();
$google_list[] = array('id', 'title', 'description', 'google_product_category', 'product_type', 'link', 'image_link', 'condition', 'availability', 'price', 'brand', 'mpn');
$sql = "TRUNCATE TABLE ".SHOP_ITEMS;
$db->sql_query($sql);
foreach($iter as $item){
if($item[12] != ""){
$catName = str_replace(" ", "-", str_replace("and ", "", str_replace(",", "", str_replace("&", "and", str_replace("-", "", $item[12])))));
}else{
$catName = str_replace(" ", "-", str_replace("and ", "", str_replace(",", "", str_replace("&", "and", str_replace("-", "", $item[11])))));
}
$sql = 'SELECT cat_id FROM '.SHOP_CATS.' WHERE cat_clean = "'.$catName.'"';
$result = $db->sql_query($sql);
$row = $db->sql_fetchrow($result);
$db->sql_freeresult($result);
$catId = $row['cat_id'];
$img = $item[9];
$ext = substr($img, strrpos($img, '.') + 1);
$image = 'images/products/'.$item[0].'.'.$ext;
file_put_contents($root_path.$image, file_get_contents($img));
$thumb = "images/products/thumbs/".$item[0]."_thumb.".$ext;
generate_thumb($root_path.$image, $thumb);
$itmRow = array(
'item_name' => str_replace("/", "", preg_replace_callback($regex, "utf8replacer", html_entity_decode(html_entity_decode($item[1], ENT_QUOTES)))),
'item_price' => $item[2],
'item_description' => preg_replace_callback($regex, "utf8replacer", html_entity_decode(html_entity_decode($item[4], ENT_QUOTES))),
'item_model' => $item[0],
'item_manufacturer' => ($item[6] == '') ? 'No Info' : $item[6],
'item_image' => $image,
'item_cat' => ($catId) ? $catId : 0,
'item_number' => $item[0],
'item_vendor_code' => "ELD",
'item_stock' => (strtolower($item[5]) == 'in stock') ? 1 : 0,
'item_added' => strtotime($item[8]),
'item_upc' => ($item[13] == '') ? 'No Info' : $item[13],
'item_url' => '',
'item_weight' => ($item[14] == '') ? 'No Info' : $item[14],
);
$sql = 'INSERT INTO '.SHOP_ITEMS.' '.$db->sql_build_array('INSERT', $itmRow);
$db->sql_query($sql);
$itmId = $db->sql_nextid();
if(strstr($itmRow['item_name'], "-") == FALSE){
$seo = urlencode(str_replace(" ", "-", $itmRow['item_name'])).".html";
}else{
$seo = urlencode(str_replace(" ", "_", $itmRow['item_name'])).".html";
}
if($item[5] == "oos"){
$stock = "Out of Stock";
}else{
$stock = "In Stock";
}
$u_product = "https://therealmsofwickedry.com/product/".$seo;
$google_list[] = array($itmId, $itmRow['item_name'], $itmRow['item_description'], 'Mature > Erotic', $catName, $u_product, "https://therealmsofwickedry.com/".$itmRow['item_image'], "new", $stock, $itmRow['item_price'], $itmRow['item_manufacturer'], $itmRow['item_model']);
unset($catName, $catId, $img, $ext, $image, $thumb, $itmRow, $itmId, $seo, $stock, $u_product);
}
$line = '';
foreach($google_list as $list){
$line .= implode("\t", $list);
$line .= "\n";
}
$google = 'google_products.txt';
$h = fopen($google, "w");
fwrite($h, $line);
fclose($h);
?>

Tanzeel is correct in assuming the file is being read in it's entirety into memory.
Here is how you can read a file line by line.
$file_handle = fopen($file, 'r');
// You can ignore the file header line if you know the format.
$first_line = fgetcsv($fh);
while ($single_line = fgetcsv($file_handle)) {
print_r($single_line);
}
fclose($single_line);

I am not sure it's memory leakage, it must be an "out of memory" exception. My guess is that your script must be dying when reading the large file. When reading through your code I found the following:
$rows = file($thefile);
This code line will read the entire "large file" into an array in memory. The first step should be ensuring that your script isn't dying due to this. You can try using fopen and fread functions in PHP to read byte chunks and write into the destination file. This should ideally take care of hogging memory resources when reading.
To diagnose if getDataCSV() is the actual culprit modify the following line:
$iter = new ArrayIterator(getDataCSV());
in your code to this:
$iter = new ArrayIterator(getDataCSV());
die('I died after getDataCSV. There is another culprit somewhere else causing the script to fail!');
If you get the die message on your browser then you should start looking at other places in your code which can kill the script.
I haven't gone thoroughly through your code but you should also ensure you follow the same process of reading chunks of the file when processing it locally. For e.g. once your file is downloaded you will be processing it to generate some data. You may use arrays and loops to achieve it but since the data to process is large you should still be processing partial chunks of the file instead of dumping it all into memory.

Turns out that the utf8replacer() function was the cause of the issue. Thanks for the input, though :)

Related

Predis. How to set Cyrillic key?

I trying to execute next command.
Redis::hincrby('sentiment_combined:positive', 'рыжий кот', 1);
This command works perfectly for latin keys, for example 'orange cat'. But with 'рыжий кот' I have next error:
[Predis\Response\ServerException]
ERR Protocol error: expected '$', got '�' <
I has added log into Predis Predis\Connection\StreamConnection::write()
print_r($buffer);echo "---$written---\n";
And I observe output in console:
*2
$6
SELECT
$1
0
---23---
*4
$7
HINCRBY
$27
sentiment_combined:positive
$9
рыжий кот
$1
1
---81---
Redis supporting any keys. How to overcome this limitation in Predis?

Problem solved here: https://github.com/nrk/predis/issues/328
Reason in mbstring.func_overload = 6 in php.ini. Must be mbstring.func_overload = 0.

use Predis\Response\Status as StatusResponse;
class MbStreamConnection extends \Predis\Connection\StreamConnection
{
protected function write($buffer)
{
$socket = $this->getResource();
$buffer = iconv('utf-8', 'windows-1251', $buffer);
while (($length = mb_strlen($buffer, '8bit')) > 0)
{
$written = #fwrite($socket, $buffer, $length);
if ($length === $written) {
return;
}
if ($written === false) {
$this->onConnectionError('Error while writing bytes to the server');
}
$buffer = substr($buffer, $written);
}
return;
}
/**
* {#inheritdoc}
*/
public function read()
{
$socket = $this->getResource();
$chunk = fgets($socket);
$chunk = iconv('windows-1251', 'utf-8', $chunk);
if ($chunk === false || $chunk === '') {
$this->onConnectionError('Error while reading line from the server.');
}
$prefix = $chunk[0];
$payload = substr($chunk, 1, -2);
switch ($prefix) {
case '+':
return StatusResponse::get($payload);
case '$':
$size = (int) $payload;
if ($size === -1) {
return;
}
$bulkData = '';
$bytesLeft = ($size += 2);
do {
$chunk = fread($socket, min($bytesLeft, 4096));
if ($chunk === false || $chunk === '') {
$this->onConnectionError('Error while reading bytes from the server.');
}
$bulkData .= $chunk;
$bytesLeft = $size - mb_strlen($bulkData, '8bit');
} while ($bytesLeft > 0);
$tmp = mb_substr($bulkData, 0, -2);
$tmp = iconv('windows-1251', 'utf-8', $tmp);
return $tmp;
case '*':
$count = (int) $payload;
if ($count === -1) {
return;
}
$multibulk = array();
for ($i = 0; $i < $count; ++$i) {
$multibulk[$i] = $this->read();
}
return $multibulk;
case ':':
$integer = (int) $payload;
return $integer == $payload ? $integer : $payload;
case '-':
return new ErrorResponse($payload);
default:
$this->onProtocolError("Unknown response prefix: '$prefix'.");
return;
}
}
}
in connections params use MbStreamConnection
$client = new \Predis\Client('tcp://localhost:6379', [
'scheme' => 'tcp',
'host' => 'localhost',
'port' => 6379,
'connections' => [
'tcp' => 'MbStreamConnection'
],
'parameters' => [
'password' => '',
]
]);

Multi-threading a for statement with php

I'm using this following function to check if images exist at their location. Each time the script runs it load about 40 - 50 urls and so its taking long time to load the page. I was thinking of using threading for the "for statement" (at the end of the script) but couldn't find many examples on how to do that. I'm not very familiar with multi-threading with php but i found an example here using popen.
My script:
function get_image_dim($sURL) {
try {
$hSock = # fopen($sURL, 'rb');
if ($hSock) {
while(!feof($hSock)) {
$vData = fread($hSock, 300);
break;
}
fclose($hSock);
if (strpos(' ' . $vData, 'JFIF')>0) {
$vData = substr($vData, 0, 300);
$asResult = unpack('H*',$vData);
$sBytes = $asResult[1];
$width = 0;
$height = 0;
$hex_width = '';
$hex_height = '';
if (strstr($sBytes, 'ffc2')) {
$hex_height = substr($sBytes, strpos($sBytes, 'ffc2') + 10, 4);
$hex_width = substr($sBytes, strpos($sBytes, 'ffc2') + 14, 4);
} else {
$hex_height = substr($sBytes, strpos($sBytes, 'ffc0') + 10, 4);
$hex_width = substr($sBytes, strpos($sBytes, 'ffc0') + 14, 4);
}
$width = hexdec($hex_width);
$height = hexdec($hex_height);
return array('width' => $width, 'height' => $height);
} elseif (strpos(' ' . $vData, 'GIF')>0) {
$vData = substr($vData, 0, 300);
$asResult = unpack('h*',$vData);
$sBytes = $asResult[1];
$sBytesH = substr($sBytes, 16, 4);
$height = hexdec(strrev($sBytesH));
$sBytesW = substr($sBytes, 12, 4);
$width = hexdec(strrev($sBytesW));
return array('width' => $width, 'height' => $height);
} elseif (strpos(' ' . $vData, 'PNG')>0) {
$vDataH = substr($vData, 22, 4);
$asResult = unpack('n',$vDataH);
$height = $asResult[1];
$vDataW = substr($vData, 18, 4);
$asResult = unpack('n',$vDataW);
$width = $asResult[1];
return array('width' => $width, 'height' => $height);
}
}
} catch (Exception $e) {}
return FALSE;
}
for($y=0;$y<= ($image_count-1);$y++){
$dim = get_image_dim($images[$y]);
if (empty($dim)) {
echo $images[$y];
unset($images[$y]);
}
}
$images = array_values($images);
The popen example i found was:
for ($i=0; $i<10; $i++) {
// open ten processes
for ($j=0; $j<10; $j++) {
$pipe[$j] = popen('script.php', 'w');
}
// wait for them to finish
for ($j=0; $j<10; ++$j) {
pclose($pipe[$j]);
}
}
I'm not sure which part of my code has to go in the script.php? I tried moving the whole script but that didn't work?
Any ideas on how can i implement this or if there is a better way to multi thread it? Thanks.

PHP does not have multi-threading natively. You can do it with pthreads, but having a little experience there, I can say with assurance that that is too much for your needs.
Your best bet will be to use curl, you can initiate multiple requests with curl_multi_init. Based off the example on PHP.net, the following may work for your needs:
function curl_multi_callback(Array $urls, $callback, $cache_dir = NULL, $age = 600) {
$return = array();
$conn = array();
$max_age = time()-intval($age);
$mh = curl_multi_init();
if(is_dir($cache_dir)) {
foreach($urls as $i => $url) {
$cache_path = $cache_dir.DIRECTORY_SEPARATOR.sha1($url).'.ser';
if(file_exists($cache_path)) {
$stat = stat($cache_path);
if($stat['atime'] > $max_age) {
$return[$i] = unserialize(file_get_contents($cache_path));
unset($urls[$i]);
} else {
unlink($cache_path);
}
}
}
}
foreach ($urls as $i => $url) {
$conn[$i] = curl_init($url);
curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1);
curl_multi_add_handle($mh, $conn[$i]);
}
do {
$status = curl_multi_exec($mh, $active);
// Keep attempting to get info so long as we get info
while (($info = curl_multi_info_read($mh)) !== FALSE) {
// We received information from Multi
if (false !== $info) {
// The connection was successful
$handle = $info['handle'];
// Find the index of the connection in `conn`
$i = array_search($handle, $conn);
if($info['result'] === CURLE_OK) {
// If we found an index and that index is set in the `urls` array
if(false !== $i && isset($urls[$i])) {
$content = curl_multi_getcontent($handle);
$return[$i] = $data = array(
'url' => $urls[$i],
'content' => $content,
'parsed' => call_user_func($callback, $content, $urls[$i]),
);
if(is_dir($cache_dir)) {
file_put_contents($cache_dir.DIRECTORY_SEPARATOR.sha1($urls[$i]).'.ser', serialize($data));
}
}
} else {
// Handle failures how you will
}
// Close, even if a failure
curl_multi_remove_handle($mh, $handle);
unset($conn[$i]);
}
}
} while ($status === CURLM_CALL_MULTI_PERFORM || $active);
// Cleanup and resolve any remaining connections (unlikely)
if(!empty($conn)) {
foreach ($conn as $i => $handle) {
if(isset($urls[$i])) {
$content = curl_multi_getcontent($handle);
$return[$i] = $data = array(
'url' => $urls[$i],
'content' => $content,
'parsed' => call_user_func($callback, $content, $urls[$i]),
);
if(is_dir($cache_dir)) {
file_put_contents($cache_dir.DIRECTORY_SEPARATOR.sha1($urls[$i]).'.ser', serialize($data));
}
}
curl_multi_remove_handle($mh, $handle);
unset($conn[$i]);
}
}
curl_multi_close($mh);
return $return;
}
$return = curl_multi_callback($urls, function($data, $url) {
echo "got $url\n";
return array('some stuff');
}, '/tmp', 30);
//print_r($return);
/*
$url_dims = array(
'url' => 'http://www......',
'content' => raw content
'parsed' => return of get_image_dim
)
*/
Just restructure your original function get_image_dim to consume the raw data and output whatever you are looking for.
This is not a complete function, there may be errors, or idiosyncrasies you need to resolve, but it should serve as a good starting point.
Updated to include caching. This changed a test I was running on 18 URLS from 1 second, to .007 seconds (with cache hits).
Note: you may want to not cache the full request contents, as I did, and just cache the url and the parsed data.

looping thru csv file only iterates once with feof and fgetcsv

Im parsing a csv file to get certain fields and modify them but the problem is that for some reason the feof only iterates once. I did some testing and I realized that if I remove the fgetcsv line the file is read until the end of file. Herebelow is my code. Any help would be greatly appreciated.
<?php
include 'property-features.php';
//------------- get lat and long --------------//
function geocode($address){
// url encode the address
$address = urlencode($address);
$json = file_get_contents('http://open.mapquestapi.com/geocoding/v1/address?key={mykey}&location='.$address);
$jsonArr = json_decode($json);
$lat1 = $jsonArr->results[0]->locations[0]->latLng->lat;
$lon1 = $jsonArr->results[0]->locations[0]->latLng->lng;
// verify if data is complete
if($lat1 && $lon1){
// put the data in the array
$data_arr = array();
array_push(
$data_arr,
$lat1,
$lon1,
$address
);
return $data_arr;
}else{
return false;
}
}
/* ------------- fix property address --------------- */
function ordinal($num) {
$ones = $num % 10;
$tens = floor($num / 10) % 10;
if ($tens == 1) {
$suff = "th";
} else {
switch ($ones) {
case 1 : $suff = "st"; break;
case 2 : $suff = "nd"; break;
case 3 : $suff = "rd"; break;
default : $suff = "th";
}
}
return $num . $suff;
}
/* ------------------ Open original mls feed csv and create a csv file ------------------*/
$file_handle = fopen("sefl_data.csv", "r");
$file = fopen("/home/javy1103/public_html/wp-content/uploads/wpallimport/files/mlsFeed.csv", "w");
while (!feof($file_handle)) {
echo "string";
$line_of_text = fgetcsv($file_handle);
$photos = intval($line_of_text[88]);
if(1 == 1){
/* ------------------ get parking spaces ------------------*/
$line_of_text[84] = str_replace($patterns, $replacement, $line_of_text[84]);
if(substr_count($line_of_text[84], "1 parking") || substr_count($line_of_text[84], "1 car garage")){
$line_of_text[95] = 1;
}else if (substr_count($line_of_text[84], "2 parking") || substr_count($line_of_text[84], "2 car garage")){
$line_of_text[95] = 2;
}else if (substr_count($line_of_text[84], "3 parking") || substr_count($line_of_text[84], "3 car garage")){
$line_of_text[95] = 3;
}else if(substr_count($line_of_text[84], "3 parking or more parking") || substr_count($line_of_text[84], "3 or more car")) {
$line_of_text[95] = "3+";
}
/* ---------------- Get latitude and longitude -------------------*/
if(!empty($line_of_text[23])){
$stNum = preg_replace("/[^0-9]/","",$line_of_text[21]);
echo $stNum.'<BR>';
$address = $line_of_text[20].' '.$line_of_text[23].' '.$line_of_text[21].','.$line_of_text[27].',FL,'.$line_of_text[29];
}else {$address = $line_of_text[20].' '.$line_of_text[21].','.$line_of_text[27].',FL,'.$line_of_text[29];}
//$latLong = geocode($address);
//$line_of_text[25] = $latLong[0].', '.$latLong[1];
$line_of_text[96] = "";
$counter = 2;
//unset($line);
$url = $line_of_text[89];
//$line[0] = $url;
while ($counter <= $photos && $counter < 15) {
$photoNumber = '_'.($counter).'.jpg';
$line_of_text[96+$counter] = substr_replace($url, $photoNumber, sizeof($url) - 5, sizeof($photos)+4);
$counter++;
}
}
}
fclose($file_handle);
fclose($file);
?>

Hazarding a guess, and quoting from the PHP docs
Note: If PHP is not properly recognizing the line endings when reading files either on or created by a Macintosh computer, enabling the auto_detect_line_endings run-time configuration option may help resolve the problem.

Why is this PHP script working in one subdomain and not another identical one? [duplicate]

This question already has an answer here:
Why identical PHP script might work in one subdomain and not another?
(1 answer)
Closed 8 years ago.
I have a php script that creates WordPress posts from a csv file ("file.csv") that is on the same subdomain as WordPress. This has been working for months, however, I just uploaded a new "file.csv" file to a couple of subdomains and the script is not working, resulting in a blank screen and does not create posts.
To troubleshoot, I checked other subdomains where I have the php script set up and uploaded the new "file.csv" file. It worked there.
So, on some subdomains the script is working and on some it is not. The WordPress installs are identical. The php script is identical, I downloaded and uploaded it from one domain to the other to troubleshoot. It works on one and not the other. I have tried identical "file.csv", still works on one subdomain and not the other.
The below error comes up in the error logs
[17-Nov-2013 11:00:05] PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 8388608 bytes) in /filepath/_adder.php on line 16
However the "file.csv" file is identical in both installs, and is the same size. But it still works in one and not the other.
Why might the script be working correctly on done subdomain and not the other? Any suggestions, things to try or tips would be very much appreciated.
For sake of completeness, below is the php script in question.
<?php
require_once('wp-config.php');
$siteurl = get_site_url();
function clearer($str) {
//$str = iconv("UTF-8", "UTF-8//IGNORE", $str);
$str = utf8_encode($str);
$str = str_replace("’", "'", $str);
$str = str_replace("–", "-", $str);
return htmlspecialchars($str);
}
//file read
if(file_exists("file.csv")) $csv_lines = file("file.csv");
if(is_array($csv_lines)) {
$cnt = 15;
for($i = 0; $i < $cnt; $i++) {
$line = $csv_lines[$i];
$line = trim($line);
$first_char = true;
$col_num = 0;
$length = strlen($line);
for($b = 0; $b < $length; $b++) {
if($skip_char != true) {
$process = true;
if($first_char == true) {
if($line[$b] == '"') {
$terminator = '",';
$process = false;
}else
$terminator = ',';
$first_char = false;
}
if($line[$b] == '"'){
$next_char = $line[$b + 1];
if($next_char == '"')
$skip_char = true;
elseif($next_char == ',') {
if($terminator == '",') {
$first_char = true;
$process = false;
$skip_char = true;
}
}
}
if($process == true){
if($line[$b] == ',') {
if($terminator == ',') {
$first_char = true;
$process = false;
}
}
}
if($process == true)
$column .= $line[$b];
if($b == ($length - 1)) {
$first_char = true;
}
if($first_char == true) {
$values[$i][$col_num] = $column;
$column = '';
$col_num++;
}
}
else
$skip_char = false;
}
}
$values = array_values($values);
//print_r($values);
/*************************************************/
if(is_array($values)) {
//file.csv read
for($i = 0; $i < count($values); $i++) {
unset($post);
//check duplicate
//$wpdb->show_errors();
$wpdb->query("SELECT `ID` FROM `" . $wpdb->prefix . "posts`
WHERE `post_title` = '".clearer($values[$i][0])."' AND `post_status` = 'publish'");
//echo $wpdb->num_rows;
if($values[$i][0] != "Name" && $values[$i][0] != "" && $wpdb->num_rows == 0) {
$post['name'] = clearer($values[$i][0]);
$post['Address'] = clearer($values[$i][1]);
$post['City'] = clearer($values[$i][2]);
$post['Categories'] = $values[$i][3];
$post['Tags'] = $values[$i][4];
$post['Top_image'] = $values[$i][5];
$post['Body_text'] = clearer($values[$i][6]);
//details
for($k = 7; $k <= 56; $k++) {
$values[$i][$k] != '' ? $post['details'] .= "<em>".clearer($values[$i][$k])."</em>\r\n" : '';
}
//cats
$categoryes = explode(";", $post['Categories']);
foreach($categoryes AS $category_name) {
$term = term_exists($category_name, 'category');
if (is_array($term)) {
//category exist
$cats[] = $term['term_id'];
}else{
//add category
wp_insert_term( $category_name, 'category' );
$term = term_exists($category_name, 'category');
$cats[] = $term['term_id'];
}
}
//top image
if($post['Top_image'] != "") {
$im_name = md5($post['Top_image']).'.jpg';
$im = #imagecreatefromjpeg($post['Top_image']);
if ($im) {
imagejpeg($im, ABSPATH.'images/'.$im_name);
$post['topimage'] = '<img class="alignnone size-full" src="'.$siteurl.'/images/'.$im_name.'" alt="" />';
}
}
//bottom images
for($k = 57; $k <= 76; $k++) {
if($values[$i][$k] != '') {
$im_name = md5($values[$i][$k]).'.jpg';
$im = #imagecreatefromjpeg($values[$i][$k]);
if ($im) {
imagejpeg($im, ABSPATH.'images/'.$im_name);
$post['images'] .= '<a href="'.$siteurl.'/images/'.$im_name.'"><img class="alignnone size-full"
src="'.$siteurl.'/images/'.$im_name.'" alt="" /></a>';
}
}
}
$post = array_map( 'stripslashes_deep', $post );
//print_r($post);
//post created
$my_post = array (
'post_title' => $post['name'],
'post_content' => '
<em>Address: '.$post['Address'].'</em>
'.$post['topimage'].'
'.$post['Body_text'].'
<!--more-->
'.$post['details'].'
'.$post['images'].'
',
'post_status' => 'publish',
'post_author' => 1,
'post_category' => $cats
);
unset($cats);
//add post
//echo "ID:" .
$postid = wp_insert_post($my_post); //post ID
//tags
wp_set_post_tags( $postid, str_replace(';',',',$post['Tags']), true ); //tags
echo $post['name']. ' - added. ';
//google coords
$address = preg_replace("!\((.*?)\)!si", " ", $post['Address']).', '.$post['City'];
$json = json_decode(file_get_contents('http://hicon.by/temp/googlegeo.php?address='.urlencode($address)));
//print_r($json);
if($json->status == "OK") {
//нашло адрес
$google['status'] = $json->status;
$params = $json->results[0]->address_components;
if(is_array($params)) {
foreach($params AS $id => $p) {
if($p->types[0] == 'locality') $google['locality_name'] = $p->short_name;
if($p->types[0] == 'administrative_area_level_2') $google['sub_admin_code'] = $p->short_name;
if($p->types[0] == 'administrative_area_level_1') $google['admin_code'] = $p->short_name;
if($p->types[0] == 'country') $google['country_code'] = $p->short_name;
if($p->types[0] == 'postal_code') $google['postal_code'] = $p->short_name;
}
}
$google['address'] = $json->results[0]->formatted_address;
$google['location']['lat'] = $json->results[0]->geometry->location->lat;
$google['location']['lng'] = $json->results[0]->geometry->location->lng;
//print_r($params);
//print_r($google);
//insert into DB
$insert_code = $wpdb->insert( $wpdb->prefix . 'geo_mashup_locations',
array( 'lat' => $google['location']['lat'], 'lng' =>
$google['location']['lng'], 'address' => $google['address'],
'saved_name' => $post['name'], 'postal_code' => $google['postal_code'],
'country_code' => $google['country_code'], 'admin_code' =>
$google['admin_code'],
'sub_admin_code' => $google['sub_admin_code'], 'locality_name' =>
$google['locality_name'] ),
array( '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s' )
);
if($insert_code) {
$google_code_id = $wpdb->insert_id;
$geo_date = date( 'Y-m-d H:i:s' );
$wpdb->insert(
$wpdb->prefix . 'geo_mashup_location_relationships',
array( 'object_name' => 'post', 'object_id' => $postid, 'location_id' => $google_code_id, 'geo_date' =>
$geo_date ),
array( '%s', '%s', '%s', '%s' )
);
}else{
//can't insert data
}
echo ' address added.<br />';
}else{
//echo $json->status;
}
}
} //$values end (for)
}
}else{
//not found file.csv
echo 'not found file.csv';
}
$input = explode("\n", file_get_contents("file.csv"));
foreach ($input as $line) {
// process all lines.
}
// This function removes first $CNT elements.
// More info:
// http://php.net/manual/en/function.array-slice.php
$output = array_slice($input, $cnt);
file_put_contents("file.csv", implode("\n", $output));
?>
<html>
<body>
<form enctype="multipart/form-data" method="post">
CSV: <input name="file" type="file" />
<input type="submit" value="Send File" />
</form>
</body>
</html>

Your imported file is larger than limit allowed on the server. You are importing the large chuck of data and storing in Memory that is causing crashing. There are two ways to stop crashing.
1.You can put at the top of you code before any other thing written which give larger access to memory.
ini_set('memory_limit','256M');
2.You read the file line by line and process according to that.
My suggestion is to go for 2nd way and there are two reasons for suggestion.
As what will you do if the size increases beyond possible to keep the file memory. Either you have to split it or go for second option, so why don't you start with second options rather using at later stage.
Second Reason is Memory Allocation and Speed of Execution. If you adopt the second option the Memory will be less occupied and your program will execute faster than the reading full file in Memory then doing work on it.

How to get directory size in PHP

function foldersize($path) {
$total_size = 0;
$files = scandir($path);
foreach($files as $t) {
if (is_dir(rtrim($path, '/') . '/' . $t)) {
if ($t<>"." && $t<>"..") {
$size = foldersize(rtrim($path, '/') . '/' . $t);
$total_size += $size;
}
} else {
$size = filesize(rtrim($path, '/') . '/' . $t);
$total_size += $size;
}
}
return $total_size;
}
function format_size($size) {
$mod = 1024;
$units = explode(' ','B KB MB GB TB PB');
for ($i = 0; $size > $mod; $i++) {
$size /= $mod;
}
return round($size, 2) . ' ' . $units[$i];
}
$SIZE_LIMIT = 5368709120; // 5 GB
$sql="select * from users order by id";
$result=mysql_query($sql);
while($row=mysql_fetch_array($result)) {
$disk_used = foldersize("C:/xampp/htdocs/freehosting/".$row['name']);
$disk_remaining = $SIZE_LIMIT - $disk_used;
print 'Name: ' . $row['name'] . '<br>';
print 'diskspace used: ' . format_size($disk_used) . '<br>';
print 'diskspace left: ' . format_size($disk_remaining) . '<br><hr>';
}
php disk_total_space
Any idea why the processor usage shoot up too high or 100% till the script execution is finish ? Can anything be done to optimize it? or is there any other alternative way to check folder and folders inside it size?

function GetDirectorySize($path){
$bytestotal = 0;
$path = realpath($path);
if($path!==false && $path!='' && file_exists($path)){
foreach(new RecursiveIteratorIterator(new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS)) as $object){
$bytestotal += $object->getSize();
}
}
return $bytestotal;
}
The same idea as Janith Chinthana suggested.
With a few fixes:
Converts $path to realpath
Performs iteration only if path is valid and folder exists
Skips . and .. files
Optimized for performance

The following are other solutions offered elsewhere:
If on a Windows Host:
<?
$f = 'f:/www/docs';
$obj = new COM ( 'scripting.filesystemobject' );
if ( is_object ( $obj ) )
{
$ref = $obj->getfolder ( $f );
echo 'Directory: ' . $f . ' => Size: ' . $ref->size;
$obj = null;
}
else
{
echo 'can not create object';
}
?>
Else, if on a Linux Host:
<?
$f = './path/directory';
$io = popen ( '/usr/bin/du -sk ' . $f, 'r' );
$size = fgets ( $io, 4096);
$size = substr ( $size, 0, strpos ( $size, "\t" ) );
pclose ( $io );
echo 'Directory: ' . $f . ' => Size: ' . $size;
?>

directory size using php filesize and RecursiveIteratorIterator.
This works with any platform which is having php 5 or higher version.
/**
* Get the directory size
* #param string $directory
* #return integer
*/
function dirSize($directory) {
$size = 0;
foreach(new RecursiveIteratorIterator(new RecursiveDirectoryIterator($directory)) as $file){
$size+=$file->getSize();
}
return $size;
}

A pure php example.
<?php
$units = explode(' ', 'B KB MB GB TB PB');
$SIZE_LIMIT = 5368709120; // 5 GB
$disk_used = foldersize("/webData/users/vdbuilder#yahoo.com");
$disk_remaining = $SIZE_LIMIT - $disk_used;
echo("<html><body>");
echo('diskspace used: ' . format_size($disk_used) . '<br>');
echo( 'diskspace left: ' . format_size($disk_remaining) . '<br><hr>');
echo("</body></html>");
function foldersize($path) {
$total_size = 0;
$files = scandir($path);
$cleanPath = rtrim($path, '/'). '/';
foreach($files as $t) {
if ($t<>"." && $t<>"..") {
$currentFile = $cleanPath . $t;
if (is_dir($currentFile)) {
$size = foldersize($currentFile);
$total_size += $size;
}
else {
$size = filesize($currentFile);
$total_size += $size;
}
}
}
return $total_size;
}
function format_size($size) {
global $units;
$mod = 1024;
for ($i = 0; $size > $mod; $i++) {
$size /= $mod;
}
$endIndex = strpos($size, ".")+3;
return substr( $size, 0, $endIndex).' '.$units[$i];
}
?>

function get_dir_size($directory){
$size = 0;
$files = glob($directory.'/*');
foreach($files as $path){
is_file($path) && $size += filesize($path);
is_dir($path) && $size += get_dir_size($path);
}
return $size;
}

Thanks to Jonathan Sampson, Adam Pierce and Janith Chinthana I did this one checking for most performant way to get the directory size. Should work on Windows and Linux Hosts.
static function getTotalSize($dir)
{
$dir = rtrim(str_replace('\\', '/', $dir), '/');
if (is_dir($dir) === true) {
$totalSize = 0;
$os = strtoupper(substr(PHP_OS, 0, 3));
// If on a Unix Host (Linux, Mac OS)
if ($os !== 'WIN') {
$io = popen('/usr/bin/du -sb ' . $dir, 'r');
if ($io !== false) {
$totalSize = intval(fgets($io, 80));
pclose($io);
return $totalSize;
}
}
// If on a Windows Host (WIN32, WINNT, Windows)
if ($os === 'WIN' && extension_loaded('com_dotnet')) {
$obj = new \COM('scripting.filesystemobject');
if (is_object($obj)) {
$ref = $obj->getfolder($dir);
$totalSize = $ref->size;
$obj = null;
return $totalSize;
}
}
// If System calls did't work, use slower PHP 5
$files = new \RecursiveIteratorIterator(new \RecursiveDirectoryIterator($dir));
foreach ($files as $file) {
$totalSize += $file->getSize();
}
return $totalSize;
} else if (is_file($dir) === true) {
return filesize($dir);
}
}

Even though there are already many many answers to this post, I feel I have to add another option for unix hosts that only returns the sum of all file sizes in the directory (recursively).
If you look at Jonathan's answer he uses the du command. This command will return the total directory size but the pure PHP solutions posted by others here will return the sum of all file sizes. Big difference!
What to look out for
When running du on a newly created directory, it may return 4K instead of 0. This may even get more confusing after having deleted files from the directory in question, having du reporting a total directory size that does not correspond to the sum of the sizes of the files within it. Why? The command du returns a report based on some file settings, as Hermann Ingjaldsson commented on this post.
The solution
To form a solution that behaves like some of the PHP-only scripts posted here, you can use ls command and pipe it to awk like this:
ls -ltrR /path/to/dir |awk '{print \$5}'|awk 'BEGIN{sum=0} {sum=sum+\$1} END {print sum}'
As a PHP function you could use something like this:
function getDirectorySize( $path )
{
if( !is_dir( $path ) ) {
return 0;
}
$path = strval( $path );
$io = popen( "ls -ltrR {$path} |awk '{print \$5}'|awk 'BEGIN{sum=0} {sum=sum+\$1} END {print sum}'", 'r' );
$size = intval( fgets( $io, 80 ) );
pclose( $io );
return $size;
}

I found this approach to be shorter and more compatible. The Mac OS X version of "du" doesn't support the -b (or --bytes) option for some reason, so this sticks to the more-compatible -k option.
$file_directory = './directory/path';
$output = exec('du -sk ' . $file_directory);
$filesize = trim(str_replace($file_directory, '', $output)) * 1024;
Returns the $filesize in bytes.

Johnathan Sampson's Linux example didn't work so good for me. Here's an improved version:
function getDirSize($path)
{
$io = popen('/usr/bin/du -sb '.$path, 'r');
$size = intval(fgets($io,80));
pclose($io);
return $size;
}

It works perfectly fine .
public static function folderSize($dir)
{
$size = 0;
foreach (glob(rtrim($dir, '/') . '/*', GLOB_NOSORT) as $each) {
$func_name = __FUNCTION__;
$size += is_file($each) ? filesize($each) : static::$func_name($each);
}
return $size;
}

There are several things you could do to optimise the script - but maximum success would make it IO-bound rather than CPU-bound:
Calculate rtrim($path, '/') outside the loop.
make if ($t<>"." && $t<>"..") the outer test - it doesn't need to stat the path
Calculate rtrim($path, '/') . '/' . $t once per loop - inside 2) and taking 1) into account.
Calculate explode(' ','B KB MB GB TB PB'); once rather than each call?

PHP get directory size (with FTP access)
After hard work, this code works great!!!! and I want to share with the community (by MundialSYS)
function dirFTPSize($ftpStream, $dir) {
$size = 0;
$files = ftp_nlist($ftpStream, $dir);
foreach ($files as $remoteFile) {
if(preg_match('/.*\/\.\.$/', $remoteFile) || preg_match('/.*\/\.$/', $remoteFile)){
continue;
}
$sizeTemp = ftp_size($ftpStream, $remoteFile);
if ($sizeTemp > 0) {
$size += $sizeTemp;
}elseif($sizeTemp == -1){//directorio
$size += dirFTPSize($ftpStream, $remoteFile);
}
}
return $size;
}
$hostname = '127.0.0.1'; // or 'ftp.domain.com'
$username = 'username';
$password = 'password';
$startdir = '/public_html'; // absolute path
$files = array();
$ftpStream = ftp_connect($hostname);
$login = ftp_login($ftpStream, $username, $password);
if (!$ftpStream) {
echo 'Wrong server!';
exit;
} else if (!$login) {
echo 'Wrong username/password!';
exit;
} else {
$size = dirFTPSize($ftpStream, $startdir);
}
echo number_format(($size / 1024 / 1024), 2, '.', '') . ' MB';
ftp_close($ftpStream);
Good code!
Fernando

Object Oriented Approach :
/**
* Returns a directory size
*
* #param string $directory
*
* #return int $size directory size in bytes
*
*/
function dir_size($directory)
{
$size = 0;
foreach(new RecursiveIteratorIterator(new RecursiveDirectoryIterator($directory)) as $file)
{
$size += $file->getSize();
}
return $size;
}
Fast and Furious Approach :
function dir_size2($dir)
{
$line = exec('du -sh ' . $dir);
$line = trim(str_replace($dir, '', $line));
return $line;
}

Code adjusted to access main directory and all sub folders within it. This would return the full directory size.
function get_dir_size($directory){
$size = 0;
$files= glob($directory.'/*');
foreach($files as $path){
is_file($path) && $size += filesize($path);
if (is_dir($path))
{
$size += get_dir_size($path);
}
}
return $size;
}

if you are hosted on Linux:
passthru('du -h -s ' . $DIRECTORY_PATH)
It's better than foreach

Regarding Johnathan Sampson's Linux example, watch out when you are doing an intval on the outcome of the "du" function, if the size is >2GB, it will keep showing 2GB.
Replace:
$totalSize = intval(fgets($io, 80));
by:
strtok(fgets($io, 80), " ");
supposed your "du" function returns the size separated with space followed by the directory/file name.

Just another function using native php functions.
function dirSize($dir)
{
$dirSize = 0;
if(!is_dir($dir)){return false;};
$files = scandir($dir);if(!$files){return false;}
$files = array_diff($files, array('.','..'));
foreach ($files as $file) {
if(is_dir("$dir/$file")){
$dirSize += dirSize("$dir/$file");
}else{
$dirSize += filesize("$dir/$file");
}
}
return $dirSize;
}
NOTE: this function returns the files sizes, NOT the size on disk

Evolved from Nate Haugs answer I created a short function for my project:
function uf_getDirSize($dir, $unit = 'm')
{
$dir = trim($dir, '/');
if (!is_dir($dir)) {
trigger_error("{$dir} not a folder/dir/path.", E_USER_WARNING);
return false;
}
if (!function_exists('exec')) {
trigger_error('The function exec() is not available.', E_USER_WARNING);
return false;
}
$output = exec('du -sb ' . $dir);
$filesize = (int) trim(str_replace($dir, '', $output));
switch ($unit) {
case 'g': $filesize = number_format($filesize / 1073741824, 3); break; // giga
case 'm': $filesize = number_format($filesize / 1048576, 1); break; // mega
case 'k': $filesize = number_format($filesize / 1024, 0); break; // kilo
case 'b': $filesize = number_format($filesize, 0); break; // byte
}
return ($filesize + 0);
}

A one-liner solution. Result in bytes.
$size=array_sum(array_map('filesize', glob("{$dir}/*.*")));
Added bonus: you can simply change the file mask to whatever you like, and count only certain files (eg by extension).

This is the simplest possible algorithm to find out directory size irrespective of the programming language you are using.
For PHP specific implementation. go to: Calculate Directory Size in PHP | Explained with Algorithm | Working Code

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

memory leak while processing large CSV - php

Turns out that the utf8replacer() function was the cause of the issue. Thanks for the input, though :)

Related

Predis. How to set Cyrillic key?

Multi-threading a for statement with php

looping thru csv file only iterates once with feof and fgetcsv

Why is this PHP script working in one subdomain and not another identical one? [duplicate]

How to get directory size in PHP

Categories

Resources