CSV import of thousands of rows - PHP

I have multiple CSV feeds to read with PHP 7.3.
When I have fewer than 500 lines (perhaps 1000) it works fine, but with more I get:
'PHP message: PHP Fatal error: Allowed memory size of 134217728 bytes
exhausted (tried to allocate 8192 bytes)'
I have read a lot about this common error.
In php.ini I raised the limit:
memory_limit=1024M
The result is similar if I change it to -1.
The data comes from an API.
The four main functions called are shown below.
The error comes from the last one (in parse(): $lines[$index] = str_getcsv($line);).
Instead of parsing the full file contents in one go, I split them on newlines and then run str_getcsv on each array element.
public function getProducts(Advertiser $advertiser): ProductCollection
{
    $response = $this->getResponse($advertiser->getFeedUrl());
    $content = gzdecode($response);
    $products = $this->csvParser->parseProducts($content);

    return $products;
}

private function getResponse(string $url): string
{
    $cacheKey = sprintf('%s-%s',
        static::CACHE_PREFIX,
        md5($url)
    );

    if ($this->cache->has($cacheKey)) {
        return $this->cache->get($cacheKey);
    }

    $response = $this->client->request(static::REQUEST_METHOD, $url);
    $body = (string) $response->getBody();
    $this->cache->set($cacheKey, $body);

    return $body;
}

public function parseProducts(string $csvContent): ProductCollection
{
    $lines = $this->parse($csvContent);
    $keys = array_shift($lines);

    return new ProductCollection($keys, $lines);
}

private function parse(string $content): array
{
    $lines = str_getcsv($content, PHP_EOL);

    foreach ($lines as $index => $line) {
        $lines[$index] = str_getcsv($line);
    }

    return $lines;
}
So I think it's due to the parse() function, but I don't know what to do.
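A common way out of this class of error (a minimal sketch, not the asker's code) is to stop materializing every row up front: write the decoded feed to a php://temp stream and yield rows one at a time with fgetcsv(). This assumes ProductCollection can consume an iterable rather than an array, and parseProducts() would need to read the header row first instead of using array_shift():

private function parse(string $content): iterable
{
    // php://temp keeps small payloads in memory and spills to disk past ~2 MB,
    // so peak memory no longer grows with the number of rows.
    $stream = fopen('php://temp/maxmemory:2097152', 'r+');
    fwrite($stream, $content);
    rewind($stream);

    while (($row = fgetcsv($stream)) !== false) {
        yield $row;
    }

    fclose($stream);
}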


How to use multiple async fread with Fibers in PHP?

I would like to get the contents from each URL in a list using fread and Fibers, where each stream does not need to wait for feof before another fread runs on another URL.
My current code is the following:
<?php

function getFiberFromStream($stream, $url): Fiber {
    return new Fiber(function ($stream) use ($url): void {
        while (!feof($stream)) {
            echo "reading 100 bytes from $url" . PHP_EOL;
            $contents = fread($stream, 100);
            Fiber::suspend($contents);
        }
    });
}

function getContents(array $urls): array {
    $contents = [];

    foreach ($urls as $key => $url) {
        $stream = fopen($url, 'r');
        stream_set_blocking($stream, false);

        $fiber = getFiberFromStream($stream, $url);
        $content = $fiber->start($stream);

        while (!$fiber->isTerminated()) {
            $content .= $fiber->resume();
        }

        fclose($stream);
        $contents[$urls[$key]] = $content;
    }

    return $contents;
}

$urls = [
    'https://www.google.com/',
    'https://www.twitter.com',
    'https://www.facebook.com'
];

var_dump(getContents($urls));
Unfortunately, the echo calls in getFiberFromStream() show that the current code waits for the entire content of one URL before moving on to the next:
reading 100 bytes from https://www.google.com
reading 100 bytes from https://www.google.com
reading 100 bytes from https://www.google.com //finished
reading 100 bytes from https://www.twitter.com
reading 100 bytes from https://www.twitter.com
reading 100 bytes from https://www.twitter.com //finished
reading 100 bytes from https://www.facebook.com
[...]
I would like something like:
reading 100 bytes from https://www.google.com
reading 100 bytes from https://www.twitter.com
reading 100 bytes from https://www.facebook.com
reading 100 bytes from https://www.google.com
reading 100 bytes from https://www.twitter.com
reading 100 bytes from https://www.facebook.com
[...]
The behaviour you see is because you poll the current fiber to full completion before going on to the next one.
The solution is to start the fibers for all URLs first, and only after that poll them.
Try something like this:
function getContents(array $urls): array {
    $contents = [];
    $fibers = [];

    // start them all up
    foreach ($urls as $key => $url) {
        $stream = fopen($url, 'r');
        stream_set_blocking($stream, false);

        $fiber = getFiberFromStream($stream, $url);
        $content = $fiber->start($stream);

        // save the fiber context so we can poll it later
        $fibers[$key] = [$fiber, $content, $stream];
    }

    // now poll
    $have_unterminated_fibers = true;
    while ($have_unterminated_fibers) {
        // first suppose we have no work to do
        $have_unterminated_fibers = false;

        // now loop over the fibers to see if any are still working
        foreach ($fibers as $key => $item) {
            // fetch context
            $fiber = $item[0];
            $content = $item[1];
            $stream = $item[2];

            // don't loop to completion here, just process the next chunk
            if (!$fiber->isTerminated()) {
                // yep, mark that we still have some work left
                $have_unterminated_fibers = true;

                // update the content in the context
                $content .= $fiber->resume();
                $fibers[$key][1] = $content;
            } else {
                if ($stream) {
                    fclose($stream);

                    // save the result for return
                    $contents[$urls[$key]] = $content;

                    // mark the stream as closed in the context
                    // so it doesn't get closed twice
                    $fibers[$key][2] = null;
                }
            }
        }
    }

    return $contents;
}
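One caveat: on a non-blocking stream, fread() can return an empty string when no data has arrived yet, so the polling loop above can spin at full speed. A sketch of how you might throttle it with stream_select() at the top of the while loop (reusing the $fibers contexts; treat this as an untested fragment):

    // Wait up to 1 second until at least one still-open stream is readable,
    // instead of busy-looping over fibers that have nothing to deliver yet.
    $read = array_filter(array_column($fibers, 2)); // the not-yet-closed streams
    $write = null;
    $except = null;
    if ($read) {
        stream_select($read, $write, $except, 1);
    }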

Build CSV and upload to Google Drive in PHP (Laravel 5.8)

How do I send a response()->stream() to Google Drive? It isn't working, because this method returns a class and not a file. My question is whether I need to save the file locally using file_put_contents() so that I can then send it to Google Drive.
public function buildCsv($columns, $content): \Closure
{
    return function () use ($columns, $content) {
        $file = fopen('php://output', 'w');
        fputcsv($file, $columns);

        foreach ($content as $item) {
            fputcsv($file, $item);
        }

        fclose($file);
    };
}

$cb = $this->buildCsv($this->CSVColumns, $csvData);
\Storage::disk('google')->put("csv-test", response()->stream($cb, 200, $headers));
On Google Drive my file looks like this: (screenshot from the original post omitted)
Try this
public function buildCsvFile($columns, $content): string
{
    $file = tmpfile();
    fputcsv($file, $columns);

    foreach ($content as $item) {
        fputcsv($file, $item);
    }

    $metaDatas = stream_get_meta_data($file);

    return file_get_contents($metaDatas['uri']);
}

$csv = $this->buildCsvFile($this->CSVColumns, $csvData);
\Storage::disk('google')->put("csv-test.csv", $csv);
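If the CSV can get large, a variant worth considering (a sketch, assuming Laravel 5.8's Storage::put(), which also accepts a stream resource) avoids reading the whole file back into memory:

public function buildCsvStream($columns, $content)
{
    // Build the CSV into a temp file and return the handle itself.
    $file = tmpfile();
    fputcsv($file, $columns);

    foreach ($content as $item) {
        fputcsv($file, $item);
    }

    rewind($file);

    return $file;
}

$stream = $this->buildCsvStream($this->CSVColumns, $csvData);
\Storage::disk('google')->put("csv-test.csv", $stream); // Flysystem streams the resource
fclose($stream); // tmpfile() is deleted automatically on close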

Facebook PHP SDK "Too many requests in batch message. Maximum batch size is 50"

I'm trying to get the profiles of a large group of friends and I'm getting the error:
Too many requests in batch message. Maximum batch size is 50
From the API. Now I understand the error message but I thought I built the function to mitigate this error. I specifically make the calls in chunks of 50. I don't change $chunk_size in any of the methods that call it so I don't really know what is going on here.
This is the function that is spitting out the error:
protected function getFacebookProfiles($ids, array $fields = array('name', 'picture'), $chunk_size = 50)
{
    $facebook = App::make('Facebook');
    $fields = implode(',', $fields);

    $requests = array();
    foreach ($ids as $id) {
        $requests[] = array('method' => 'GET', 'relative_url' => "{$id}?fields={$fields}");
    }

    $responses = array();
    $chunks = array_chunk($requests, $chunk_size);
    foreach ($chunks as $chunk) {
        $batch = json_encode($requests);
        $response = $facebook->api("?batch={$batch}", 'POST');

        foreach ($response as &$profile) {
            $profile = json_decode($profile['body']);

            if (empty($profile->picture->data)) {
                // something has gone REALLY wrong, this should never happen,
                // but if it does we'll have more debug information
                if (empty($profile->error->message)) {
                    throw new Exception('Unexpected error when retrieving user information for IDs:' . implode(', ', $ids));
                }

                $profile->error = (array) $profile->error;
                throw new FacebookApiException((array) $profile);
            }

            $profile->picture = $profile->picture->data;
        }

        $responses = array_merge($responses, $response);
    }

    return $responses;
}
You are not using your $chunk variable in the JSON you generate for your API call, but still the original, unmodified $requests.
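In other words, the fix is a one-line change inside the loop:

foreach ($chunks as $chunk) {
    $batch = json_encode($chunk); // encode only the current chunk of at most 50 requests
    $response = $facebook->api("?batch={$batch}", 'POST');
    // ... rest of the loop unchanged
}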
Happy slamming :-)

Fatal error: Out of memory PHP

I am not sure why this was working fine last night and this morning I am getting
Fatal error: Out of memory (allocated 1611137024) (tried to allocate
1610350592 bytes) in /home/twitcast/public_html/system/index.php on
line 121
The section of code being run is as follows:
function podcast()
{
    $fetch = new server();
    $fetch->connect("TCaster");
    $collection = $fetch->db->shows;

    // find everything in the collection
    $cursor = $collection->find();

    if ($cursor->count() > 0) {
        $test = array();

        // iterate through the results
        while ($cursor->hasNext()) {
            $test[] = ($cursor->getNext());
        }

        foreach ($test as $d) {
            for ($i = 0; $i <= 3; $i++) {
                $url = $d["streams"][$i];
                $xml = file_get_contents($url);

                $doc = new DOMDocument();
                $doc->preserveWhiteSpace = false;
                $doc->loadXML($xml); // e.g. $xml = file_get_contents("http://www.c3carlingford.org.au/podcast/C3CiTunesFeed.xml")

                // Initialize XPath
                $xpath = new DOMXpath($doc);

                // Register the itunes namespace
                $xpath->registerNamespace('itunes', 'http://www.itunes.com/dtds/podcast-1.0.dtd');

                $items = $doc->getElementsByTagName('item');
                foreach ($items as $item) {
                    $title = $xpath->query('title', $item)->item(0)->nodeValue;
                    $published = strtotime($xpath->query('pubDate', $item)->item(0)->nodeValue);
                    $author = $xpath->query('itunes:author', $item)->item(0)->nodeValue;
                    $summary = $xpath->query('itunes:summary', $item)->item(0)->nodeValue;
                    $enclosure = $xpath->query('enclosure', $item)->item(0);
                    $url = $enclosure->attributes->getNamedItem('url')->value;
                    $fname = basename($url);

                    $collection = $fetch->db->shows_episodes;
                    $cursorfind = $collection->find(array("internal_url" => "http://twitcatcher.russellharrower.com/videos/$fname"));

                    if ($cursorfind->count() < 1) {
                        $copydir = "/home/twt/public_html/videos/";
                        $data = file_get_contents($url);
                        $file = fopen($copydir . $fname, "w+");
                        fputs($file, $data);
                        fclose($file);

                        $collection->insert(array("show_id" => new MongoId($d["_id"]), "stream" => $i, "episode_title" => $title, "episode_summary" => $summary, "published" => $published, "internal_url" => "http://twitcatcher.russellharrower.com/videos/$fname"));

                        echo "$title <br> $published <br> $summary <br> $url<br><br>\n\n";
                    }
                }
            }
        }
    }
}
line 121 is
$data = file_get_contents($url);
You want to add 1.6GB of memory usage for a single PHP thread? While you can increase the memory limit, my strong advice is to look at another way of doing what you want.
Probably the easiest solution: you can use cURL to request a byte range of the source file (using cURL is wiser than file_get_contents anyway for remote files). You can get 100K at a time, write it to the local file, then get the next 100K and append it to the file, and so on until the entire file is pulled in.
You may also do something with streams, but it gets a little more complex. This may be your only option if the remote server won't let you fetch part of a file by byte range.
Finally, there are Linux commands such as wget, run through exec(), if your server has permissions.
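For the streams option mentioned above, a minimal sketch (reusing $url, $copydir and $fname from the question) that copies the remote file to disk in chunks without holding it all in memory:

$src = fopen($url, 'rb');               // remote read stream
$dst = fopen($copydir . $fname, 'wb');  // local write stream

// stream_copy_to_stream() copies in small internal chunks,
// so peak memory stays low regardless of the file size.
stream_copy_to_stream($src, $dst);

fclose($src);
fclose($dst);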
Take a look at the memory_limit directive; I suppose that is what you need.
Or you may try to use copy() instead of reading the file into memory (these are video files, as I understand, so it is no surprise they take a lot of memory):
$copydir = "/home/twt/public_html/videos/";
copy($url, $copydir . $fname);
It looks like the files opened last night were simply smaller.

PHP multipart form data PUT request?

I'm writing a RESTful API. I'm having trouble with uploading images using the different verbs.
Consider:
I have an object which can be created/modified/deleted/viewed via a POST/PUT/DELETE/GET request to a URL. The request is multipart/form-data when there is a file to upload, or application/xml when there's just text to process.
To handle the image uploads which are associated with the object I am doing something like:
if (isset($_FILES['userfile'])) {
    $data = $this->image_model->upload_image();

    if ($data['error']) {
        $this->response(array('error' => $data['error']));
    }

    $xml_data = (array) simplexml_load_string(urldecode($_POST['xml']));
    $object = (array) $xml_data['object'];
} else {
    $object = $this->body('object');
}
The major problem here is when trying to handle a PUT request: obviously $_POST doesn't contain the PUT data (as far as I can tell!).
For reference, this is how I'm building the requests:
curl -F userfile=@./image.png -F xml="<xml><object>stuff to edit</object></xml>" http://example.com/object -X PUT
Does anyone have any ideas how I can access the xml variable in my PUT request?
First of all, $_FILES is not populated when handling PUT requests. It is only populated by PHP when handling POST requests.
You need to parse it manually. That goes for "regular" fields as well:
// Fetch content and determine boundary
$raw_data = file_get_contents('php://input');
$boundary = substr($raw_data, 0, strpos($raw_data, "\r\n"));

// Fetch each part
$parts = array_slice(explode($boundary, $raw_data), 1);
$data = array();

foreach ($parts as $part) {
    // If this is the last part, break
    if ($part == "--\r\n") break;

    // Separate content from headers
    $part = ltrim($part, "\r\n");
    list($raw_headers, $body) = explode("\r\n\r\n", $part, 2);

    // Parse the headers list
    $raw_headers = explode("\r\n", $raw_headers);
    $headers = array();
    foreach ($raw_headers as $header) {
        list($name, $value) = explode(':', $header);
        $headers[strtolower($name)] = ltrim($value, ' ');
    }

    // Parse the Content-Disposition to get the field name, etc.
    if (isset($headers['content-disposition'])) {
        $filename = null;
        preg_match(
            '/^(.+); *name="([^"]+)"(; *filename="([^"]+)")?/',
            $headers['content-disposition'],
            $matches
        );
        list(, $type, $name) = $matches;
        isset($matches[4]) and $filename = $matches[4];

        // handle your fields here
        switch ($name) {
            // this is a file upload
            case 'userfile':
                file_put_contents($filename, $body);
                break;

            // default for all other fields is to populate $data
            default:
                $data[$name] = substr($body, 0, strlen($body) - 2);
                break;
        }
    }
}
At each iteration, the $data array will be populated with your parameters, the $headers array will be populated with the headers of each part (e.g. Content-Type), and $filename will contain the original filename, if one was supplied in the request and is applicable to the field.
Take note the above will only work for multipart content types. Make sure to check the request Content-Type header before using the above to parse the body.
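For example, a guard along these lines (a sketch; the parsing code above would run inside the if):

// Only run the multipart parser when the request really is multipart.
$contentType = isset($_SERVER['CONTENT_TYPE']) ? $_SERVER['CONTENT_TYPE'] : '';

if ($_SERVER['REQUEST_METHOD'] === 'PUT'
    && stripos($contentType, 'multipart/form-data') === 0) {
    // ... run the parsing code above ...
}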
Please don't delete this again, it's helpful to a majority of people coming here! All previous answers were partial answers that don't cover the solution as a majority of people asking this question would want.
This takes what has been said above and additionally handles multiple file uploads and places them in $_FILES as someone would expect. To get this to work, you have to add 'Script PUT /put.php' to your Virtual Host for the project, per the documentation. I also suspect I'll have to set up a cron job to clean up any '.tmp' files.
private function _parsePut()
{
    global $_PUT;

    /* PUT data comes in on the stdin stream */
    $putdata = fopen("php://input", "r");

    /* Read the data 1 KB at a time */
    $raw_data = '';
    while ($chunk = fread($putdata, 1024)) {
        $raw_data .= $chunk;
    }

    /* Close the stream */
    fclose($putdata);

    // Determine the boundary
    $boundary = substr($raw_data, 0, strpos($raw_data, "\r\n"));

    // No boundary means the body is not multipart; parse it as url-encoded
    if (empty($boundary)) {
        parse_str($raw_data, $data);
        $GLOBALS['_PUT'] = $data;
        return;
    }

    // Fetch each part
    $parts = array_slice(explode($boundary, $raw_data), 1);
    $data = array();

    foreach ($parts as $part) {
        // If this is the last part, break
        if ($part == "--\r\n") break;

        // Separate content from headers
        $part = ltrim($part, "\r\n");
        list($raw_headers, $body) = explode("\r\n\r\n", $part, 2);

        // Parse the headers list
        $raw_headers = explode("\r\n", $raw_headers);
        $headers = array();
        foreach ($raw_headers as $header) {
            list($name, $value) = explode(':', $header);
            $headers[strtolower($name)] = ltrim($value, ' ');
        }

        // Parse the Content-Disposition to get the field name, etc.
        if (isset($headers['content-disposition'])) {
            $filename = null;
            $tmp_name = null;
            preg_match(
                '/^(.+); *name="([^"]+)"(; *filename="([^"]+)")?/',
                $headers['content-disposition'],
                $matches
            );
            list(, $type, $name) = $matches;

            // Parse a file part
            if (isset($matches[4])) {
                // if labeled the same as a previous part, skip
                if (isset($_FILES[$matches[2]])) {
                    continue;
                }

                // get the original filename
                $filename = $matches[4];

                // create a temporary name in the upload dir
                $filename_parts = pathinfo($filename);
                $tmp_name = tempnam(ini_get('upload_tmp_dir'), $filename_parts['filename']);

                // populate $_FILES with information
                // (size may be off in a multibyte situation)
                $_FILES[$matches[2]] = array(
                    'error'    => 0,
                    'name'     => $filename,
                    'tmp_name' => $tmp_name,
                    'size'     => strlen($body),
                    'type'     => $value
                );

                // place the body in the temporary file
                file_put_contents($tmp_name, $body);
            }
            // Parse a regular field
            else {
                $data[$name] = substr($body, 0, strlen($body) - 2);
            }
        }
    }

    $GLOBALS['_PUT'] = $data;
}
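A hypothetical usage sketch inside a request handler (assuming the method above lives in the same class):

if ($_SERVER['REQUEST_METHOD'] === 'PUT') {
    $this->_parsePut();

    // regular fields land in $GLOBALS['_PUT'], file parts land in $_FILES
    $xml = isset($GLOBALS['_PUT']['xml']) ? $GLOBALS['_PUT']['xml'] : null;
    $upload = isset($_FILES['userfile']) ? $_FILES['userfile'] : null;
}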
For those using the Apiato (Laravel) framework: create a new middleware like the file below, then declare it in your Laravel kernel file within the protected $middlewareGroups variable (inside web or api, whichever you want), like this:
protected $middlewareGroups = [
    'web' => [],
    'api' => [HandlePutFormData::class],
];
<?php

namespace App\Ship\Middlewares\Http;

use Closure;
use Symfony\Component\HttpFoundation\ParameterBag;

/**
 * @author Quang Pham
 */
class HandlePutFormData
{
    /**
     * Handle an incoming request.
     *
     * @param \Illuminate\Http\Request $request
     * @param \Closure $next
     *
     * @return mixed
     */
    public function handle($request, Closure $next)
    {
        if ($request->method() == 'POST' or $request->method() == 'GET') {
            return $next($request);
        }

        if (preg_match('/multipart\/form-data/', $request->headers->get('Content-Type')) or
            preg_match('/multipart\/form-data/', $request->headers->get('content-type'))) {
            $parameters = $this->decode();

            $request->merge($parameters['inputs']);
            $request->files->add($parameters['files']);
        }

        return $next($request);
    }

    public function decode()
    {
        $files = [];
        $data = [];

        // Fetch content and determine boundary
        $rawData = file_get_contents('php://input');
        $boundary = substr($rawData, 0, strpos($rawData, "\r\n"));

        // Fetch and process each part
        $parts = $rawData ? array_slice(explode($boundary, $rawData), 1) : [];
        foreach ($parts as $part) {
            // If this is the last part, break
            if ($part == "--\r\n") {
                break;
            }

            // Separate content from headers
            $part = ltrim($part, "\r\n");
            list($rawHeaders, $content) = explode("\r\n\r\n", $part, 2);
            $content = substr($content, 0, strlen($content) - 2);

            // Parse the headers list
            $rawHeaders = explode("\r\n", $rawHeaders);
            $headers = array();
            foreach ($rawHeaders as $header) {
                list($name, $value) = explode(':', $header);
                $headers[strtolower($name)] = ltrim($value, ' ');
            }

            // Parse the Content-Disposition to get the field name, etc.
            if (isset($headers['content-disposition'])) {
                $filename = null;
                preg_match(
                    '/^form-data; *name="([^"]+)"(; *filename="([^"]+)")?/',
                    $headers['content-disposition'],
                    $matches
                );
                $fieldName = $matches[1];
                $fileName = (isset($matches[3]) ? $matches[3] : null);

                // If we have a file, save it. Otherwise, save the data.
                if ($fileName !== null) {
                    $localFileName = tempnam(sys_get_temp_dir(), 'sfy');
                    file_put_contents($localFileName, $content);

                    $files = $this->transformData($files, $fieldName, [
                        'name'     => $fileName,
                        'type'     => $headers['content-type'],
                        'tmp_name' => $localFileName,
                        'error'    => 0,
                        'size'     => filesize($localFileName)
                    ]);

                    // register a shutdown function to clean up the temporary file
                    register_shutdown_function(function () use ($localFileName) {
                        unlink($localFileName);
                    });
                } else {
                    $data = $this->transformData($data, $fieldName, $content);
                }
            }
        }

        $fields = new ParameterBag($data);

        return ["inputs" => $fields->all(), "files" => $files];
    }

    private function transformData($data, $name, $value)
    {
        // strict check: strpos() can return 0, which is falsy
        $isArray = strpos($name, '[]');
        if ($isArray !== false && ($isArray + 2) == strlen($name)) {
            $name = str_replace('[]', '', $name);
            $data[$name][] = $value;
        } else {
            $data[$name] = $value;
        }

        return $data;
    }
}
Please note: not all of the code above is mine; some comes from the answers above, and some I modified.
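With the middleware registered, a PUT multipart request should then behave like a POST one (a sketch; the field names are hypothetical):

// inside any controller action handling the PUT request
$name = $request->input('name');      // regular form field merged by the middleware
$file = $request->file('userfile');   // Symfony's FileBag converts the array to an UploadedFile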
Quoting netcoder's reply: "Take note the above will only work for multipart content types."
To make it work with any content type, I have added the following lines to netcoder's solution:
// Fetch content and determine boundary
$raw_data = file_get_contents('php://input');
$boundary = substr($raw_data, 0, strpos($raw_data, "\r\n"));

/* ...... My edit --------- */
// No boundary: the body is not multipart, so treat it as url-encoded
if (empty($boundary)) {
    parse_str($raw_data, $data);
    return $data;
}
/* ........... My edit ends ......... */

// Fetch each part
$parts = array_slice(explode($boundary, $raw_data), 1);
$data = array();
// ... (the rest is unchanged from the answer above)
I've been trying to figure out how to work with this issue without having to break RESTful convention and boy howdie, what a rabbit hole, let me tell you.
I'm adding this anywhere I can find in the hope that it will help somebody out in the future.
I've just lost a day of development firstly figuring out that this was an issue, then figuring out where the issue lay.
As mentioned, this isn't a symfony (or laravel, or any other framework) issue, it's a limitation of PHP.
After trawling through a good few RFCs for PHP core, the core development team seems somewhat resistant to implementing anything to do with modernising the handling of HTTP requests. The issue was first reported in 2011, and it doesn't look any closer to having a native solution.
That said, I managed to find this PECL extension called apfd ("Always Populate Form Data"). I'm not really very familiar with PECL and couldn't seem to get it working using pear, but I'm using CentOS and Remi PHP, which has a yum package.
I ran yum install php-pecl-apfd and it literally fixed the issue straight away (well, I had to restart my Docker containers, but that was a given).
I believe there are packages for various flavours of Linux, and I'm sure anybody with more knowledge of pear/pecl/general PHP extensions could get it running on Windows or Mac with no issue.
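If you install it through pecl instead of a distro package, you would typically still need to enable it yourself (assuming the extension is named apfd, as on PECL):

; php.ini
extension=apfd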
I know this article is old, but unfortunately PHP still does not parse form-data for any method other than POST.
Thanks to the friends (@netcoder, @greendot, @pham-quang) who suggested solutions above. Using those solutions, I wrote a library for this purpose:
composer require alireaza/php-form-data
You can also use composer require alireaza/laravel-form-data in Laravel.
