Write very large array to file in PHP

Write very large array to file in PHP - php

I've got a client with a Magento shop. They are creating a txt file to upload to googlebase, which contains all of their products, but due to the quantity of products (20k), the script bombs out once it's taken up about 1gb. It's being run via cron.
Is there a way to either zip or segment the array, or write it to the file as it's created, rather than create the array and then write it?
<?php
define('SAVE_FEED_LOCATION','/home/public_html/export/googlebase/google_base_feed_cron.txt');
set_time_limit(0);
require_once '/home/public_html/app/Mage.php';
Mage::app('default');
try{
$handle = fopen(SAVE_FEED_LOCATION, 'w');
$heading = array('id','title','description','link','image_link','price','product_type','condition','c:product_code');
$feed_line=implode("\t", $heading)."\r\n";
fwrite($handle, $feed_line);
$products = Mage::getModel('catalog/product')->getCollection();
$products->addAttributeToFilter('status', 1);//enabled
$products->addAttributeToFilter('visibility', 4);//catalog, search
$products->addAttributeToFilter('type_id', 'simple');//simple only (until fix is made)
$products->addAttributeToSelect('*');
$prodIds=$products->getAllIds();
foreach($prodIds as $productId) {
$product = Mage::getModel('catalog/product');
$product->load($productId);
$product_data = array();
$product_data['sku']=$product->getSku();
$product_data['title']=$product->getName();
$product_data['description']=$product->getShortDescription();
$product_data['link']=$product->getProductUrl(). '?source=googleps';
$product_data['image_link']=Mage::getBaseUrl(Mage_Core_Model_Store::URL_TYPE_MEDIA).'catalog/product'.$product->getImage();
// Get price of item
if($product->getSpecialPrice())
$product_data['price']=$product->getSpecialPrice();
else
$product_data['price']=$product->getPrice();
$product_data['product_type']='';
$product_data['condition']='new';
$product_data['c:product_code']=$product_data['sku'];
foreach($product->getCategoryIds() as $_categoryId){
$category = Mage::getModel('catalog/category')->load($_categoryId);
$product_data['product_type'].=$category->getName().', ';
}
$product_data['product_type']=rtrim($product_data['product_type'],', ');
//sanitize data
foreach($product_data as $k=>$val){
$bad=array('"',"\r\n","\n","\r","\t");
$good=array(""," "," "," ","");
$product_data[$k] = '"'.str_replace($bad,$good,$val).'"';
}
$feed_line = implode("\t", $product_data)."\r\n";
fwrite($handle, $feed_line);
fflush($handle);
}
//---------------------- WRITE THE FEED
fclose($handle);
}
catch(Exception $e){
die($e->getMessage());
}
?>

I have two fast answers here:
1) Try to increase php's allowed maximum memory size (for the command line since it is a cron script)
2) The way the senior developers solve similar issues, where I currently work is something like the following:
Create a date field attribute with a name like googlebase_uploaded, and run the cron script with something like const MAX_PRODUCTS_TO_WRITE.
Then append to the file and flag each product that got appended.
What I am trying to say is slice the execution time into slower chunks that won't break the script.
Unfortunately that's where I'm missing java and c#

Related

php - for loop repeating itself / going out of sequence

I'm very new to PHP, making errors and learning as I go. Please be gentle! :)
I want to access some data from Blizzard.com's API. For this particular data set, it's not a block of data in JSON, rather each object has it's own URL to access. I estimate that there are approx 150000 objects, however I don't know the start or end points of the number range. So I'm having to assume 1 and work past the highest number I know (269065)
To get the data, I need to access each object's data via a JSON file, which I read, get the contents of & drop in to a text file (this could be written as an insert in to a SQL db too, as I'm able to do this if it's the text file that's the issue). But to be honest, I would love to get to the bottom of why this is happening as much as anything!
I wasn't going to try and run ~250000 iterations in a for loop, I thought I'd try something I considered small, 2000.
The for loop starts with $a as 1, uses $a as part of the URL, loads & decodes the JSON, checks to see if the first field (ID) in the object is set, if it is, it writes a few fields to data.txt & if the first field (ID) isn't set it just writes $a to data.txt (so I know it's a null for other purposes not outlined here).
Simple! Or so I thought, after approx after 183 iterations, the data written to the text file goes awry as seen by the quote below. It is out of sequence and starts at 1 again, then back to 184 ad nauseam. The loop then seems to be locked in some kind of infinite loop of running, outputting in a random order until I close the page 10-20 minutes later.
I have obviously made a big mistake! But I have no idea what I have done wrong to have caused this. During my attempts I have rewritten the code with new variable names, so a new text does not conflict with code that could be running in memory.
I've tried resetting variables to blank at the end of the loop in case it something was being reused that was causing a problem.
If anyone could point out any errors in my code, or suggest something for me to look in to, to handle bigger loops that would be brilliant. I am assuming my issue may be a time out or memory problem. But I don't know where to start & was hoping I'd find some suggestions here.
If it's relevant, I am using 000webhostapp.com as my host provider for now, until I get some paid for hosting.
1 ... 182 183 1 184 2 3 185 4 186 5 187 6 188 7 189 190 8 191
for ($a = 1; $a <= 2000; $a++) {
$json = "https://eu.api.battle.net/wow/recipe/".$a."?locale=en_GB&<MYPRIVATEAPIKEY>";
$contents = file_get_contents($json);
$data = json_decode($contents,true);
if (isset($data['id'])) {
$file = fopen("data.txt","a");
fwrite($file,$data['id'].",'".$data['name']."'\n");
fclose($file);
} else {
$file = fopen("data.txt","a");
fwrite($file,$a."\n");
fclose($file);
}
}
The content of the file I'm trying to access is
{"id":33994,"name":"Precise Strikes","profession":"Enchanting","icon":"spell_holy_greaterheal"}
I scrapped the original plan and wrote this instead. Thank you again who took the time out of their day to help and offer suggestions!
$b = $mysqli->query("SELECT id FROM `static_recipes` order by id desc LIMIT 1;")->fetch_object()->id;
if (empty($b)) {$b=1;};
$count = $b+101;
$write = [];
for ($a = $b+1; $a < $count; $a++) {
$json = "https://eu.api.battle.net/wow/recipe/".$a."?locale=en_GB&apikey=";
$contents = #file_get_contents($json);
$data = json_decode($contents,true);
if (isset($data['id'])) {
$write [] = "(".$data['id'].",'".addslashes($data['name'])."','".addslashes($data['profession'])."','".addslashes($data['icon'])."')";
} else {
$write [] = "(".$a.",'a','a','a'".")";
}
}
$SQL = ('INSERT INTO `static_recipes` (id, name, profession, icon) VALUES '.implode(',', $write));
$mysqli->query($SQL);
$mysqli->close();

$write = [];
for ($a = 1; $a <= 2000; $a++) {
$json = "https://eu.api.battle.net/wow/".$a."?locale=en_GB&<MYPRIVATEAPIKEY>";
$contents = file_get_contents($json);
$data = json_decode($contents,true);
if (isset($data['id'])) {
$write [] = $data['id'].",'".$data['name']."'\n";
} else {
$write [] = $a."\n";
}
}
$file = fopen("data.txt","a");
fwrite($file, implode('', $write));
fclose($file);
Also, why you are think what some IDS isn't duplicated at several "https://eu.api.battle.net/wow/[N]" urls data?
Also if you are I wasn't going to try and run ~250000 think about curl_multi_init(): http://php.net/manual/en/function.curl-multi-init.php

I can't really see anything obviously wrong with your code, can't run it though as I don't have the JSON
It could be possible that there is some kind of race condition since you're opening and closing the same file hundreds of times very quickly.
File operations might seem atomic but not necessarily so - here's an interesting SO thread:
Does PHP wait for filesystem operations (like file_put_contents) to complete before moving on?
Like some others' suggested - maybe just open the file before you enter the loop then close the file when the loop breaks.
I'd try it first and see if it helps.

There's nothing in your original code that would cause that sort of behaviour. PHP will not arbitrarily change the value of a variable. You are opening this file in append mode, are you certain that you're not looking at old data? Maybe output some debug messages as you process the data. It's likely you'd run up against some rate limiting on the API server, so putting a pause in there somewhere may improve reliability.
The only substantive change I'd suggest to your code is opening the file once and closing it when you're done.
$file = fopen("data_1_2000.txt", "w");
for ($a = 1; $a <= 2000; $a++) {
$json = "https://eu.api.battle.net/wow/recipe/$a?locale=en_GB&<MYPRIVATEAPIKEY>";
$contents = file_get_contents($json);
$data = json_decode($contents, true);
if (!empty($data['id'])) {
$data["name"] = str_replace("'", "\\'", $data["name"]);
$record = "$data[id],'$data[name]'";
} else {
$record = $a;
}
fwrite($file, "$record\n");
sleep(1);
echo "$a "; if ($a % 50 === 0) echo "\n";
}
fclose($file);

Wordpress query with 19000+ products

I have problem with my Wordpress Query.
What I'm try to do:
I have CSV file with products data(name, price, stock, sku etc.)
And I want to import this file, but when I'm trying to get Product ID by SKU my query is too high for my server, but I'm doing some stupid idea : in foreach I'm trying to get all product_id.
It's possible to split my wp query without killing my server?
I'm trying sleep but this is no result...
My code is here:
public function new_import_stock_prices(){
global $wpdb;
global $post;
if ( !function_exists( 'wc_get_product_id_by_sku' ) ) {
require_once '/includes/wc-product-functions.php';
}
echo '<h1>Import stanów magazynowych i cen z pliku CSV </h1>';
echo '<h4>Plik pobierany jest z netis/products.csv</h4>';
$fn = 'https://e-xxxxx.pl/xxx/products.csv';
$file_array = file($fn);
echo '<table>';
echo '<tr>';
echo '<td>LP</td>';
echo '<td>Nazwa</td>';
echo '<td>SKU</td>';
echo '<td>Stan magazynowy</td>';
echo '<td>Cena</td>';
echo '<td>Product ID</td>';
$i = 1;
if ( in_array( 'woocommerce/woocommerce.php', apply_filters( 'active_plugins', get_option( 'active_plugins' ) ) ) ) {
foreach ($file_array as $line_number =>&$line)
{
if ($line_number > 0 && $line_number % 10 == 0) {
$row2=explode('|',$line);
$sku = $row2[1];
// get the product ID from the SKU
$product_id = $wpdb->get_var( $wpdb->prepare( "SELECT post_id FROM $wpdb->postmeta WHERE meta_key='_sku' AND meta_value='%s' LIMIT 1", $sku ) );
// Get an instance of the WC_Product object
$product = new WC_Product( $product_id );
//Get product stock quantity and stock status
$stock_quantity = $product->get_stock_quantity();
$stock_status = $product->get_stock_status();
echo '<tr>';
echo '<td>'.$i.'</td>';
echo '<td>'.$row2[0].'</td>';
echo '<td>'.$row2[1].'</td>';
echo '<td>'.$row2[5].'</td>';
echo '<td>'.$row2[2].'</td>';
echo '<td>'.$product_id.'</td>';
echo '</tr>';
$i = $i +1;
sleep(10);
}
}
}
echo '</table>';
}
BTW. my wp_postmeta table has ~900 000+ records :O

And I want to import this file
I don't see any code for importing, I see code for displaying. Assuming by import, you mean display:
What's probably happening is one of a few things.
your running out of memory (you should get an error for this)
don't use file($fn) use file functions that open the file and read line by line, such as fgetcsv
your running out of time
not much you can do about this, except send less data
your overwhelming the browser buffer by sending to much output.
again not much you can do about this but send less data.
The only real solution (Assuming by import, you mean display) is to page the data.
Now even in a file you can page the data, but I would suggest using SQLFileObject instead of the procedural file functions. That said you can page using the procedural style but its by Byte Offset, not page number.
While I can't code an entire paging system I can give you some tips:
For example
//hard to tell how many lines in the file
$fn = 'https://e-xxxxx.pl/xxx/products.csv';
$f = fopen($fn, 'r');
fseek($f, $_GET['offset']); //seek to a byte offset
$i=0;
while(!feof($f) && ($row=fgetcsv($f)) && null !== $row[0]){
if($i==10)
$offset = ftell($f); //get byte offset
++$i;
}
ftell and fseek allow you get or move the file pointer (in bytes). So you can start reading from a predefined offset that you can pass around in the url ... etc.
You can do the same thing with SplFileObject, but a bit better.
try {
$fn = 'https://e-xxxxx.pl/xxx/products.csv';
$csv = new SplFileObject($fn, 'r');
} catch (RuntimeException $e ) {
printf("Error openning csv: %s\n", $e->getMessage());
}
$csv->seek($_GET['line']); //seek to a predefined line
while(!$csv->eof() && ($row = $csv->fgetcsv()) && null !== $row[0]) {
if(($csv->key()-$_GET['line'])==10)
$line = $csv->key(); //get line offset
++$i;
}
The main advantage of SPL is you can use the row number, which is much easier to work with.
You can also get the total number of lines in a file like this
$csv->seek(PHP_INT_MAX);
$total = $csv->key();
$csv->rewind(); //or $csv->seek($_GET['line'])
Basically this seeks to the largest possible INT PHP can handle, but because the file is a fixed length, it puts the pointer at the end of the file, then using key we can get the line number. Then we simply rewind to where we want to read from.
I mention the total number of rows because in paging it's nice to be able to show that.
Another option (to display)
Besides paging is to output the page without buffering.
// Turn off output buffering
ini_set('output_buffering', 'off');
// Turn off PHP output compression
ini_set('zlib.output_compression', false);
//Flush (send) the output buffer and turn off output buffering
//ob_end_flush();
while (ob_get_level()) ob_end_flush();
// Implicitly flush the buffer(s)
ini_set('implicit_flush', true);
ob_implicit_flush(true);
Combine this with one of the methods I showed above to read the file 1 line at a time, and you may be able to eventually read all that data out.
Saving
For saving the data, your probably going to need to break it into batches, the same thing with paging can be done here (using offset or line). So that you only import a couple thousand rows at a time. I would also recommend not outputting the data, because you can give the browser more buffer then it can handle and lock it up. However if you page the data you can break it into small enough chunks that the browser can handle it.
You can even automate this using successive AJAX calls. Basically you would call the code on the backend to save a certain number of rows (x). The sever would respond, and then you would make another call for (x) more rows, save & repeat.
I want to display all products id, to check if it's correct. Next step is change stock, price and saving products
It would be easier to do this work in something like excel, just from a data entry standpoint, no one wants to edit thousands of rows on a web page and then have their session time out or something like that.
Hope that helps.

Getting all from json api array using php

I wanna improve on how to fetch data from an API. In this case I want to fetch every app-id from the Steam API, and list them one per line in a .txt file. Do I need an infinite (or a very high-number) loop (with ++ after every iteration) to fetch everyone? I mean, counting up from id 0 with for example a foreach-loop? I'm thinking it will take ages and sounds very much like bad practice.
How do I get every appid {"appid:" n} from the response of http://api.steampowered.com/ISteamApps/GetAppList/v0001?
<?php
//API-URL
$url = "http://api.steampowered.com/ISteamApps/GetAppList/v0001";
//Fetch content and decode
$game_json = json_decode(curl_get_contents($url), true);
//Define file
$file = 'steam.txt';
//This is where I'm lost. One massive array {"app": []} with lots of {"appid": n}.
//I know how to get one specific targeted line, but how do I get them all?
$line = $game_json['applist']['apps']['app']['appid'][every single line, one at a time]
//Write to file, one id per line.
//Like:
//5
//7
//8
//and so on
file_put_contents($file, $line, FILE_APPEND);
?>
Any pointing just in the right direction will be MUCH appreciated. Thanks!

You don't need to worry about counters with foreach loops, they are designed to go through and work with each item in the object.
$file = "steam.txt";
$game_list = "";
$url = "http://api.steampowered.com/ISteamApps/GetAppList/v0001";
$game_json = file_get_contents($url);
$games = json_decode($game_json);
foreach($games->applist->apps->app as $game) {
// now $game is a single entry, e.g. {"appid":5,"name":"Dedicated server"}
$game_list .= "$game->appid\n";
}
file_put_contents($file, $game_list);
Now you have a text file with 28000 numbers in it. Congratulations?

Handle Large File with PHP

I have a file with the size of around 10 GB or more. The file contains only numbers ranging from 1 to 10 on each line and nothing else. Now the task is to read the data[numbers] from the file and then sort the numbers in ascending or descending order and create a new file with the sorted numbers.
Can anyone of you please help me with the answer?

I'm assuming this is somekind of homework and goal for this is to sort more data than you can hold in your RAM?
Since you only have numbers 1-10, this is not that complicated task. Just open your input file and count how many occourances of every specific number you have. After that you can construct simple loop and write values into another file. Following example is pretty self explainatory.
$inFile = '/path/to/input/file';
$outFile = '/path/to/output/file';
$input = fopen($inFile, 'r');
if ($input === false) {
throw new Exception('Unable to open: ' . $inFile);
}
//$map will be array with size of 10, filled with 0-s
$map = array_fill(1, 10, 0);
//Read file line by line and count how many of each specific number you have
while (!feof($input)) {
$int = (int) fgets($input);
$map[$int]++;
}
fclose($input);
$output = fopen($outFile, 'w');
if ($output === false) {
throw new Exception('Unable to open: ' . $outFile);
}
/*
* Reverse array if you need to change direction between
* ascending and descending order
*/
//$map = array_reverse($map);
//Write values into your output file
foreach ($map AS $number => $count) {
$string = ((string) $number) . PHP_EOL;
for ($i = 0; $i < $count; $i++) {
fwrite($output, $string);
}
}
fclose($output);
Taking into account the fact, that you are dealing with huge files, you should also check script execution time limit for your PHP environment, following example will take VERY long for 10GB+ sized files, but since I didn't see any limitations concerning execution time and performance in your question, I'm assuming it is OK.

I had a similar issue before. Trying to manipulate such a large file ended up being huge drain on resources and it couldn't cope. The easiest solution I ended up with was to try and import it into a MySQL database using a fast data dump function called LOAD DATA INFILE
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
Once it's in you should be able to manipulate the data.
Alternatively, you could just read the file line by line while outputting the result into another file line by line with the sorted numbers. Not too sure how well this would work though.
Have you had any previous attempts at it or are you just after a possible method of doing it?

If that's all you don't need PHP (if you have a Linux maschine at hand):
sort -n file > file_sorted-asc
sort -nr file > file_sorted-desc
Edit: OK, here's your solution in PHP (if you have a Linux maschine at hand):
<?php
// Sort ascending
`sort -n file > file_sorted-asc`;
// Sort descending
`sort -nr file > file_sorted-desc`;
?>
:)

Removing Lines in php ? Is this possible?

I have been struggling to create a Simple ( really simple ) chat system for my website as my knowledge on Javascripting/AJAX are Limited after gather resources and help from many kind people I was able to create my simple chat system but left with one problem.
The messages are posted to a file called "msg.html" in this format :
<p><span id="name">$name</span><span id="Msg">$message</span></p>
And then using PHP and AJAX I will retrieve the messages instantly from the file using the
file(); function and a foreach(){} loop withing PHP here is the code :
<?php
$file = 'msg.html';
$data = file($file);
$max_lines = 20;
if(count($data) > $max_lines){
// here i want the data to be deleted from oldest until i only have 20 messages left.
}
foreach($data as $line_num => $line){
echo $line_num . " . " . $line;
}
?>
My Question is how can i delete the oldest messages so that i am only left with the latest 20 Messages ?

How does something like this seem to you:
$file = 'msg.html';
$data = file($file);
$max_lines = 20;
foreach($data as $line_num => $line)
{
if ($line_num < $max_lines)
{
echo $line_num . " . " . $line;
}
else
{
unset($data[$line_num]);
}
}
file_put_contents('msg.html', $data);
?>
http://www.php.net/manual/en/function.file-put-contents.php for more info :)

I suppose you can read the file, explode it into an array, chop off everything but last 20 fields and write it back to file, overwriting the old one... Perhaps not the best solution but one that comes to mind if you really cant use database as Delan suggested

That's called round-robin if I recall correctly.
As far as I know, you can't remove arbitrary portions of a file. You need to overwrite the file with the new contents (or create a new file and remove the old one). You could also store messages in individual files but of course that implies up to $max_lines files to read.
You should also use flock() to avoid data corruption. Depending on the platform it's not 100% reliable but it's better than nothing.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Write very large array to file in PHP - php

Related

php - for loop repeating itself / going out of sequence

Wordpress query with 19000+ products

Getting all from json api array using php

Handle Large File with PHP

Removing Lines in php ? Is this possible?

Categories

Resources