I have a PHP script that builds a binary search tree over a rather large CSV file (5MB+). This is nice and all, but it takes about 3 seconds to read/parse/index the file.
Now I thought I could use serialize() and unserialize() to quicken the process. When the CSV file has not changed in the meantime, there is no point in parsing it again.
To my horror I find that calling serialize() on my index object takes 5 seconds and produces a huge (19 MB) text file, whereas unserialize() takes an unbearable 27 seconds to read it back. Improvements look a bit different. ;-)
So - is there a faster mechanism to store/restore large object graphs to/from disk in PHP?
(To clarify: I'm looking for something that takes significantly less than the aforementioned 3 seconds to do the de-serialization job.)
var_export should be lots faster as PHP won't have to process the string at all:
// export the processed CSV to export.php
$php_array = read_parse_and_index_csv($csv); // takes 3 seconds
$export = var_export($php_array, true);
file_put_contents('export.php', '<?php $php_array = ' . $export . '; ?>');
Then include export.php when you need it:
include 'export.php';
Depending on your web server setup, you may have to chmod export.php first so that it is readable by the web server user.
Try igbinary...did wonders for me:
http://pecl.php.net/package/igbinary
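For reference, a minimal sketch of how the caching could look with igbinary (the cache file name is just an assumption, and read_parse_and_index_csv() stands for the asker's own parser):

$cache = 'index.igb'; // assumed cache file name

if (is_file($cache) && filemtime($cache) >= filemtime($csv)) {
    // restore the object graph from igbinary's compact binary format
    $index = igbinary_unserialize(file_get_contents($cache));
} else {
    $index = read_parse_and_index_csv($csv);               // the slow parse step
    file_put_contents($cache, igbinary_serialize($index)); // typically much smaller than serialize() output
}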
First you have to change the way your program works: divide the CSV file into smaller chunks. I assume this is an IP datastore.
Convert all IP addresses to integers (longs).
Then, when a query comes in, you know which part to look in.
PHP's ip2long() and long2ip() functions do this conversion.
So, over the 0 to 2^32 range, split all the IP addresses into, say, 100 smaller files (5000K entries / 50K per file).
This approach brings you quicker serialization.
Think smart, code tidy ;)
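A rough sketch of the bucketing idea (the bucket count, file names and 64-bit PHP are assumptions):

function bucket_file(string $ip, int $buckets = 100): string {
    // which of the 100 equal ranges of the 0..2^32 space this IP falls into
    $slot = intdiv(ip2long($ip), intdiv(2 ** 32, $buckets));
    return "ipstore_{$slot}.ser";
}

// on lookup, only the one small bucket that can contain the IP is unserialized
$file  = bucket_file('192.168.1.10');
$chunk = is_file($file) ? unserialize(file_get_contents($file)) : [];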
It seems that the answer to your question is no.
Even if you discover a "binary serialization format" option, most likely even that would be too slow for what you envisage.
So what you may have to look into using (as others have mentioned) is a database, memcached, or an online web service.
I'd like to add the following ideas as well:
caching of requests/responses
your PHP script does not shut down but becomes a network server to answer queries (see the sketch below)
or, dare I say it, change the data structure and query method you are currently using
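To illustrate the network server idea, a minimal CLI sketch; the port, lookup_in_index() and read_parse_and_index_csv() are placeholders for your own code, not real APIs:

$index  = read_parse_and_index_csv('data.csv');   // pay the parse cost once
$server = stream_socket_server('tcp://127.0.0.1:9999', $errno, $errstr);

while ($conn = stream_socket_accept($server, -1)) {  // -1 = wait indefinitely
    $query  = trim(fgets($conn));                     // one query per line
    $result = lookup_in_index($index, $query);        // fast in-memory lookup
    fwrite($conn, json_encode($result) . "\n");
    fclose($conn);
}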
I see two options here:
string serialization, in its simplest form something like
write => implode("\x01", (array) $node);
read  => $a = explode("\x01", $data); $node->payload = $a[0]; $node->value = $a[1]; etc.
binary serialization with pack()
write => pack("fnna*", $node->value, $node->le, $node->ri, $node->payload);
read  => $node = (object) unpack("fvalue/nle/nri/a*payload", $data);
It would be interesting to benchmark both options and compare the results.
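For example, a small round-trip sketch for the pack() option that is easy to benchmark; the node shape (a float value, two 16-bit child indexes and a string payload) is taken from the formats above and is an assumption:

function node_to_binary(object $node): string {
    return pack('fnna*', $node->value, $node->le, $node->ri, $node->payload);
}

function binary_to_node(string $data): object {
    return (object) unpack('fvalue/nle/nri/a*payload', $data);
}

// usage / micro-benchmark
$node = (object) ['value' => 1.5, 'le' => 3, 'ri' => 7, 'payload' => 'foo'];
$t = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    $copy = binary_to_node(node_to_binary($node));
}
echo microtime(true) - $t, " seconds\n";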
If you want speed, writing to or reading from the file system is less than optimal.
In most cases, a database server will be able to store and retrieve data much more efficiently than a PHP script that is reading/writing files.
Another possibility would be something like Memcached.
Object serialization is not known for its performance but for its ease of use, and it's definitely not suited to handling large amounts of data.
SQLite comes with PHP, so you could use that as your database. Otherwise you could try using sessions; then you don't have to serialize anything, you just save the raw PHP object.
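A minimal sketch of the SQLite route via PDO (the table and column names are assumptions): the CSV is imported once, and later requests hit the indexed table instead of re-parsing the file.

$db = new PDO('sqlite:' . __DIR__ . '/index.sqlite');
$db->exec('CREATE TABLE IF NOT EXISTS entries (k TEXT PRIMARY KEY, payload TEXT)');

// import (only when the CSV has changed)
$insert = $db->prepare('INSERT OR REPLACE INTO entries (k, payload) VALUES (?, ?)');
foreach ($rows as $row) {               // $rows = your parsed CSV rows
    $insert->execute([$row['key'], $row['payload']]);
}

// lookup (fast, thanks to the PRIMARY KEY index)
$stmt = $db->prepare('SELECT payload FROM entries WHERE k = ?');
$stmt->execute([$searchKey]);           // $searchKey = whatever you are looking up
$payload = $stmt->fetchColumn();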
What about using something like JSON as the format for storing/loading the data? I have no idea how fast PHP's JSON parser is, but it's usually a fast operation in most languages and it's a lightweight format.
http://php.net/manual/en/book.json.php
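A short sketch of that approach, assuming read_parse_and_index_csv() is the asker's parser and the cache file name is made up:

$cache = 'index.json';

if (is_file($cache) && filemtime($cache) >= filemtime($csv)) {
    $data = json_decode(file_get_contents($cache), true); // true => plain arrays
} else {
    $data = read_parse_and_index_csv($csv);
    file_put_contents($cache, json_encode($data));
}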
Related
I am writing some JSON results to files in PHP on shared hosting (fwrite).
Then I read those files to extract the JSON results (file_get_contents).
It sometimes happens (maybe once in more than a thousand reads) that the file appears truncated when I read it: I can only read a multiple of the first 32768 bytes of the file.
I added some code to copy the file I am reading whenever the JSON string is not valid, and I then end up with 2 different files: the original one was correctly written, since it contains a valid JSON string, while the copy contains only the beginning of the original and has a size of x*32768 bytes.
Do you have any idea what the problem could be and how to solve it? (I don't know how to investigate further.)
Thank you
Without example code it is impossible to give a 'fix my code' answer, but when doing this sort of file write/read programming you should follow a simple process (which, from the description, is missing one fairly critical step!).
First, write to a TEMP file (you are writing to a file, but it is important here to write to a TEMP file - otherwise you could have race conditions... ;)
An easy way to do that in PHP:
$yourData = "whateverYourDataIs....";
$goodfilename = 'whateverYourGoodFileNameIsForYourData.json';
$tempfilename = 'tempfile' . time(); // MANY ways to do this (lots of SO posts on it) - just get a unique name every time you write ('unique' may not be needed if you only occasionally write, but it is a good safety measure to avoid collisions, and time() works for many programs.)
// Now, write $yourData to $tempfilename (file_put_contents opens, writes and closes in one call).
$written = file_put_contents($tempfilename, $yourData);
if ($written === false) {
    // the write failed, so do whatever 'error' handling you may need
    // since it failed there should be no file, but it's not a bad idea to attempt to delete it
    @unlink($tempfilename);
}
else {
    // the write succeeded, so let's do a 'sanity check' on the file to make sure it is good JSON (this is a 'paranoid' check, but "better safe than sorry", right?)
    if (json_decode(file_get_contents($tempfilename)) !== null) {
        // we know the file is good JSON, so now RENAME (this is really fast, so collisions are almost impossible) NOTE: see http://php.net/manual/en/function.rename.php comments for some potential challenges and workarounds if you have trouble with rename.
        rename($tempfilename, $goodfilename);
    }
    // Now, the GOOD file will contain your new data - and those read issues are gone! (though never say 'never' - it may be possible, but very unlikely!)
}
This may or may not be your issue directly, and you will have to adapt it to fit your code, but as a safety factor - and a good way to avoid collisions - it should give you ~100% read success, which I believe is what you are after!
If this doesn't help, then some direct code will be needed to provide a more complete answer.
As suggested by @UlrichEckhardt's comment, it was due to a read/write concurrency problem: I was trying to read a file that was being written. I solved this by simply waiting before trying to read the file again.
I read somewhere that PHP parses the whole .php file every time it is executed. Some solution was proposed there (it was not opcache), but I lost the website and couldn't find it again.
Now I have an enormous PHP website with many long functions that are often used on their own, and the execution needs to be fast.
To avoid having PHP parse all the other functions that won't be used, I was thinking of a modular design in which the functions, stored in independent PHP files, are only included if they will actually be used. But I haven't been able to confirm that PHP will not parse an include inside a function or inside a conditional statement unless it is required. Does PHP parse those includes?
Example:
<?php
$func_to_execute = $_GET['func'];
$parameter = $_GET['parameter'];
switch($func_to_execute)
{
case 'a':
include 'func_a.php';
$output = func_a($parameter);
break;
case 'b':
include 'func_b.php';
$output = func_b($parameter);
break;
case 'c':
include 'func_c.php';
$output = func_c($parameter);
break;
}
echo $output;
?>
In this example, I would like PHP to parse only func_a if I am requesting a, only func_b if I am requesting b, et cetera. In practice there are more than just 3 functions, and each is a very long algorithm that also contains very long strings and arrays.
As an alternative to includes I was thinking of making independent PHP files, executing them and retrieving their output only when required, with shell_exec (see the sketch below). But that would bring other complexities, like formatting the parameters (I have no idea how I would pass a very long string with special characters, or a JSON string, as a parameter in the shell) and calling the function to execute in the shell. Would those complexities make it slower than just letting PHP parse the whole file?
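For instance, something along these lines is what I have in mind (the escapeshellarg() call is just my guess at how the parameter would have to be passed, and $longText stands for the very long string mentioned above):

// hypothetical sketch: pass even a long JSON parameter on the command line
$parameter = json_encode(['func' => 'a', 'text' => $longText]);
$output = shell_exec('php func_a.php ' . escapeshellarg($parameter));

// and inside func_a.php:
// $parameter = json_decode($argv[1], true);
// echo func_a($parameter);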
I know about opcache. Would it be enough, even if the opcodes of all the functions are still loaded each time?
Are there other ways to make a PHP website modular, so that PHP does not have to parse all of the PHP files every time?
Thank you.
Since PHP uses many optimizations and caches (APCu, for instance), you don't need to worry about this.
An include won't be parsed at load time; it works more like file_get_contents plus execution in the same context - and this is optimized by PHP's internal opcode cache.
http://php.net/manual/en/intro.apc.php
I ran a benchmarking experiment and it seems that PHP truly does not parse conditional includes. I made the test using the example script above, defining each file as follows:
func_a: it only sets the variable $x to the sentence 'war and peace'.
$x = 'war and peace';
func_b: it only sets the variable $x to the whole text of the novel War and Peace, which is approximately 3.2 MB long (the whole text was pasted into the PHP file). This would be a very long file to parse.
$x = 'War and Peace, by Leo Tolstoy...(the whole novel...)...';
func_c: it contained incorrect syntax, which should immediately raise a parse error. This was done to guarantee that PHP was not actually parsing what was not included.
I measured the execution time from another PHP script using shell_exec(). The results were (in seconds):
func_a ≈ 0.122
func_b ≈ 0.152
func_c ≈ 0.119
Therefore I conclude that:
- Includes in a switch statement are not parsed unless they are actually required.
- A syntax mistake in an include (inside a switch statement) will not raise any error if it is not actually required, because it is not parsed.
- In any case, the difference in processing time is very small (about 0.03 extra seconds for 3.3 MB of extra text, or, crudely put, 0.01 extra seconds per MB of text to parse). However, this might matter if many users request the website at the same time, so it can be useful to split the script into modules (includes) if it is actually that big. The fact that a badly written include is not parsed when it is not required also means no errors are raised when they aren't relevant.
This therefore seems to me a good way to design a modular PHP application whose modules are extremely big.
I need to get information from the server (the CPU load at this moment) and write it to a variable. sys_getloadavg() returns an array with three samples (the last 1, 5 and 15 minutes), but I need the CPU load right now.
There is no direct way of doing it. My suggested workaround is to use a function like exec() or one of its equivalents. With exec() you can invoke a Unix command like top to get the desired stats; you then have to parse the return value of exec() and pull out the information you are searching for. There is also the file /proc/stat, which you can read to get the information you want.
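For example, a Linux-only sketch that samples /proc/stat twice and derives the CPU usage over that short interval (which is as close to "this moment" as the kernel counters allow):

function cpu_times(): array {
    $fields = preg_split('/\s+/', trim(file('/proc/stat')[0])); // "cpu user nice system idle ..."
    array_shift($fields);                                       // drop the "cpu" label
    return array_map('intval', $fields);
}

$a = cpu_times();
usleep(100000);            // 0.1 s sampling window
$b = cpu_times();

$total = array_sum($b) - array_sum($a);
$idle  = $b[3] - $a[3];    // the 4th field is the idle counter
$cpu_loading = $total > 0 ? 100 * ($total - $idle) / $total : 0;
echo round($cpu_loading, 1) . "%\n";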
It was easy, I just had to take the first element of the array:
$cpu_loading_array = sys_getloadavg();
$cpu_loading = $cpu_loading_array[0];
I'd like to run include on a string rather than a file, but I am unaware of how to achieve this.
//This is the desired functionality
include($filename);
//But I want to do something like this instead.
$file_contents = getFileFromCacheOrSomewhereElse($filename);
include($file_contents); // Doesn't work...
eval($file_contents); // Also incorrect.
Please note: "eval" is not the same as include -- "include" echoes out the contents of the file (and executes any PHP tags) while "eval" executes the string as PHP code.
An example use case is loading a template file from Memcache (as a string), then running include on that string, rather than running include and relying on PHP filecache.
If you can turn on the allow_url_fopen and allow_url_include php.ini settings, then an alternative is the data stream wrapper (manual).
include 'data:text/plain,' . urlencode($file_contents);
eval("?>" . $file_contents . "<?php ");
does it.
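As a usage example for the Memcache scenario in the question (the server address and key name are assumptions):

$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$file_contents = $memcached->get('template:header');
if ($file_contents !== false) {
    // the "?>" prefix drops eval() back into HTML mode, so the string behaves like an included template
    eval("?>" . $file_contents . "<?php ");
} else {
    include $filename;   // fall back to the file on a cache miss
}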
Storing PHP code in the memcache is not the best idea.
And evaling it thereafter is even worse.
Any opcode cache - APC or eAccelerator - will cache your PHP files on the fly, with no strange efforts like this, and even parse them for faster execution.
EDIT. Given the voting results after all these years, I assume that this question is attracting only noobs, who have the same strange whim. So I have to repeat: although it defeats your brilliant idea,
just leave your includes as is
They will be cached much better and executed much faster by PHP's internal opcode cache.
I'm trying to write a PHP function that takes $name and $time, writes them to a txt file (no MySQL), and keeps the file sorted numerically.
For example:
10.2342 bob
11.3848 CandyBoy
11.3859 Minsi
12.2001 dj
Here, Minsi was just added under a faster time, for example.
If the $name already exists in the file, only rewrite it if the new time is faster (smaller) than the previous one, and only write while the file stays within 300 entries, to keep it small.
File writing isn't my forte, but my guess was to use file() to turn the whole file into an array; to no avail, it didn't work quite like I wanted. Any help would be appreciated.
If your data sets are small, you may consider using var_export()
function dump($filename, array &$data) {
    // var_export() produces parsable PHP source, so the file can simply be require()'d back in
    return file_put_contents($filename, '<?php return ' . var_export($data, true) . ';');
}
// create a data set
$myData = array('alpha', 'beta', 'gamma');
// save a data set
dump('file.dat', $myData);
// load a data set
$myData = require('file.dat');
Perform your sorts using the PHP array_* functions, and dump when necessary. var_export() saves the data as PHP parsable text, which is why the dump() function prepends the string <?php return. Of course, this is really only a viable option when your data sets are going to be small enough that keeping their contents in memory is not unreasonable.
Try creating a multi dimensional array "$timeArray[key][time] = name" and then sort($timeArray)
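Combining the suggestions above, a rough sketch of the whole routine the question describes (the function name is illustrative, and the "time name" line format follows the example in the question):

function record_time(string $file, string $name, float $time): void {
    // load the file into a name => time map
    $times = [];
    if (is_file($file)) {
        foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
            list($t, $n) = explode(' ', $line, 2);
            $times[$n] = (float) $t;
        }
    }
    // only overwrite an existing entry if the new time is faster
    if (!isset($times[$name]) || $time < $times[$name]) {
        $times[$name] = $time;
    }
    asort($times);                               // numeric sort by time, keeping names as keys
    $times = array_slice($times, 0, 300, true);  // cap at 300 entries to keep the file small

    $out = '';
    foreach ($times as $n => $t) {
        $out .= sprintf("%.4f %s\n", $t, $n);
    }
    file_put_contents($file, $out);
}

// usage
record_time('times.txt', 'Minsi', 11.3859);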