Good day!
I have a PHP script that reads a very large XML file line by line with fgets(). At some point, we need to stop the script to check some data integrity. My problem is how to resume from that state (I mean the line at which the script stopped). We don't want to start the script over from the beginning, since it takes days to complete.
Is there a way I can accomplish this? Any suggestion would be greatly appreciated.
DOMDocument?
There's also PHP's SimpleXML extension, which needs to be enabled before you can use it in your applications; however, before you can use any of its functions, it loads the entire XML document into memory.
There's also the XMLReader extension, which reads XML files without loading the whole file into memory, and it is the better choice for situations like this.
Here you can find information about these two libraries:
http://us.php.net/manual/en/book.xmlreader.php
http://us.php.net/manual/en/book.simplexml.php
And examples of using them:
http://us.php.net/manual/en/simplexml.examples-basic.php
And here's a much more detailed explanation:
How to use XMLReader in PHP?
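A minimal sketch of streaming a file with XMLReader, assuming a made-up `<items><item>…</item></items>` layout; combined with a saved record count (e.g. from a checkpoint file), this also lets a restarted run skip records that were already processed:

```php
<?php
// Sketch: stream an XML file record by record with XMLReader, keeping a
// counter so a restarted run can skip already-processed records.
// The file layout (<items><item>...</item></items>) is invented for the example.

$xml = '<items><item>a</item><item>b</item><item>c</item></items>';
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, $xml);

$alreadyProcessed = 1;   // e.g. loaded from a checkpoint file before restarting
$seen = 0;
$values = [];

$reader = new XMLReader();
$reader->open($path);
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        $seen++;
        if ($seen <= $alreadyProcessed) {
            continue;                        // skip records handled before the stop
        }
        $values[] = $reader->readString();   // only this node is held in memory
    }
}
$reader->close();

print_r($values);   // records b and c remain
```

Only the counter needs to be persisted between runs, not any parser state.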
I have a script running in Ruby and another in PHP, and for my needs it's not possible to make them all run in the same scripting language.
I want to use PHP to create a variable in Ruby. Currently, I have this code:
PHP:
$config = fopen("config.txt","w+");
fwrite($config,$sArguments);
fclose($config);
Ruby:
while true do
file = File.new("config.txt", "r")
config = file.gets
file.close
end
The PHP script writes the next configuration to a file, and Ruby reads it and turns it into a variable. It works, but Ruby has to keep reading the file over and over, and it sometimes fails to read it correctly, so this code is very poorly optimized.
Is there a faster way to pass information from a PHP script to a Ruby script?
I'm no expert in PHP but I think "How to run Ruby code from Python (Python-Ruby bridge)" provides some ways of how to accomplish communication between the two languages. Although the page speaks of Ruby-Python communication, I'm pretty sure some of the suggestions can be implemented for Ruby and PHP.
One solution is to use XML-RPC to transfer simple data types from PHP to Ruby. Ruby has native support for XML-RPC and I think there are libraries for PHP to enable support for XML-RPC.
You can also use pipes. That is, you call your PHP script via Ruby's IO.popen. I've done something similar for Ruby-Python in "Feasibility of using pipe for ruby-python communication" but I have not really evaluated the performance.
It's difficult to say exactly what the problem is from the information given, but when you say Ruby sometimes fails to read, I'm guessing your Ruby script is reading the file while PHP is half-way through writing it.
An easy fix would be for PHP to write it to a temporary file and then move or rename the temporary file so it can be picked up by the Ruby script. Move/rename is atomic so there is no chance of a half written file being read.
A couple of improvements you could also make:
Use something like the listen gem to only read the file on an update.
I'm not sure what data you're passing, but you could use something like YAML as an interchange format. Catch any parsing exceptions and log the received data to aid debugging.
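The temp-file-and-rename fix above can be sketched on the PHP side like this (the file names are made up for illustration):

```php
<?php
// Sketch: write the config atomically so a reader never sees a half-written
// file. Write to a temporary file, then rename() it into place -- rename()
// within the same filesystem is atomic on POSIX systems.

$dir    = sys_get_temp_dir();
$target = $dir . '/config.txt';

$tmp = tempnam($dir, 'cfg');                 // temp file on the same filesystem
file_put_contents($tmp, "some configuration\n");
rename($tmp, $target);                       // atomic swap: readers see the old
                                             // or the new file, never a partial one

// The Ruby side would then read $target exactly as before.
echo file_get_contents($target);
```

The key detail is that the temporary file lives on the same filesystem as the target, otherwise rename() degrades to a copy and loses atomicity.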
Taking a step back, if you're on a "real O/S", a simpler option would be to use message-based IPC via Unix sockets, assuming persistence is not a concern. Have a look at this page on "Introduction to IPC in Ruby" for a good intro with examples on passing messages between processes.
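A minimal sketch of that message-based IPC idea; for brevity both ends live in one process, connected by a Unix socket pair, whereas real PHP-to-Ruby use would bind a named unix:// socket with stream_socket_server() on one side and connect with stream_socket_client() on the other:

```php
<?php
// Sketch: message-based IPC over a Unix-domain socket pair.
// Both endpoints are in this one process purely for demonstration.

list($a, $b) = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

// "PHP side": send one newline-delimited message.
fwrite($a, "new-config-value\n");

// "Ruby side": block until a whole message arrives -- no busy polling,
// and no risk of ever reading a half-written file.
$message = trim(fgets($b));

fclose($a);
fclose($b);

echo $message, "\n";
```

Because the reader blocks on the socket, the receiving process consumes no CPU while waiting, unlike the read-in-a-loop approach in the question.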
Assume that we have Linux + Apache + PHP installed with all default settings. I have a PHP website that uses a large third-party PHP library, let's say 1 MB of PHP sources. This library is used very rarely, let's say for POST requests only. There is a reason why I can't move the library usage into a separate PHP file. So I have to include this library for each HTTP request, but I use it very rarely. Should I be concerned about the time spent on PHP parsing in that case? Let me explain. I could do this:
<?php
require_once('heavy_library.php');
// do regular stuff
if ($need_heavy_library)  // i.e. "we need the heavy library"
{
heavy_library_function();
}
?>
I assume that this solution is bad because in this case heavy_library.php is parsed for each HTTP request. I can move it into the if statement:
<?php
// do regular stuff
if ($need_heavy_library)  // i.e. "we need the heavy library"
{
require_once('heavy_library.php');
heavy_library_function();
}
?>
Now, as I understand it, it's parsed only when we actually need the library.
Now, back to the question. With default settings for Apache and PHP, should I be concerned about this issue? Should I move the require_once to the place where it is actually used, or can I leave it as usual, with Apache/PHP doing some kind of caching that prevents parsing on each HTTP request?
No, Apache will not do the caching. You should keep the require_once inside the if so it is only used when you need it.
If you do want caching of PHP, then look at something like eaccelerator.
When you tell PHP to require() something, it will do it no matter what; the only thing that prevents that file from being parsed from scratch every time is an opcode cache such as APC.
Conditionally loading the file would be preferred in this case. If you're worried about making life more complicated by having these conditions, perform a small benchmark.
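A quick benchmark along these lines will tell you what the require actually costs on your setup; a generated stand-in is used here since heavy_library.php is hypothetical:

```php
<?php
// Sketch: measure what require-ing a library file actually costs.
// A tiny generated file stands in for the real heavy_library.php.

$lib = tempnam(sys_get_temp_dir(), 'lib') . '.php';
file_put_contents($lib, "<?php function heavy_library_function() { return 42; }\n");

$start = microtime(true);
require_once $lib;                       // parse + compile happens here
$elapsed = microtime(true) - $start;

printf("require took %.6f seconds\n", $elapsed);
echo heavy_library_function(), "\n";
```

Run it against the real library on your server; if the measured time is negligible compared to your request budget, the placement of require_once hardly matters.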
You could also use autoloading to load files "on demand" automatically; see spl_autoload
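A minimal autoloading sketch (the class name and directory are invented for the example); the class file is only parsed the first time the class is actually referenced:

```php
<?php
// Sketch: load class files on demand with spl_autoload_register().
// The HeavyLib class and its file are invented for this example.

$classDir = sys_get_temp_dir() . '/autoload_demo';
@mkdir($classDir);
file_put_contents(
    $classDir . '/HeavyLib.php',
    "<?php class HeavyLib { public function work() { return 'done'; } }\n"
);

spl_autoload_register(function ($class) use ($classDir) {
    $file = $classDir . '/' . $class . '.php';
    if (is_file($file)) {
        require $file;   // parsed only when the class is first referenced
    }
});

// Nothing has been parsed yet; this line triggers the autoloader.
$lib = new HeavyLib();
echo $lib->work(), "\n";
```

Requests that never touch the class never pay the parsing cost, which is exactly the conditional-loading behavior the question asks about, without scattering require_once calls around.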
I have been scouring the internet trying to figure this one out; any ideas would help. I'm trying to take an .xfdl file (base64-encoded, gzipped XML) from my server and convert it to .xml with PHP for viewing and modification, but I can't seem to figure out the process. I've seen people attempt it with Ruby, but I don't know any Ruby. If no one can help, I guess I'll be learning Ruby, hahaha! Thanks in advance. Also, I have looked through this website and couldn't find any PHP examples of this.
Assuming the XML is gzipped first and then base64 encoded, you can use base64_decode() and gzdecode().
echo gzdecode(base64_decode(file_get_contents('file.xfdl')));
However, if you're not running on a Windows box, you will need PHP compiled with --with-zlib to have the zlib library and functions available.
Once you have it in XML form, you might want to look at XMLReader to see how you can modify and read XML in PHP.
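As a self-contained sanity check of the decode pipeline, with a synthetic payload standing in for a real .xfdl file (none is available here):

```php
<?php
// Sketch: round-trip the .xfdl decode pipeline with synthetic data.

$xml = '<form><field>value</field></form>';

// Simulate how the .xfdl content is (assumed to be) produced:
// gzip first, then base64-encode.
$encoded = base64_encode(gzencode($xml));

// The decode step from the answer above:
$decoded = gzdecode(base64_decode($encoded));

var_dump($decoded === $xml);   // the round trip is lossless
```

If a real .xfdl file fails to decode this way, the likely culprit is a header line before the base64 payload, which would need to be stripped before base64_decode().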
I currently have a PHP file that must read hundreds of XML files; I have no choice in how these XML files are constructed, as they are created by a third party.
The first xml file is a large amount of titles for the rest of the xml files, so I search the first xml file to get file names for the rest of the xml files.
I then read each xml file searching its values for a specific phrase.
This process is really slow. I'm talking 5 1/2 minute runtimes, which is not acceptable for a website; customers won't stay on for that long.
Does anyone know a way to speed my code up, to a maximum runtime of approximately 30 seconds?
Here is a pastebin of my code : http://pastebin.com/HXSSj0Jt
Thanks, sorry for the incomprehensible English...
Your main problem is that you're trying to make hundreds of HTTP downloads to perform the search. Unless you get rid of that restriction, it's only going to go so fast.
If for some reason the files aren't cacheable at all (unlikely), not even some of the time, you can pick up some speed by downloading in parallel. See the curl_multi_*() functions. Alternatively, use wget from the command line with xargs to download in parallel.
The above sounds crazy if you have any kind of traffic, though.
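For reference, the curl_multi_*() approach looks roughly like this; to keep the sketch runnable offline, the "remote" XML files here are local file:// URLs, but in real use they would be http:// URLs:

```php
<?php
// Sketch: fetch several URLs in parallel with curl_multi_*().
// Local file:// URLs stand in for the real remote XML files.

$urls = [];
foreach (['a', 'b', 'c'] as $name) {
    $path = tempnam(sys_get_temp_dir(), 'xml');
    file_put_contents($path, "<doc>$name</doc>");
    $urls[] = 'file://' . $path;
}

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all transfers concurrently until every one has finished.
do {
    $status = curl_multi_exec($mh, $running);
    if ($running > 0) {
        curl_multi_select($mh, 0.1);   // wait for activity instead of spinning
    }
} while ($running > 0 && $status === CURLM_OK);

$bodies = [];
foreach ($handles as $ch) {
    $bodies[] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

print_r($bodies);
```

With hundreds of files, the total runtime is then bounded by the slowest transfer rather than the sum of all of them.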
Most likely, the files can be cached for at least a short time. Look at the HTTP headers and see what kind of freshness info their server sends. It might say how long until the file expires, in which case you can save it locally until then. Or it might give a Last-Modified or ETag header, in which case you can do conditional GET requests, which should speed things up further.
I would probably set up a local Squid cache and have PHP make these requests through Squid. It'll take care of all the "use the local copy if it's fresh, otherwise conditionally retrieve a new version" logic for you.
If you still want more performance, you can transform cached files into a more suitable format (e.g., stick the relevant data in a database). Or, if you must stick with the XML format, you can do a string search on the file first, to test whether you should bother parsing that file as XML at all.
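The cheap string-search-before-parsing idea looks like this; the search phrase and document contents are made up for the example:

```php
<?php
// Sketch: only pay the XML-parsing cost for files that can possibly match.
// A plain strpos() prefilter is far cheaper than building a document tree.

$phrase = 'needle';
$files  = [
    '<items><title>haystack only</title></items>',
    '<items><title>contains the needle here</title></items>',
];

$matches = [];
foreach ($files as $contents) {
    if (strpos($contents, $phrase) === false) {
        continue;                              // skip parsing entirely
    }
    $doc = simplexml_load_string($contents);   // parse only the candidates
    $matches[] = (string) $doc->title;
}

print_r($matches);
```

If only a small fraction of files contain the phrase, most of the parsing work disappears.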
First of all, if you have to deal with large XML files on each request to your service, it is wise to download the XMLs once, then preprocess and cache them locally.
If you cannot preprocess and cache the XMLs and have to download them for each request (which I don't really believe is the case), you can try to optimize by using XMLReader or some SAX event-based XML parser. The problem with SimpleXML is that it uses DOM underneath. DOM (as the letters stand for) creates a document object model in your PHP process's memory, which takes a lot of time and eats tons of memory. I would go so far as to say that DOM is useless for parsing large XML files.
XMLReader, on the other hand, lets you traverse a large XML document node by node while barely using any memory, with the trade-off that you cannot issue XPath queries or use other non-sequential node-access patterns.
For how to use XMLReader, consult the PHP manual's page on the XMLReader extension.
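A sketch of that node-by-node traversal, checking each text value for a search phrase (the document structure is invented for the example):

```php
<?php
// Sketch: traverse XML node by node with XMLReader, testing each text
// value for a search phrase. Only the current node is held in memory.

$xml = '<catalog>'
     . '<entry>first record</entry>'
     . '<entry>the phrase we want</entry>'
     . '<entry>last record</entry>'
     . '</catalog>';

$path = tempnam(sys_get_temp_dir(), 'cat');
file_put_contents($path, $xml);

$found = [];
$reader = new XMLReader();
$reader->open($path);
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::TEXT
        && strpos($reader->value, 'phrase') !== false) {
        $found[] = $reader->value;
    }
}
$reader->close();

print_r($found);
```

Memory use stays flat no matter how large the file is, which is the whole point of the streaming approach.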
I really want to mix Lua and PHP; for example, receive a PHP request and process some parts of it using Lua scripts (called from the PHP script that got the initial request).
Any clues about this? I've seen some libraries for using Lua as a kind of PHP replacement, but I've seen nothing clear about how to use both Lua and PHP together.
Thanks
Have you seen phplua? It looks like it could do what you want. I found it via the Lua binding-with-other-languages page (it was the only relevant option, for better or worse).
There is now a full PECL extension for embedding and interacting with Lua code.
See http://pecl.php.net/package/lua and http://www.php.net/manual/en/book.lua.php
I assume you aren't feeling up to embedding Lua in PHP, which leaves running Lua as an external script and reading the result. That's a pretty common operation, and this page provides some guidance.
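A sketch of that external-script approach; since Lua may not be installed everywhere, the php CLI stands in for the Lua interpreter here, but swapping the command for something like `lua myscript.lua arg` is the same pattern:

```php
<?php
// Sketch: run an external interpreter from PHP and capture its output.
// In real use the command would be e.g.:  lua myscript.lua arg
// Here the php CLI stands in for lua so the sketch runs anywhere.

$arg     = 'hello';
$command = 'php -r ' . escapeshellarg('echo strtoupper($argv[1]);')
         . ' ' . escapeshellarg($arg);   // always escape user-supplied args

$result = shell_exec($command);

echo $result, "\n";
```

For anything beyond one-shot calls, proc_open() gives you separate stdin/stdout/stderr pipes to the child process, which avoids re-spawning the interpreter for every request.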