I run a system which needs to update various XML files from data stored in a DB. The script runs via a server-side PHP file which is monitored by a daemon, so that it executes, finishes to free resources, and is then restarted.
I have some benchmarking within the script, and when I have to update 100 XML files it takes about 15 seconds to complete. A typical generated XML file is around 6KB; I am creating the XML using PHP's DOM and writing it with dom->save. The DB is fully normalised and the correct indexes are in place, and the 3 queries that fetch the data I need to update the XML with only take around 0.05 seconds. The bottleneck therefore seems to be the actual creation of the XML via DOM and the writing of the file itself.
Does anyone have any ideas how I could really speed up the process? I have considered using a CRC check to see whether the XML needs to be re-written, but that would still require me to read the XML file I'd be updating, which I don't do at the moment, so surely it's just as bad as saving a new file over the top of the old one? Also, I don't think it's possible to edit only certain parts of the XML, as the structure isn't uniform: the order of the nodes can change depending on what data is not null after being updated.
Really appreciate your thoughts on this!
Fifteen seconds to write a hundred small XML files? That sounds like far too long. Can you do some more profiling and find out which function exactly is the bottleneck?
Have you considered writing plain-text XML (fwrite($fh, "<item>value</item>")) instead of building it with DOM? It sounds justifiable in this case.
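For illustration, a minimal sketch of that approach, assuming the data is already in an $items array; the output path is a placeholder:

<?php
// Sketch: build the XML as one plain string and write it in a single go.
// $items and the output path are placeholders for your own data and location.
$items = array(array('id' => 1, 'name' => 'foo'), array('id' => 2, 'name' => 'bar'));

$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<items>\n";
foreach ($items as $item) {
    $xml .= '  <item id="' . (int) $item['id'] . '">'
          . htmlspecialchars($item['name'], ENT_XML1)
          . "</item>\n";
}
$xml .= "</items>\n";

// A single fwrite of the whole string is cheaper than many small writes.
$fh = fopen('/path/to/output.xml', 'wb');
fwrite($fh, $xml);
fclose($fh);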
Otherwise, for caching, there's always filemtime(), which you could use to quickly get the "last modified" time of your XML file and see whether the DB entry is younger than that. In a system like the one you describe, there should be no need to compare the contents.
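As a rough sketch of that check (the updated_at column and the file path are just placeholders):

<?php
// Only rewrite the XML when the DB row is newer than the file on disk.
// The updated_at column and the file path are assumptions.
$xmlFile  = '/path/to/item_' . $id . '.xml';
$rowMtime = strtotime($row['updated_at']);

if (!is_file($xmlFile) || filemtime($xmlFile) < $rowMtime) {
    // ... build the DOM and $dom->save($xmlFile) as you do now ...
}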
Related
I need to load XML data from an external server/url into my MySQL database, using PHP.
I don't need to save the XML file itself anywhere, unless this is easier/faster.
The problem is, I will need this to run every hour or so, as the data will be constantly updated, so I need to replace the data in my database too. The XML file is usually around 350MB.
The data in the MySQL table needs to be searchable - I will know the structure of the XML so can create the table to suit first.
I guess there are a few parts to this question:
What's the best way to automate this whole process to run every hour?
What's the best (fastest?) way of downloading and parsing the XML (~350MB) from the URL, in a way that I can
load it into a MySQL table of my own, maintaining the columns/structure?
1) A PHP script can keep running in the background all the time, but this is not the best scenario. Alternatively, you can schedule php -q /dir/to/php.php with cron (if running on Linux) or use other techniques to make the server do the work for you. (You still need access to the server.)
2) You can use several approaches. The most linear one, and the least RAM-consuming, whether you work with files or with MySQL, is to open your TCP connection, read smaller packets (16KB will be fine) and stream them out to disk or to another connection (see the streaming sketch after this list).
3) Moving such huge data around is not difficult, but storing it in MySQL is another matter. Performing searches on it is even worse, and updating it is practically asking to kill the MySQL server.
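As a rough sketch of the streaming idea in point 2, assuming the feed consists of repeating <record> elements and a matching records table (all names are illustrative only):

<?php
// Stream a large remote XML feed with XMLReader (constant memory) and insert
// rows into MySQL in batches. <record>, its children and the records table
// are assumptions; adjust to your real feed and schema.
$reader = new XMLReader();
$reader->open('https://example.com/feed.xml'); // or download to a temp file first

$pdo  = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
$stmt = $pdo->prepare('REPLACE INTO records (id, name, price) VALUES (?, ?, ?)');
$doc  = new DOMDocument();

$pdo->beginTransaction();
$count = 0;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'record') {
        // Expand only the current <record> into a small SimpleXML node.
        $node = simplexml_import_dom($doc->importNode($reader->expand(), true));
        $stmt->execute(array((string) $node->id, (string) $node->name, (string) $node->price));

        if (++$count % 1000 === 0) { // commit in chunks to keep transactions small
            $pdo->commit();
            $pdo->beginTransaction();
        }
    }
}
$pdo->commit();
$reader->close();

Scheduling it hourly is then just a matter of a cron entry pointing at this script.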
Suggestions:
From what I can see, you are trying to synchronize or back up data from another server. If there is just one file, then make a local .xml using PHP and you are done. If there is more than one, I would still suggest making local files, as most probably you are working with unstructured data: it is not a good fit for MySQL. If you work with hundreds of files and you need to search them fast, run statistics and much more, consider changing your approach and read about Hadoop.
MySQL's BLOB and TEXT columns still do not hold more than 64KB (65,535 bytes); maybe you know another technique, but I have never heard of one and would never suggest it. If you are doing this just to be able to use SQL search commands, you have taken the wrong path.
I am doing a small website project. On one page there is a section where the client posts new updates; at any given time there will be a maximum of 5 to 6 posts in this division. I was trying to create a MySQL database for the content, but I wonder if there is any way I could keep all the entries as XML files and use PHP to parse them. Is it possible?
Which one is the better option MySQL or XML?
XML is a horrid piece of crap in my opinion. It's bloated and rather unpleasant to work with. However, it is a viable option as long as your number of entries and the amount of traffic stays small.
You can use SimpleXML to parse the XML, but the performance is going to degrade as the file size increases. MySQL, however, will handle quite a lot of data before performance becomes a concern, provided the schema is properly set up.
If you do use XML, you could always go with a half-way XML solution: for example, parse the file once, then store a serialized array of the result.
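A minimal sketch of that half-way approach (file names are placeholders):

<?php
// Parse the XML once, cache it as a serialized array, and only re-parse
// when posts.xml is newer than the cache. Paths are placeholders.
$xmlFile   = 'posts.xml';
$cacheFile = 'posts.cache';

if (is_file($cacheFile) && filemtime($cacheFile) >= filemtime($xmlFile)) {
    $posts = unserialize(file_get_contents($cacheFile));
} else {
    $posts = json_decode(json_encode(simplexml_load_file($xmlFile)), true);
    file_put_contents($cacheFile, serialize($posts));
}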
Though really, if you're going to store it in a file of some sort, I would suggest, in order: SQLite, serialized array, JSON, XML. (Depending on your situation that order may change.)
If you abstract away the low level details enough, you should be able to make adapters that can be used interchangeably, thus allowing you to easily switch out storage backends. (On a large project, that would likely be unfeasible, but it sounds like your data storage/retrieval will remain fairly simple.)
Based on this tutorial I have built a page which functions correctly. I added a couple of dropdown boxes to the page and, based on this snippet, have been able to filter the results accordingly. So, in practice, everything is working as it should. However, my question is regarding the efficiency of the procedure. Right now, the process looks something like this:
1.) Users visits page
2.) Body onload() is called
3.) Javascript calls a PHP script, which queries the database (based on criteria passed along via the URL) and exports that query to an XML file.
4.) The XML file is then parsed via JavaScript on the user's local machine.
For any one search there could be several thousand results (and thus, several thousand markers to place on the map). As you might have guessed, it takes a long time to place all of the markers. I have some ideas to speed it up, but wanted to touch base with experienced users to verify that my logic is sound. I'm open to any suggestions!
Idea #1: Is there a way (and would it speed things up?) to run the query once, generating an XML file via PHP which contains all possible results, store that XML file locally, and then do the filtering via JavaScript?
Idea #2: Create a cron job on the server to export the XML file to a known location. Instead of using GDownloadUrl("phpfile.php"), I would use GDownloadUrl("xmlfile.xml"), thus eliminating the need to run a new query every time the user changes the value of a dropdown box.
Idea #3: Instead of passing criteria back to the php file (via the URL) should I just be filtering the results via javascript before placing the marker on the map?
I have seen a lot of webpages that place tons and tons of markers on a google map and it doesn't take nearly as long as my application. What's the standard practice in a situation like this?
Thanks!
Edit: There may be a flaw in my logic: If I were to export all results to an XML file, how (other than javascript) could I then filter those results?
Your logic is sound; however, I probably wouldn't do the filtering in JavaScript. If the user's computer is not very fast, performance will be adversely affected. It is better to perform the filtering server-side based on a cached resource (XML in your case).
The database is probably the biggest bottleneck in this operation, so caching the result would most likely speed your application up significantly. You might also check that you have set up your keys (indexes) correctly to make the query as fast as possible.
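For example, a sketch of serving the markers from a per-filter XML cache on the server (the type parameter and the buildMarkerXml() helper are hypothetical stand-ins for your existing query and XML code):

<?php
// Keep one generated XML file per filter combination and only re-run the
// query when the cached copy is older than ten minutes.
// $_GET['type'] and buildMarkerXml() are hypothetical.
$type  = preg_replace('/[^a-z0-9_]/i', '', isset($_GET['type']) ? $_GET['type'] : 'all');
$cache = __DIR__ . "/cache/markers_{$type}.xml";

if (!is_file($cache) || filemtime($cache) < time() - 600) {
    $xml = buildMarkerXml($type);             // your existing query + XML building
    file_put_contents($cache, $xml, LOCK_EX);
}

header('Content-Type: text/xml');
readfile($cache);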
I have to load some XML data (generated from a database, using PHP) into a flash slideshow.
The database data will change only when someone edits the website at its backend.
In terms of loading speed and performance, which is best:
1) Generate the XML data dynamically from the database, each time the page is loaded;
2) Generate a .XML file whenever the database is updated, which will be read by the flash file.
The fastest is likely
3) use Memcached
Otherwise it is likely 2 because connecting to a database is usually a bottleneck and often slower than file I/O. But then again, you could simply benchmark it to see which works best for you. That's much better than assuming.
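A minimal Memcached sketch, assuming the memcached extension is installed and with buildSlideshowXml() standing in for your existing generation code:

<?php
// Serve the generated XML from Memcached and only hit the database on a miss.
// buildSlideshowXml() is a hypothetical stand-in for your generation code.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$xml = $mc->get('slideshow_xml');
if ($xml === false) {
    $xml = buildSlideshowXml();               // query the DB and build the XML
    $mc->set('slideshow_xml', $xml, 300);     // cache for five minutes
}

header('Content-Type: text/xml');
echo $xml;

Since the data only changes when the backend saves, you could also just delete the key there instead of relying on a TTL.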
Also, have a look at this related question:
File access speed vs database access speed
@JapanPro: he wouldn't need to write to the XML file when it was requested, just when someone saved something to the database. This would mean a much better load speed compared to pulling data from the database every time.
Of course, it depends how much data we're talking about and whether it's worth writing to a file first. As @Gordon said, run some tests to see which works better for you.
I think going for 1) generating the XML data dynamically from the database each time the page is loaded is a good choice, as it works just like normal HTML, because I think writing a file always needs more resources.
It depends on how your code works; if your code is processing a lot of data every time, then writing a file makes sense.
Dropping my lurker status to finally ask a question...
I need to know how I can improve on the performance of a PHP script that draws its data from XML files.
Some background:
I've already mapped the bottleneck to CPU - but want to optimize the script's performance before taking a hit on processor costs. Specifically, the most CPU-consuming part of the script is the XML loading.
The reason I'm using XML to store object data is that the data needs to be accessible via a browser Flash interface, and we want to provide fast user access in that area. The project is still in early stages though, so if best practice would be to abandon XML altogether, that would be a good answer too.
Lots of data: Currently plotting for roughly 100k objects, albeit usually small ones - and they must ALL be taken up into the script, with perhaps a few rare exceptions. The data set will only grow with time.
Frequent runs: Ideally, we'd run the script ~50k times an hour; realistically, we'd settle for ~1k/h runs. This coupled with data size makes performance optimization completely imperative.
I've already taken an optimization step of making several runs on the same data rather than loading it for each run, but it's still taking too long. The runs should generally use "fresh" data with the modifications done by users.
Just to clarify: is the data you're loading coming from XML files for processing in its current state and is it being modified before being sent to the Flash application?
It looks like you'd be better off using a database to store your data and pushing out XML as needed rather than reading it in XML first; if building the XML files gets slow you could cache files as they're generated in order to avoid redundant generation of the same file.
If the XML stays relatively static, you could cache it as a PHP array, something like this:
<xml><foo>bar</foo></xml>
is cached in a file as
<?php return array('foo' => 'bar');
It should be faster for PHP to just include the arrayified version of the XML.
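A sketch of that caching step (paths are placeholders; the cache file is regenerated whenever the XML changes):

<?php
// Cache the parsed XML as an includable PHP array file, as described above.
// data.xml and data.cache.php are placeholder paths.
function loadData($xmlFile, $cacheFile)
{
    if (is_file($cacheFile) && filemtime($cacheFile) >= filemtime($xmlFile)) {
        return include $cacheFile;            // fast path: plain PHP array
    }

    $array = json_decode(json_encode(simplexml_load_file($xmlFile)), true);
    file_put_contents($cacheFile, '<?php return ' . var_export($array, true) . ';', LOCK_EX);
    return $array;
}

$data = loadData('data.xml', 'data.cache.php');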
~1k runs/hour, with 3600 seconds per hour, means a run roughly every 3.6 seconds (let alone the 50k/hour, which would be about 14 runs per second)...
There are many questions. Some of them are:
Does your PHP script need to read/process all records of the data source for each single run? If not, what kind of subset does it need (~size, criteria, ...)?
The same question for the Flash application, plus: who's sending the data? The PHP script? A "direct" request for the complete, static XML file?
What operations are performed on the data source?
Do you need some kind of concurrency mechanism?
...
And just because you want to deliver XML data to the Flash clients, it doesn't necessarily mean you have to store XML data on the server. If, for example, the clients only need a tiny subset of the available records, it is probably a lot faster not to store the data as XML but in something more suited to speed and searchability, and then create the XML output for that subset on the fly, maybe assisted by some caching depending on what data the clients request and how much/how often the data changes.
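For example, a sketch of generating just the requested subset on the fly with XMLWriter (the table, columns and region parameter are assumptions):

<?php
// Build only the subset the Flash client asked for, straight from the database.
// The objects table, its columns and the region parameter are assumptions.
$pdo  = new PDO('mysql:host=localhost;dbname=world', 'user', 'pass');
$stmt = $pdo->prepare('SELECT id, name, value FROM objects WHERE region = ? LIMIT 500');
$stmt->execute(array(isset($_GET['region']) ? $_GET['region'] : 'default'));

$w = new XMLWriter();
$w->openMemory();
$w->startDocument('1.0', 'UTF-8');
$w->startElement('objects');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $w->startElement('object');
    $w->writeAttribute('id', $row['id']);
    $w->writeElement('name', $row['name']);
    $w->writeElement('value', $row['value']);
    $w->endElement();
}
$w->endElement();
$w->endDocument();

header('Content-Type: text/xml');
echo $w->outputMemory();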
Edit: let's assume that you really, really need the whole dataset and need a continuous simulation. Then you might want to consider a continuous process that keeps the complete "world model" in memory and operates on this model on each run (world tick). That way, at least, you wouldn't have to load the data on each tick. But such a process is usually written in something other than PHP.