I have to load some XML data (generated from a database, using PHP) into a flash slideshow.
The database data will change only when someone edits the website from its backend.
In terms of loading speed and performance, which is best:
1) Generate the XML data dynamically from the database, each time the page is loaded;
2) Generate a .XML file whenever the database is updated, which will be read by the flash file.
The fastest is likely
3) use Memcached
Otherwise it is likely 2 because connecting to a database is usually a bottleneck and often slower than file I/O. But then again, you could simply benchmark it to see which works best for you. That's much better than assuming.
Also, have a look at this related question:
File access speed vs database access speed
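If you do try the Memcached route (option 3), a minimal sketch could look like the following; the key name, TTL and the buildSlideshowXml() helper are placeholders standing in for your own code, not anything that already exists:

<?php
// Sketch of option 3: cache the generated XML in Memcached.
// 'slideshow_xml', the one-hour TTL and buildSlideshowXml() are placeholders.
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$xml = $memcached->get('slideshow_xml');
if ($xml === false) {
    // Cache miss: build the XML from the database once, then cache it.
    $xml = buildSlideshowXml();                   // your own DB-to-XML code
    $memcached->set('slideshow_xml', $xml, 3600); // cache for one hour
}

header('Content-Type: text/xml');
echo $xml;

When someone saves changes in the backend you would simply delete the key ($memcached->delete('slideshow_xml')) so the next request rebuilds it.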
@JapanPro He wouldn't need to write to the XML file when it was requested, just when someone saved something to the database. This would mean a much better load speed compared to pulling data from a database every time.
Of course it depends how much data we're talking about and whether it's worth writing to a file first. As @Gordon said, run some tests to see which works better for you.
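For completeness, a rough sketch of option 2; regenerateSlideshowXml() is a hypothetical function you would call from the backend's save code, and the table/column names are assumptions about your schema:

<?php
// Sketch of option 2: rebuild the static XML file only when the backend saves.
function regenerateSlideshowXml(PDO $pdo, $path = 'slideshow.xml')
{
    $doc  = new DOMDocument('1.0', 'UTF-8');
    $root = $doc->appendChild($doc->createElement('slides'));

    // Assumed table/column names; adjust to your schema.
    foreach ($pdo->query('SELECT title, image_url FROM slides ORDER BY position') as $row) {
        $slide = $doc->createElement('slide');
        $slide->setAttribute('title', $row['title']);
        $slide->setAttribute('image', $row['image_url']);
        $root->appendChild($slide);
    }

    // Write to a temp file first so Flash never reads a half-written file.
    $tmp = $path . '.tmp';
    $doc->save($tmp);
    rename($tmp, $path);
}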
I think going for 1) Generate the XML data dynamically from the database, each time the page is loaded; is a good choice, as it works just like a normal HTML page, because I think writing a file always needs more resources.
It depends on how your code works: if your code is processing a lot of data every time, then writing a file makes sense.
I need to load XML data from an external server/url into my MySQL database, using PHP.
I don't need to save the XML file itself anywhere, unless this is easier/faster.
The problem is, I will need this to run every hour or so as the data will be constantly updated, so I need to replace the data in my database too. The XML file is usually around 350 MB.
The data in the MySQL table needs to be searchable - I will know the structure of the XML so can create the table to suit first.
I guess there are a few parts to this question:
What's the best way to automate this whole process to run every hour?
What's the best (fastest?) way of downloading/parsing the XML (~350 MB) from the URL, in a way that I can:
load it into a MySQL table of my own, maintaining the columns/structure
1) A PHP script can keep running in the background all the time, but this is not the best scenario. Alternatively, you can schedule php -q /dir/to/php.php with cron (if running on Linux) or use other techniques to have the server do the work for you. (You still need access to the server.)
2) You can use several approaches; the most linear one, and the least RAM-consuming, whether you decide to work with files or with MySQL, is to open your TCP connection, stream the data in smaller packages (16 KB will be fine) and stream them out to disk or to another connection as they arrive (see the sketch after this list).
3) Moving such huge data is not difficult, but storing it in MySQL is a waste. Performing searches on it is even worse. Updating it will practically kill the MySQL server.
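A sketch of point 2, streaming the remote XML to disk in small chunks instead of loading 350 MB into memory; the URL and target path are placeholders:

<?php
// Stream the remote XML to disk in 16 KB chunks; URL and path are placeholders.
$src  = fopen('http://example.com/feed.xml', 'rb');
$dest = fopen('/tmp/feed.xml', 'wb');

if ($src === false || $dest === false) {
    die('Could not open source or destination stream');
}

while (!feof($src)) {
    fwrite($dest, fread($src, 16384)); // 16 KB at a time
}

fclose($src);
fclose($dest);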
Suggestions:
From what I can see, you are trying to synchronize or back up data from another server. If there is just one file, then make a local .xml copy using PHP and you are done. If there is more than one, I would still suggest making local files, as most probably you are working with unstructured data: it is not a good fit for MySQL. If you work with hundreds of files and need to search them fast, run statistics and much, much more, consider changing your approach and read about Hadoop.
MySQL BLOB and TEXT columns still do not support more than 65 KB; maybe you know another technique, but I have never heard of one and would never suggest it. If you are trying this just to use SQL search commands, you have taken the wrong path.
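If you do decide to load the records into a normal table rather than stuffing the document into BLOB columns, a streaming parse keeps memory flat. This is only a sketch: the element names (<item>, <id>, ...), the table, the columns and the credentials are assumptions about the structure you said you already know:

<?php
// Sketch: walk the downloaded file with XMLReader and insert rows one by one.
$reader = new XMLReader();
$reader->open('/tmp/feed.xml');

$doc  = new DOMDocument();
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$stmt = $pdo->prepare('REPLACE INTO items (id, name, price) VALUES (?, ?, ?)');

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        // Import just this <item> into a small DOM fragment for easy field access.
        $item = simplexml_import_dom($doc->importNode($reader->expand(), true));
        $stmt->execute([(string) $item->id, (string) $item->name, (string) $item->price]);
    }
}

$reader->close();

REPLACE INTO (or INSERT ... ON DUPLICATE KEY UPDATE) takes care of refreshing rows on each hourly run instead of duplicating them.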
I'm presently working on an e-learning project using custom PHP (not any framework or CMS). For the content of one page of my project I have to fetch about 1000 records from the database. Presently I'm using pagination on that page and displaying 100 records per page. Now I'm thinking that if I fetch all the data from the database at once and store it in XML, then when the user moves between the pages of the pagination the data will be fetched from the XML rather than the database. That may be good in the sense that it will reduce database hits, but I'm not sure whether the XML parsing may affect my project's execution time. If anyone has a better idea, please share it with me.
My project's environment is like below
php 5
Mysql
Jquery
This still sounds inefficient, since you still have to parse the XML.
I believe the most efficient way to do it (optimised for page views) would be to pre-generate the HTML of your lists.
That means every time the database changes, you re-create the HTML, but only once.
Then all you do is simply serve that HTML from your web server without any script executing.
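A minimal sketch of that idea, assuming a hypothetical contents table and the 100-rows-per-page split mentioned in the question:

<?php
// Sketch: pre-generate the paginated HTML once per database change.
// Table/column names and the output directory are assumptions.
function regeneratePages(PDO $pdo, $outDir = 'cache/pages')
{
    $rows    = $pdo->query('SELECT id, title FROM contents ORDER BY id')->fetchAll(PDO::FETCH_ASSOC);
    $perPage = 100;

    foreach (array_chunk($rows, $perPage) as $i => $chunk) {
        $html = '<ul>';
        foreach ($chunk as $row) {
            $html .= '<li>' . htmlspecialchars($row['title']) . '</li>';
        }
        $html .= '</ul>';
        file_put_contents(sprintf('%s/page-%d.html', $outDir, $i + 1), $html);
    }
}

// A page view is then just: readfile('cache/pages/page-' . (int) $_GET['page'] . '.html');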
My question is fairly simple; I need to read out some templates (in PHP) and send them to the client.
For this kind of data, specifically text/html and text/javascript: is it more expensive to read them out of a MySQL database or out of files?
Kind regards
Tom
inb4 security; I'm aware.
PS: I read other topics about similar questions but they either had to do with other kind of data, or haven't been answered.
Reading from a database is more expensive, no question.
Where do the flat files live? On the file system. In the best case, they've been recently accessed so the OS has cached the files in memory, and it's just a memory read to get them into your PHP program to send to the client. In the worst case, the OS has to copy the file from disc to memory before your program can use it.
Where does the data in a database live? On the file system. In the best case, they've been recently accessed so MySQL has that table in memory. However, your program can't get at that memory directly, it needs to first establish a connection with the server, send authentication data back and forth, send a query, MySQL has to parse and execute the query, then grab the row from memory and send it to your program. In the worst case, the OS has to copy from the database table's file on disk to memory before MySQL can get the row to send.
As you can see, the scenarios are almost exactly the same, except that using a database involves the additional overhead of connections and queries before getting the data out of memory or off disc.
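If you want to put numbers on it for your own setup, a quick-and-dirty comparison might look like this; the file name, DSN, credentials and query are placeholders:

<?php
// Rough timing comparison of flat-file reads vs database reads; names are placeholders.
$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    $tpl = file_get_contents('templates/page.html');
}
printf("flat file: %.4f s\n", microtime(true) - $start);

$pdo   = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    $tpl = $pdo->query("SELECT body FROM templates WHERE name = 'page'")->fetchColumn();
}
// Note: the connection is opened once and reused here, so this measures query
// overhead only; opening a new connection per request would add even more cost.
printf("database:  %.4f s\n", microtime(true) - $start);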
There are many factors that would affect how expensive both are.
I'll assume that since they are templates, they probably won't be changing often. If so, flat-file may be a better option. Anything write-heavy should be done in a database.
Reading a flat-file should be faster than reading data from the database.
Having them in the database usually makes it easier for multiple people to edit.
You might consider using memcache to store the templates after reading them, since reading from memory is always faster than reading from a db or flat-file.
It really doesn't make enough difference to worry you. What sort of volume are you working with? Will you have over a million page views a day? If not, I'd say pick whichever one is easiest for you to code with and maintain, and don't worry about the expense of the alternatives until it becomes a problem.
Specifically, if your templates are currently in file form I would leave them there, and if they are currently in DB form I'd leave them there.
I run a system which needs to update various XML files from data stored in a DB. The script runs via a server-side PHP file which is monitored by a daemon so that it executes, finishes to free resources, then is restarted.
I have some benchmarking within the script, and when I have to update 100 XML files, it takes about 15 seconds to complete. A typical XML file which is created is around 6 KB. I am creating the XML using PHP's DOM and writing it using dom->save. The DB is fully normalised and the correct indexes are in place; the 3 queries I need to perform to get the data for updating the XML only take around 0.05 seconds. Therefore the bottleneck seems to be the actual creation of the XML via DOM and the writing of the file itself.
Does anyone have any ideas how I could really speed up the process? I have considered using a CRC check to see whether the XML needs to be re-written, but this would still require me to read the XML file I would be updating, which I don't do at the moment, so surely it's just as bad as saving a new file over the top of the old one? Also, I don't think it's possible to edit only certain parts of the XML, as the structure isn't uniform; the order of the nodes can change depending on what data is not null after being updated.
Really appreciate your thoughts on this!
Fifteen seconds to write a few XML files? That sounds way too much. Can you do some more profiling and find out which function exactly is the bottleneck?
Have you considered writing plain text XML (fwrite("<item>value</item>")) instead of building it by DOM? Sounds justifiable in this case.
Otherwise, for caching, there's always filemtime() that you could use to quickly get the "last modified" time of your XML file, and see whether the DB entry is younger than that. In a system like you describe, there should be no need to compare the contents.
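Putting both suggestions together, a sketch might look like this; getUpdatedAt() and buildXmlString() stand in for your own queries and markup and are not existing functions:

<?php
// Sketch: skip files whose DB row hasn't changed since the file was last written,
// and build the XML as a plain string instead of via DOM.
function updateXmlFile($id, PDO $pdo, $path)
{
    $updatedAt = getUpdatedAt($pdo, $id);   // e.g. a UNIX timestamp from the DB

    if (file_exists($path) && filemtime($path) >= $updatedAt) {
        return; // file is already newer than the data, nothing to do
    }

    $xml = "<?xml version=\"1.0\"?>\n" . buildXmlString($pdo, $id);
    file_put_contents($path, $xml, LOCK_EX);
}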
Dropping my lurker status to finally ask a question...
I need to know how I can improve on the performance of a PHP script that draws its data from XML files.
Some background:
I've already mapped the bottleneck to CPU - but want to optimize the script's performance before taking a hit on processor costs. Specifically, the most CPU-consuming part of the script is the XML loading.
The reason I'm using XML to store object data is that the data needs to be accessible via a browser Flash interface, and we want to provide fast user access in that area. The project is still in early stages though, so if best practice would be to abandon XML altogether, that would be a good answer too.
Lots of data: Currently plotting for roughly 100k objects, albeit usually small ones - and they must ALL be taken up into the script, with perhaps a few rare exceptions. The data set will only grow with time.
Frequent runs: Ideally, we'd run the script ~50k times an hour; realistically, we'd settle for ~1k/h runs. This coupled with data size makes performance optimization completely imperative.
I've already taken the optimization step of making several runs on the same data rather than loading it for each run, but it's still taking too long. The runs should generally use "fresh" data with the modifications done by users.
Just to clarify: is the data you're loading coming from XML files for processing in its current state and is it being modified before being sent to the Flash application?
It looks like you'd be better off using a database to store your data and pushing out XML as needed rather than reading it in XML first; if building the XML files gets slow you could cache files as they're generated in order to avoid redundant generation of the same file.
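One way to combine both points (database as the source of truth, cached XML files for Flash); buildXmlForView() and the cache directory are placeholders for your own code:

<?php
// Sketch: keep the data in the database and cache the generated XML per "view".
function getXmlForView($view, PDO $pdo)
{
    $cacheFile = 'cache/' . md5($view) . '.xml';

    // Serve the cached file if it exists; delete it whenever the underlying data changes.
    if (is_readable($cacheFile)) {
        return file_get_contents($cacheFile);
    }

    $xml = buildXmlForView($pdo, $view);   // your DB-to-XML generation
    file_put_contents($cacheFile, $xml, LOCK_EX);
    return $xml;
}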
If the XML stays relatively static, you could cache it as a PHP array, something like this:
<xml><foo>bar</foo></xml>
is cached in a file as
<?php return array('foo' => 'bar');
It should be faster for PHP to just include the arrayified version of the XML.
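A small sketch of that caching idea; the file names are placeholders, and the json_encode/json_decode round-trip is just one quick way to turn a SimpleXMLElement into a plain array:

<?php
// Sketch: convert the XML to a PHP array file once, then include the cached version.
function loadData($xmlFile, $cacheFile)
{
    if (!file_exists($cacheFile) || filemtime($cacheFile) < filemtime($xmlFile)) {
        $array = json_decode(json_encode(simplexml_load_file($xmlFile)), true);
        file_put_contents($cacheFile, '<?php return ' . var_export($array, true) . ';');
    }
    return include $cacheFile;   // much cheaper than re-parsing the XML every run
}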
~1k/hour, 3600 seconds per hour: that's a run roughly every 3.6 seconds (let alone roughly 14 runs per second at 50k/hour)...
There are many questions. Some of them are:
Does your PHP script need to read/process all records of the data source for each single run? If not, what kind of subset does it need (~size, criteria, ...)?
Same question for the Flash application, plus: who's sending the data? The PHP script? A "direct" request for the complete, static XML file?
What operations are performed on the data source?
Do you need some kind of concurrency mechanism?
...
And just because you want to deliver XML data to the Flash clients, it doesn't necessarily mean that you have to store XML data on the server. If, for example, the clients only need a tiny subset of the available records, it is probably a lot faster not to store the data as XML but as something more suited to speed and "searchability", and then create the XML output of the subset on the fly, maybe assisted by some caching depending on what data the clients request and how much the data changes.
edit: Let's assume that you really, really need the whole dataset and need a continuous simulation. Then you might want to consider a continuous process that keeps the complete "world model" in memory and operates on this model on each run (world tick). This way at least you wouldn't have to load the data on each tick. But such a process is usually written in something other than PHP.