simplexml cache file - php

I am loading an XML file which is pretty heavy and contains a lot of data, and I am only using a small amount of it. This is causing my site to take ages to load.
http://www.footballadvisor.net/
Can anyone advise me on a resolution to this? Is there a way with SimpleXML to cache the file for a period of time? My site is in PHP.
Thanks in advance
Richard

You wouldn't do it directly with SimpleXML. What you'd need is the other file functions (fopen() or file_get_contents()): fetch the file, save it locally (or extract just the bits you need and save those), and parse that local copy. If enough time has passed since the last check (or enough time that new data would be available), delete the cached copy and fetch the file again.
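For example, a minimal sketch of that approach (the paths, the feed URL and the one-hour lifetime below are placeholders, not anything from the question):

// Hypothetical sketch: keep a local copy of the remote XML and only
// re-download it when the copy is older than one hour.
$cacheFile = '/path/to/cache/feed.xml';         // assumed cache location
$source    = 'http://www.example.com/data.xml'; // assumed remote feed

if (!file_exists($cacheFile) || time() - filemtime($cacheFile) > 3600) {
    $data = file_get_contents($source);
    if ($data !== false) {
        file_put_contents($cacheFile, $data);
    }
}

// Parse the small local copy instead of hitting the remote file on every request.
$xml = simplexml_load_file($cacheFile);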

Related

Simple HTML Dom save file with Cron Job once a day, then access that saved file

I am using SimpleHTMLDom to get some info off of an RSS feed. This data is only updated once a day, around 7am. I would like to use $html->save('result.htm'); and then have my page just load the result.htm file instead of running the parse each time I view the page.
I guess I am wondering, would this be a good idea? Would it really speed the page load time up that much? Would using a cache be similar or maybe better?
(this question almost addresses this)
Yes, it would be a good idea, and you couldn't get much faster (unless you load the page into webserver memory and serve it from there).
Just extend the cron job you already have: process the data with SimpleHTMLDom at 7am and save the HTML it produces, then keep serving that file until the next morning.
Just make sure you write to a temp file first (result.tmp.html) the next morning and only do the move/rename once the cron job finishes.
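A rough sketch of that cron step (the feed URL is a placeholder; the file names follow the thread):

// Hypothetical cron script run once a day after the feed updates.
include 'simple_html_dom.php';

$html = file_get_html('http://www.example.com/feed.rss'); // assumed feed URL

// ... build/clean up the markup you want to serve ...

// Write to a temp file first, then rename: rename() is atomic on the same
// filesystem, so visitors never see a half-written result.htm.
$html->save('result.tmp.html');
rename('result.tmp.html', 'result.htm');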
I am not sure I told you anything you didn't know already...

Can you get a specific xml value without loading the full file?

I recently wrote a PHP plugin to interface with my phpBB installation which will take my users' Steam IDs, convert them into the community ids that Steam uses on their website, grab the xml file for that community id, get the value of avatarFull (which contains the link to the full avatar), download it via curl, resize it, and set it as the user's new avatar.
In effect it is syncing my forum's avatars with Steam's avatars (Steam is a gaming community/platform and I run a gaming clan). My issue is that reading the value from the XML file takes around a second per user, because the entire XML file is loaded before the variable is searched for, and this causes the whole script to take a very long time to complete.
Ideally I want to have my script run several times a day to check each avatarFull value from Steam and check to see if it has changed (and download the file if it has), but it currently takes just too long for me to tie up everything to wait on it.
Is there any way to have the server serve up just the xml value that I am looking for without loading the entire thing?
Here is how I am calling the value currently:
$xml = @simplexml_load_file("http://steamcommunity.com/profiles/".$steamid."?xml=1");
$avatarlink = $xml->avatarFull;
And here is an example xml file: XML file
The file isn't big and parsing it doesn't take much time. Your second is mostly spent on network communication.
Since there is no way around this, you must implement a cache. Schedule a script that will run on your server every hour or so, looking for changes. This script will take a lot of time - at least a second for every user; several seconds if the picture has to be downloaded.
When it has the latest picture, it will store it in some predefined location on your server. The scripts that serve your webpage will use this location instead of communicating with Steam. That way they will work instantly, and the pictures will be at most 1 hour out-of-date.
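A rough sketch of such a scheduled script, assuming a hypothetical get_forum_steam_ids() helper for reading the IDs from your forum database (paths are placeholders, and resizing is left out):

// Hypothetical cron job: refresh the cached avatar of every user.
foreach (get_forum_steam_ids() as $steamid) {
    $xml = @simplexml_load_file('http://steamcommunity.com/profiles/' . $steamid . '?xml=1');
    if ($xml === false || !isset($xml->avatarFull)) {
        continue;
    }

    $ch = curl_init((string) $xml->avatarFull);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $image = curl_exec($ch);
    curl_close($ch);

    if ($image !== false) {
        // Predefined location the forum pages read from; resizing omitted here.
        file_put_contents('/path/to/avatars/' . $steamid . '.jpg', $image);
    }
}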
Added: Here's an idea to complement this: have your visitors perform AJAX requests to Steam and check via JavaScript whether the picture has changed. Do this only for pictures they are actually viewing. If a picture has changed, you can immediately replace the outdated one in their browser. You can also notify your server, which can then download the updated picture immediately. Perhaps you won't even need to schedule anything yourself.
You have to read the whole stream to get to the data you need, but it doesn't have to be kept in memory.
If I were doing this with Java, I'd use a SAX parser instead of a DOM parser. I could handle the few values I was interested in and not keep a large DOM in memory. See if there's something equivalent for you with PHP.
SimpleXML is a DOM parser. It will load and parse the entire document into memory before you can work with it. If you do not want that, use XMLReader, which lets you process the XML while you are reading it from a stream, e.g. you could stop processing once the avatar has been fetched.
But as other people have already pointed out elsewhere on this page, with a file as small as the one shown, this is more likely a network latency issue than an XML issue.
Also see Best XML Parser for PHP
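A minimal XMLReader sketch of that idea, using the avatarFull element from the question (error handling omitted):

// Stream the profile XML and stop as soon as avatarFull has been read.
$reader = new XMLReader();
$reader->open('http://steamcommunity.com/profiles/' . $steamid . '?xml=1');

$avatarlink = null;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'avatarFull') {
        $avatarlink = $reader->readString(); // text/CDATA content of the element
        break;                               // skip the rest of the document
    }
}
$reader->close();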
That file looks small enough; it shouldn't take that long to parse. The delay is probably down to some sort of network problem rather than the slowness of parsing.
If the network is your issue then no amount of trickery will help you :(.
If it isn't the network, then you could try a regex match on the input. That will probably be marginally faster.
Try this expression:
/<avatarFull><!\[CDATA\[(.*?)\]\]><\/avatarFull>/
and read the link from the first group match.
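For example, a short preg_match() sketch (assuming the raw profile XML has been fetched into $contents):

$contents = file_get_contents('http://steamcommunity.com/profiles/' . $steamid . '?xml=1');

// The CDATA brackets are escaped so they are matched literally.
if (preg_match('/<avatarFull><!\[CDATA\[(.*?)\]\]><\/avatarFull>/', $contents, $matches)) {
    $avatarlink = $matches[1]; // first capture group holds the link
}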
You could also try the SAX way of parsing (http://php.net/manual/en/book.xml.php), but as I said, since the file is small I doubt it will really make a difference.
You can take advantage of caching the results of simplexml_load_file() somewhere like Memcached or the filesystem. Here is a typical workflow (a Memcached-based sketch follows the list):
- check whether the XML file was processed during the last N seconds
- if it was, return the cached processing results
- if it wasn't, get the results from SimpleXML
- process them
- resize the images
- store the results in the cache
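A minimal sketch of that workflow using the Memcached extension (the key name and 600-second lifetime are arbitrary choices, and the processing/resizing step is only hinted at):

$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$key        = 'steam_avatar_' . $steamid;
$avatarlink = $memcached->get($key);

if ($avatarlink === false) { // not processed within the last N seconds
    $xml        = @simplexml_load_file('http://steamcommunity.com/profiles/' . $steamid . '?xml=1');
    $avatarlink = (string) $xml->avatarFull;
    // ... download and resize the image here ...
    $memcached->set($key, $avatarlink, 600); // store the result for N = 600 seconds
}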

PHP function for getting file size and loading time for a page

Similar to my last question, I'd like to have a PHP function that can take a local path and tell me (a) how much the total file size is for HTML, CSS, JS and images, and (b) the total load time for this page. Like YSlow I think, but as a PHP function.
Any thoughts? I looked around and was wondering: can I use cURL for this, even though I need to check paths that are on my own server? Thanks!
Update:
After reading the comments, I realize I'm off base. Instead I'm wondering: is there a way to programmatically get a YSlow score (or a similar performance score) for a page? I assume it would need to hit a third-party site that would act as the client. I'm basically trying to loop through a group of pages and get some sort of performance metric. Thanks!
For the file size:
Create a loop that reads all the files in a specific directory with dir() (or scandir()).
Then call filesize() on each file.
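For example, a minimal sketch using scandir() (the directory path is a placeholder):

$total = 0;
$dir   = '/path/to/page/assets'; // assumed local path

foreach (scandir($dir) as $file) { // dir()/readdir() work just as well
    $path = $dir . '/' . $file;
    if (is_file($path)) {
        $total += filesize($path); // size in bytes
    }
}

echo $total . ' bytes';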
Load time
Load time depends on the connection speed and the file size. I see that you specify you are reading the files locally: you can measure how long it takes you to read those files, but that will not be the load time of the page for an external user.

How can I avoid reloading an XML document multiple times?

tl;dr: I want to load an XML file once and reuse it over and over again.
I have a bit of javascript that makes an ajax request to a PHP page that collects and parses some XML and returns it for display (like, say there are 4,000 nodes and the PHP paginates the results into chunks of 100 you would have 40 "pages" of data). If someone clicks on one of those other pages (besides the one that initially loads) then another request is made, the PHP loads that big XML file, grabs that subset of indexes (like records 200-299) and returns them for display. My question is, is there a way to load that XML file only once and just reuse it over and over?
The process on each ajax request is:
- load the xml file (simplexml_load_file())
- parse out the bits needed (with xpath)
- use LimitIterator to grab the specific set of indexes I need
- return that set
Whereas what I'd like to happen when someone requests a different paginated result is:
- use LimitIterator on the data I loaded in the previous request (reparse if needed)
- return that set
It seems (it is, right?) that hitting the XML file every time is a huge waste. How would I go about grabbing it and persisting it so that different pagination requests don't have to reload the file every time?
Just have your server do the reading and parsing of the paginated file based on the user's input and feedback; it can be cached on the server far more quickly than it would take the client to download and cache the entire XML document. Use PHP, Perl, ASP or what have you to paginate the data prior to displaying it to the user.
I believe the closest thing you are going to get is Memcached.
Although I wouldn't worry about it, especially if it is a local file; include-like operations are fairly cheap.
To the question "hitting the XML file every time is a huge waste": the answer is yes, if you have to parse that big XML file every time. As I understand it, you want to save the chunk the user is interested in so that you don't have to do that on every request. How about a very simple file cache? No extension required, fast, simple to use and maintain. Something like this:
function echo_results($start)
{
    // IMPORTANT: make sure that $start is a valid number
    $cache_file = '/path/to/cache/' . $start . '.xml';
    $source     = '/path/to/source.xml';

    // Serve the cached chunk if it exists and is newer than the source XML.
    if (file_exists($cache_file)
        && filemtime($cache_file) >= filemtime($source))
    {
        readfile($cache_file);
        return;
    }

    // Otherwise (re)build the chunk, cache it and output it.
    $xml = get_the_results_chunk($start); // your existing load/parse/slice logic
    file_put_contents($cache_file, $xml);
    echo $xml;
}
As an added bonus, you use the source file's last modification time so that you automatically ignore cached chunks that are older than their source.
You can even save it compressed and serve it as-is if the client supports gzip compression (IOW, 99% of browsers out there) or decompress it on-the-fly otherwise.
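A rough sketch of that gzip variant, reusing $cache_file from the function above (assumes the chunk was stored with gzencode(); gzdecode() needs PHP 5.4+):

$compressed = file_get_contents($cache_file);

if (isset($_SERVER['HTTP_ACCEPT_ENCODING'])
    && strpos($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip') !== false) {
    header('Content-Encoding: gzip');
    echo $compressed;            // serve the stored bytes untouched
} else {
    echo gzdecode($compressed);  // decompress on the fly for clients without gzip
}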
Could you load it into $_SESSION data? Or would that blow out memory due to the size of the chunk?

Querying large XML file (600mb+) in PHP or JavaScript?

I have a large XML file (600mb+) and am developing a PHP application which needs to query this file.
My initial approach was to extract all the data from the file and insert it into a MySQL database, then query it that way. The only issue with this was that it was still slow, and the XML data gets updated regularly, meaning I need to download, parse and insert the data from the XML file into the database every time the XML file is updated.
Is it actually possible to query a 600mb file? (for example, searching for records where TITLE="something here"?) Is it possible to get it to do this in a reasonable amount of time?
Ideally I would like to do this in PHP, though I could use JavaScript too.
Any help and suggestions appreciated :)
Constructing an XML DOM for a 600+ MB document is definitely a way to fail. What you need is a SAX-based (streaming) API. SAX does not usually allow XPath to be used, but you can emulate it with imperative code.
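A sketch of that imperative approach with PHP's XMLReader, assuming each record looks like <record><TITLE>...</TITLE>...</record> (the real element names depend on your feed):

// Stream a huge XML file and collect only the records whose TITLE matches.
$reader = new XMLReader();
$reader->open('/path/to/huge.xml'); // assumed path

while ($reader->read() && $reader->name !== 'record') {
    // keep reading until the first <record>
}

$found = array();
while ($reader->name === 'record') {
    // Expand just this one record into a small SimpleXML fragment.
    $record = simplexml_load_string($reader->readOuterXml());
    if ((string) $record->TITLE === 'something here') {
        $found[] = $record;
    }
    $reader->next('record'); // jump to the next sibling, keeping memory use flat
}
$reader->close();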
As for the file being updated: is it possible to retrieve only the differences somehow? That would massively speed up subsequent processing.
