What would be the proper way to automate an XML import - PHP

I've written a script that imports data from an XML file into the MySQL database by selecting it from the source disk and uploading it via a submit button. But what if a 3rd party application were to be used to automate this import? Would it be proper to check whether a GET parameter containing an XML path exists, grab its content, and import it the same way I did before? Or is there a better method?
By GET parameter I mean something like this:
http://domain.com/import.php?path=externaldomain.com/xml/page.xml

It depends on what kind of data you are importing. If you import data from an RSS feed, this method is fine. But if you are going to import personal data, this might not really be a good method.
I would suggest something more secure if you are working with critical data that others are not supposed to see. You could start by importing the XML files through FTP, or by downloading them from behind a password-protected folder on the server. Ask the 3rd party application to upload the XML files to a secure location of your choosing. Anything that goes on behind some kind of security is better than the suggested method for personal data.
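For example, a minimal sketch of pulling the feed from behind a password-protected folder over HTTP basic auth (the URL and credentials below are placeholders, not from the original setup):
<?php
// Fetch the XML from a protected folder instead of accepting an arbitrary ?path= parameter
$ch = curl_init('https://externaldomain.com/protected/xml/page.xml'); // placeholder URL
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_USERPWD, 'importuser:secret'); // placeholder credentials
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$xml = curl_exec($ch);
curl_close($ch);
// ...then import $xml the same way the button-driven script already does
?>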

First, I'd advise using cURL. No matter how large your XML turns out to be, you'll have fewer problems with memory.
$fp = fopen('/var/www/vhosts/my.com/xml/feed.xml', 'w'); // open a file handle to write the feed to
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://domain.com/xml/page.xml'); // URL to take the XML from
curl_setopt($ch, CURLOPT_ENCODING, 'gzip');  // in case the result is gzipped
curl_setopt($ch, CURLOPT_SSLVERSION, 3);     // work around an OpenSSL issue
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0); // wildcard certificate
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0); // don't buffer the output; we write the XML straight to the file and don't need it returned in a variable
curl_setopt($ch, CURLOPT_FILE, $fp);         // hand the opened (writable!) file handle to cURL
$result = curl_exec($ch);                    // execute the download
$response_code = (int)curl_getinfo($ch, CURLINFO_HTTP_CODE); // HTTP status code: was the request successful or not?
curl_close($ch);
fclose($fp);
This way you can download and save your XML feed directly to a file, even if it is behind SSL and gzipped.
Using curl_getinfo() you can get diverse information about your request. If the procedure is supposed to be automated, it would be wise to decide what to do when a request fails.
Then, if the file is not large (I mean really large files, above 200-300 MB), you can just use the SimpleXML library (available since PHP 5) to parse your data. If you are still on PHP 4 (it is still possible today), try libXML, which is very useful too.
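For instance, a minimal SimpleXML sketch, assuming the saved feed contains <item> elements with <id> and <title> children (those element names are assumptions, not from the original feed):
<?php
$xml = simplexml_load_file('/var/www/vhosts/my.com/xml/feed.xml');
if ($xml === false) {
    die('Could not parse the XML feed');
}
foreach ($xml->item as $item) {
    // cast SimpleXMLElement values to plain strings before using them in queries
    $id    = (string) $item->id;
    $title = (string) $item->title;
    // ...INSERT/UPDATE your MySQL rows here
}
?>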
If the file you retrieved is rather huge :) a MySQL database with FILE permissions is your friend.
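One way to do that is LOAD XML INFILE (MySQL 5.5+), which lets the database server read the file itself; a rough sketch, where the table name items and the <item> row tag are just examples:
<?php
// For a non-LOCAL LOAD XML INFILE the MySQL account needs the FILE privilege
// and the file must be readable by the MySQL server process.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$pdo->exec("
    LOAD XML INFILE '/var/www/vhosts/my.com/xml/feed.xml'
    INTO TABLE items
    ROWS IDENTIFIED BY '<item>'
");
?>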

Tunnelling link data through PHP?

I want to be able to go to mydoma.in/tunnel.php?file=http://otherdoma.in/music.mp3, and then get the data of http://otherdoma.in/music.mp3 streamed to the client.
I tried doing this via header(), but it redirects instead of "tunnelling" the data.
How can I do this?
Use cURL for streaming:
<?php
$url = $_GET["file"];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);       // don't pass the remote response headers through
curl_setopt($ch, CURLOPT_BUFFERSIZE, 256); // small buffer so data reaches the client in small chunks
curl_exec($ch);                            // without CURLOPT_RETURNTRANSFER, cURL echoes the body directly
curl_close($ch);
?>
If the files are small, you might be able to use file_get_contents(). Otherwise, you should probably use cURL: fetch the URL from the "file" GET variable, save it to a local temporary location with PHP, and then use header() to point the browser at the local file. Deleting the temporary file is the only issue, as there isn't really a way to determine when the client has finished downloading it. You could sleep or delay the file removal, but you might find it's a better option to use a cron job to clean up all of the temporary files later.
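A rough sketch of that temp-file approach (the tmp/ directory inside the web root and the file naming are assumptions):
<?php
$url   = $_GET['file'];
$local = 'tmp/' . md5($url) . '.mp3'; // web-accessible temp location (assumed)

$fp = fopen($local, 'w');
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FILE, $fp);            // stream the download straight into the file
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
curl_close($ch);
fclose($fp);

// point the browser at the local copy; a cron job can purge tmp/ later
header('Location: /' . $local);
?>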
Have your PHP script pull the remote content:
$data = file_get_contents($remote_url);
And then just spit it out:
echo $data;
Or simply:
echo file_get_contents($remote_url);
You might have to add some headers to indicate the content type.
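For example (audio/mpeg matches the MP3 case from the question; adjust to whatever you are proxying):
<?php
$remote_url = $_GET['file'];
header('Content-Type: audio/mpeg');
header('Content-Disposition: inline; filename="' . basename($remote_url) . '"');
echo file_get_contents($remote_url);
?>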
Alternatively, you could configure a proxy with something like nginx -- this will allow you to rewrite particular URLs to a remote site and then serve them as local, no coding required.

Running file_put_contents in parallel?

I was searching Stack Overflow for a solution, but couldn't find anything even close to what I am trying to achieve. Perhaps I am just blissfully unaware of some magic PHP sauce everyone is using to tackle this problem... ;)
Basically I have an array with, give or take, a few hundred URLs, pointing to different XML files on a remote server. I'm doing some magic file-checking to see if the content of the XML files has changed, and if it has, I'll download the newer XMLs to my server.
PHP code:
$urls = array(
    'http://stackoverflow.com/a-really-nice-file.xml',
    'http://stackoverflow.com/another-cool-file2.xml'
);

foreach ($urls as $url) {
    set_time_limit(0);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, false);
    $contents = curl_exec($ch);
    curl_close($ch);
    file_put_contents($filename, $contents);
}
Now, $filename is set somewhere else and gives each XML its own ID based on my logic.
So far this script runs OK and does what it should, but it does it terribly slowly. I know my server can handle a lot more, and I suspect my foreach is slowing down the process.
Is there any way I can speed up the foreach? Currently I am thinking of bumping the file_put_contents in each foreach loop up to 10 or 20, basically cutting my execution time 10- or 20-fold, but I can't think of how to approach this in the best and most performant way. Any help or pointers on how to proceed?
Your bottleneck is (most likely) the cURL requests themselves; you can only write to a file after each request is done, and there is no way (in a single script) to speed that part up.
I don't know how it all works, but you can execute cURL requests in parallel: http://php.net/manual/en/function.curl-multi-exec.php.
Maybe you can fetch the data (if memory is available to store it) and then write the files out as the requests complete.
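A rough sketch of the curl_multi approach ($urls and the output file naming are carried over from the question as assumptions):
<?php
$mh = curl_multi_init();
$handles = array();

foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}

// run all handles until every transfer has finished
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($running > 0);

// collect the responses and write them to disk
foreach ($handles as $i => $ch) {
    if (curl_errno($ch) === 0) {
        file_put_contents('feed-' . $i . '.xml', curl_multi_getcontent($ch)); // swap in your own $filename logic
    }
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>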
Just run more scripts. Each script will download some of the URLs.
You can get more information about this pattern here: http://en.wikipedia.org/wiki/Thread_pool_pattern
The more scripts you run, the more parallelism you get.
For parallel requests I use a Guzzle pool ;) (you can send x parallel requests):
http://docs.guzzlephp.org/en/stable/quickstart.html
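A rough sketch with Guzzle's Pool (requires guzzlehttp/guzzle via Composer; the concurrency value and the file naming are assumptions):
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client();

$requests = function (array $urls) {
    foreach ($urls as $url) {
        yield new Request('GET', $url);
    }
};

$pool = new Pool($client, $requests($urls), [
    'concurrency' => 10, // number of requests in flight at once
    'fulfilled' => function ($response, $index) {
        file_put_contents('feed-' . $index . '.xml', (string) $response->getBody());
    },
    'rejected' => function ($reason, $index) {
        error_log("Download {$index} failed: {$reason}");
    },
]);

$pool->promise()->wait();
?>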

send xml to external site in background

I have a form allowing a user to sign up for a newsletter which submits back to the page it sits in for validation and adding the content to the db. However, I also need to send an XML file to a third party, using the information collected from the form, to add the user to a mailing list. The data sent to the third party seems to need to be sent using the POST method.
How can I achieve this?
I tried AJAX, but realised after a bit that AJAX isn't able to send info to external links so abandoned that.
Essentially the site needs to reload the page, validate the info sent to it, either return errors or add the info to the db, and fire off the XML in the background, so having it send a separate form after reload isn't ideal either. Also, when the XML is sent through the main form, the third-party page loads its own page, which is far from pretty and takes the user away from our site; not good at all.
You will have to validate in PHP and then send the XML from the server, for example with cURL:
<?php
$aCurlHeaders = array('Content-Type: text/xml'); // headers to send along with the upload
$hCurl = curl_init();
curl_setopt($hCurl, CURLOPT_PUT, true);
curl_setopt($hCurl, CURLOPT_HEADER, true);
curl_setopt($hCurl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($hCurl, CURLOPT_CONNECTTIMEOUT, 60);
curl_setopt($hCurl, CURLOPT_URL, $URL_TO_UPLOAD);
curl_setopt($hCurl, CURLOPT_HTTPHEADER, $aCurlHeaders);
// TODO it could be possible that fopen() would return an invalid handle or not work altogether. Should handle that
$fp = fopen($XML_FILE, "r");
curl_setopt($hCurl, CURLOPT_INFILE, $fp);
curl_setopt($hCurl, CURLOPT_INFILESIZE, filesize($XML_FILE));
$sResp = curl_exec($hCurl);
curl_close($hCurl);
fclose($fp);
?>
Just replace $URL_TO_UPLOAD with the URL you want to upload to and $XML_FILE with the file you want to send, and we are done! (Note that this example sends an HTTP PUT; if the third party expects a POST, use CURLOPT_POST with CURLOPT_POSTFIELDS instead.)
I would recommend getting your server to submit the data to the third party once it has added the information to the database. It can even queue up this process and deal with it at a later date if needed.
There are lots of ways of doing this in PHP, such as Curl.
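For example, a minimal cURL POST sketch (the endpoint URL, the XML payload, and the Content-Type are assumptions; check the third party's documentation for the exact format they expect):
<?php
$xml = '<subscriber><email>user@example.com</email></subscriber>'; // built from the validated form data

$ch = curl_init('https://thirdparty.example.com/mailinglist'); // placeholder endpoint
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $xml);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: text/xml'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
?>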
How about the XML is sent not by your user's browser, but generated and sent by your server? You could still use AJAX, and you'd have no headaches about users leaving your site.
Something along the lines of
Browser -> Server
Server -> write into own DB
Server -> generate an XML file and send it to the foreign server

Download contents of the PHP generated page from another PHP script

I have a PHP script on a server that generates the XML data on the fly, say with Content-Disposition:attachment or with simple echo, doesn't matter. I'll name this file www.something.com/myOwnScript.php
On another server, in another PHP script I want to be able to get this file (to avoid "saving file to disk") as a string (using the path www.something.com/myOwnScript.php) and then manipulate XML data that the script generates.
Is this possible without using web services?
Security implications?
Thanks
Simple answer, yes:
$output = file_get_contents('http://www.something.com/myOwnScript.php');
echo '<pre>';
print_r($output);
echo '</pre>';
If you want more control over how you request the data (spoof headers, send post fields etc.) you should look into cURL.
If you're on a shared host, you might find that you cannot use file_get_contents on remote URLs. This is mainly because it falls under the same permission settings (allow_url_fopen) that allow you to include remote files. Anyway...
If you're stuck in that circumstance, you might be able to use CURL:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://example.com"); // the remote script you want to fetch
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);          // return the output instead of echoing it
$output = curl_exec($ch);
curl_close($ch);
?>
It is more code, but it's still simple. You have the added benefit of being able to post data, set headers, cookies... anything you could do with a highly configurable browser. This makes it useful when people attempt to block bots.

Equivalent is_file() function for URLs?

What is the best way to check if a given url points to a valid file (i.e. not return a 404/301/etc.)? I've got a script that will load certain .js files on a page, but I need a way to verify each URL it receives points to a valid file.
I'm still poking around the PHP manual to see which file functions (if any) will actually work with remote URLs. I'll edit my post as I find more details, but if anyone has already been down this path feel free to chime in.
Using file_get_contents is overkill for this purpose, as the HTTP headers alone are enough to make the decision, so you'll want to use cURL to do so:
<?php
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 1);         // include the response headers in the output
curl_setopt($ch, CURLOPT_NOBODY, 1);         // HEAD-style request: don't download the body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the headers instead of echoing them
// issue the request and read the status code
curl_exec($ch);
$status   = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
$is_valid = ($status === 200);
// close cURL resource, and free up system resources
curl_close($ch);
?>
One such way would be to request the URL and check for a response with a status code of 200. Aside from that, there's really no foolproof way, because the server can handle the request however it likes (including giving you other status codes for files that exist but that you don't have access to, for any number of reasons).
If your server doesn't have fopen wrappers enabled (any server with decent security won't), then you'll have to use the CURL functions.
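A reusable sketch of that cURL-based check (the function name url_points_to_file() is made up for illustration):
<?php
function url_points_to_file($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD-style request, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // treat redirects (301 etc.) as "not valid"
    curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $status === 200;
}

var_dump(url_points_to_file('http://www.example.com/script.js'));
?>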
