Saving unknown files with cURL w/ PHP 5.3.x - php

I'm trying to archive a web-based forum that has attachments posted by its users. So far, I have used the PHP cURL library to fetch the individual topics and have been able to save the raw pages. However, I now need to figure out a way to archive the attachments that are located on the site.
Here is the problem: Since the file type is not consistent, I need to find a way to save the files with the correct extension. Note that I plan to rename the file when I save it so that it's organized in a way that it can be easily found later.
The link to the attached files in a page is in the format:
some file.txt
I've already used preg_match() to get the URLs of the attached files. My biggest problem now is making sure each fetched file is saved in the correct format.
My question: Is there any way to get the file type efficiently? I'd rather not have to use a regular expression, but I'm not seeing any other way.

Does the server add the correct Content-Type header field when serving the files? If so, you can read it by setting CURLOPT_HEADER, or by using file_get_contents() together with $http_response_header.
http://www.php.net/manual/en/reserved.variables.httpresponseheader.php
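As a rough sketch of the header-based approach, the code below reads the content type with curl_getinfo() and CURLINFO_CONTENT_TYPE instead of parsing raw headers from CURLOPT_HEADER. The function names extensionFromContentType() and fetchAttachment() and the small MIME map are assumptions made for illustration, and the result is only as reliable as the header the server sends.

```php
<?php
// Hypothetical helper: map a Content-Type header value to a file extension.
// The map is a small illustrative subset, not a complete MIME registry.
function extensionFromContentType($contentType)
{
    $map = array(
        'text/plain'      => 'txt',
        'image/jpeg'      => 'jpg',
        'image/png'       => 'png',
        'application/pdf' => 'pdf',
        'application/zip' => 'zip',
    );
    // Drop parameters such as "; charset=utf-8" and normalise case.
    $parts = explode(';', $contentType);
    $mime  = strtolower(trim($parts[0]));
    // Fall back to a generic extension for anything not in the map.
    return isset($map[$mime]) ? $map[$mime] : 'bin';
}

// Hypothetical helper: download $url and return array(body, extension).
function fetchAttachment($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // attachment links often redirect
    $body = curl_exec($ch);
    // Content-Type of the final response, as reported by the server.
    $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    return array($body, extensionFromContentType((string) $type));
}
```

The returned extension can then be appended to whatever file name the archive uses for the attachment.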

I would look into the Fileinfo extension:
http://www.php.net/manual/en/book.fileinfo.php
It can detect the file type from the file's contents once you have downloaded it.
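A minimal sketch of the Fileinfo suggestion, assuming the extension is enabled and the attachment body is already in memory as a string. The function name detectMimeType() is made up for this example.

```php
<?php
// Hypothetical helper: detect the MIME type from the downloaded bytes
// rather than trusting the file name or the server's headers.
function detectMimeType($data)
{
    // FILEINFO_MIME_TYPE returns only the type, e.g. "text/plain", without the charset.
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    return $finfo->buffer($data);
}
```

The detected type still has to be mapped to an extension, so this works best alongside a lookup table of the types the forum actually hosts.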

You can use DOMDocument and DOMXPath to extract the URLs and file names safely.
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);
// Query examples:
// link text of every <a> element
foreach ($xpath->query('//a') as $node) {
    echo $node->nodeValue;
}
// href attribute of every <a> element (attributes are selected with @, not #)
foreach ($xpath->query('//a/@href') as $node) {
    echo $node->nodeValue;
}
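For the archiving task, it may be handier to collect each URL together with its link text in a single pass. The following is a sketch only: extractLinks() is a hypothetical name, and the libxml error handling is an addition to cope with forum HTML that is rarely well formed.

```php
<?php
// Hypothetical helper: return array(href => link text) for every <a href="..."> in $html.
function extractLinks($html)
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    $xpath = new DOMXPath($doc);
    // Only anchors that actually carry an href attribute.
    foreach ($xpath->query('//a[@href]') as $node) {
        $links[$node->getAttribute('href')] = trim($node->nodeValue);
    }
    return $links;
}
```

The link text (for example "some file.txt") is often the original file name, so it can also serve as a hint for the extension.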

Related

How to GET an attachment from an Apache CouchDB document with Sag?

I have a document with an image attachment myimg.jpg which I would like to GET using Sag.
In my browser I am able to retrieve this image if I visit this url: http://localhost:5984/mydb/thedocid/myimg.jpg.
Using Sag I am able to retrieve documents, but unable to retrieve attachments. I have tried to retrieve the image like so:
$img = $sag->get('thedocid/myimg.jpg')->body;
Instead of retrieving the image PHP seems to become unresponsive. I also thought disabling JSON decode might solve it, but it still causes PHP to become unresponsive.
$sag->decode(false);
$img = $sag->get('thedocid/myimg.jpg');
What am I doing wrong? How does one properly retrieve an attachment using Sag?
EDIT: After quite some time the attachment was retrieved. Why is it so slow? The attachment is only 4 KB.
I still do not know why my initial code was so unresponsive/extremely slow, but thanks to Dave's comment I got an alternative way to retrieve the document with the attachment:
$doc = $sag->get('thedocid/?attachments=true')->body;
$img = base64_decode($doc->_attachments->{'myimg.jpg'}->data);

DOMDocument - Directly load XML file from php://input

Currently I have a PHP file that reads posted XML and then converts/outputs it to JSON. This file looks like this:
<?php
file_put_contents('myxmlfile.xml', file_get_contents('php://input'));
$xmldoc = new DOMDocument();
$xmldoc->load("myxmlfile.xml");
$xpathvar = new DOMXPath($xmldoc);
// Etc etc, for the purpose of my question seeing the rest isn't necessary
// After finishing the conversion I save the file as a JSON file.
file_put_contents('myjsonfile.json', $JSONContent);
?>
The data I'm receiving comes in XML format. To convert it I'm currently saving it as an XML file, and then immediately after creating a new DOMDocument() and loading it in. My question is, is there any way I can cut out the middle man and just load in the XML directly using file_get_contents()?
Ideally it would be this (didn't work):
$xmldoc->load(file_get_contents("php://input"));
If anyone could help me do this I'd really appreciate it!
Thanks
To load from a string instead of a filename, use the loadXML() method. load() expects a path or URL, so passing it the XML text itself fails.
$xmldoc->loadXML(file_get_contents("php://input"));
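Since the request body comes from a client, it may be empty or malformed. The sketch below wraps loadXML() with a basic check. parseXmlString() is a hypothetical helper name, and returning null on failure is just one possible convention.

```php
<?php
// Hypothetical helper: parse an XML string, returning a DOMDocument or null on failure.
function parseXmlString($xml)
{
    // Collect parser errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Guard against an empty body before handing it to the parser.
    $ok = ($xml !== '') && $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok ? $doc : null;
}

// In the request handler this could be called as:
// $xmldoc = parseXmlString(file_get_contents('php://input'));
```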

Output XML file, without saving

I want to be able to create an XML file, add nodes and such to it, then output it to the screen without saving.
$bookxml = new DOMDocument('1.0', 'utf-8');
is how I create the XML file, but I can't find any way to display it on the screen without saving it.
I am also having a problem with the output even when saving. This is the line I have:
echo $bookxml->save("testing.xml");
All this does is return the file size of the newly created XML, not its contents.
Any help would be awesome, I'm completely stumped on this.
What you're looking for is saveXML(), which returns the document as a string. You can additionally pass the result through htmlspecialchars() so that the markup is visible when displayed in your browser.
echo htmlspecialchars($bookxml->saveXML());
You want the saveXML method, not the save method:
http://us3.php.net/manual/en/domdocument.savexml.php
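Put together, building a small document and getting it back as a string might look like the sketch below. The element names books and book, the title text, and the function name buildBookXml() are invented for illustration.

```php
<?php
// Hypothetical example: build a tiny document in memory and return it as a string.
function buildBookXml()
{
    $bookxml = new DOMDocument('1.0', 'utf-8');
    $bookxml->formatOutput = true; // indent the output for readability

    $root = $bookxml->createElement('books');
    $bookxml->appendChild($root);
    $root->appendChild($bookxml->createElement('book', 'Example title'));

    // saveXML() returns the markup; save() would write a file and return a byte count.
    return $bookxml->saveXML();
}

// To show the markup inside an HTML page:
// echo htmlspecialchars(buildBookXml());
```

If the script is meant to serve the XML itself rather than display it in a page, sending a Content-Type: text/xml header and echoing the string unescaped is the usual alternative.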

Saving and reading an XML file fetched from another URL

This may be a stupid question, but I have never worked with PHP and now I have to, so please give me some useful tips.
I have an XML file that comes from a different URL. I want to save it on the server, then read it and output it to a page in a proper format, with some modifications to the data.
You can use DOM
$dom = new DOMDocument();
$dom->load('http://www.example.com');
This would load the XML from the remote URL. You can then process it as needed. See my previous answers on various topics using DOM. To save the file to your server after you have processed it, use
$dom->save('filename.xml');
Loading the file with $dom->load() will only work if you have allow_url_fopen enabled in your php.ini. If not, you have to use cURL to download the remote file first.
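One way to handle both cases is sketched below, assuming the cURL extension is available as the fallback. loadRemoteXml() is a hypothetical name, and a real script would probably also want timeouts and HTTP status checks.

```php
<?php
// Hypothetical helper: load XML from $url, falling back to cURL when
// allow_url_fopen is disabled. Returns a DOMDocument or null on failure.
function loadRemoteXml($url)
{
    // Collect parser errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();

    if (ini_get('allow_url_fopen')) {
        // The URL wrappers are enabled, so DOMDocument can fetch the file itself.
        $ok = $dom->load($url);
    } else {
        // Download the body with cURL, then parse it from the string.
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $xml = curl_exec($ch);
        $ok = is_string($xml) && $xml !== '' && $dom->loadXML($xml);
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok ? $dom : null;
}
```

The returned document can then be modified and written to disk with $dom->save('filename.xml') as described above.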
This may also be helpful: http://www.php.net/manual/en/function.simplexml-load-file.php
If you have difficulty getting the XML file from the remote host, you can combine file_get_contents() with simplexml_load_string():
$path_to_xml = 'http://some.com/file.xml';
$xml = simplexml_load_string(file_get_contents($path_to_xml));

Parsing XML file through PHP while using XSLT as the main template file

I have a lot of XML files (around 500) from an old ASP and VBScript application that ran on an old Windows server. The user could click a link to download the requested XML file, or click a link to preview how the XML file would look once imported into their system.
Clicking to view the output opened a popup window where the XML filename was passed via the URL, and the output was displayed using the XSLT template file.
example url = /transform.php?action=transform&xmlProtocol=AC_Audiology.xml
Now that we're using PHP5, I'm trying to get something that produces the same output.
We started looking into xslt_create(), but this is an old function from PHP4.
I'm looking for the best method to implement this.
The main PHP page should check and capture the $_GET['xmlProtocol'] value,
pass this to the XSLT template page as data,
where it will be output as HTML.
A general pointer in the right direction would be great!
You can find the documentation (+examples) of the "new" XSL(T) extension at http://docs.php.net/xsl.
<?php
// Transform.php
if (isset($_GET['action']) && $_GET['action'] == 'transform') {
    // obviously you would never trust the input and would validate first
    $xml_file = AFunctionValidateAndGetPathToFile($_GET['xmlProtocol']);
    // Load up the XML file
    $xmlDoc = new DOMDocument;
    $xmlDoc->load($xml_file);
    // Load up the XSL file
    $xslDoc = new DOMDocument;
    $xslDoc->load("xsl_template_file.xsl");
    $xsl = new XSLTProcessor;
    $xsl->importStyleSheet($xslDoc);
    // apply the transformation
    echo $xsl->transformToXml($xmlDoc);
}
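AFunctionValidateAndGetPathToFile() is only a placeholder in the code above. One possible shape for it is sketched here, assuming all protocol files live in a single directory and have simple names like AC_Audiology.xml. The name resolveProtocolFile() and the $baseDir parameter are hypothetical.

```php
<?php
// Hypothetical validator: turn a user-supplied file name into a safe path,
// or return null if the name is not acceptable or the file does not exist.
function resolveProtocolFile($name, $baseDir)
{
    // Allow only letters, digits, underscores and hyphens plus ".xml".
    // This rejects path separators and ".." outright.
    if (!is_string($name) || !preg_match('/^[A-Za-z0-9_-]+\.xml$/', $name)) {
        return null;
    }
    $path = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . $name;
    return is_file($path) ? $path : null;
}
```

With this in place, the transform script would stop with an error page when the function returns null instead of passing the value on to DOMDocument::load().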
I had a similar problem about two years ago. I was using PHP5 but needed to use xslt_create(); or an equivalent. Ultimately, I switched to PHP4.
You can probably set your server to use PHP5 everywhere except for files in a certain folder. I believe that's what I did so I could process XSL files using PHP4 but the majority of the site still used PHP5.
It's possible that things have changed in the last two years and PHP5 has better support for something like xslt_create(); ---- I haven't been following recent changes.
Hope this helps!
