Creating DOM XML - trying to resolve the 'nbsp' entity problem - php

I'm trying to create an XML record. I started with this:
$doc = new DomDocument('1.0', 'UTF-8');
Now I should add this line:
<!DOCTYPE page SYSTEM "http://gv.ca/dtd/character-entities.dtd">
How do I do this?
Chris

Assuming it's PHP, you could consider using DOMImplementation::createDocumentType and DOMImplementation::createDocument; see the PHP manual for both.
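A minimal sketch of that approach, using the root element name page and the DTD URL from the question:
$imp = new DOMImplementation();
// createDocumentType(qualifiedName, publicId, systemId)
$dtd = $imp->createDocumentType('page', '', 'http://gv.ca/dtd/character-entities.dtd');
// createDocument(namespaceURI, qualifiedName, doctype)
$doc = $imp->createDocument(null, 'page', $dtd);
$doc->encoding = 'UTF-8';
// Entities declared in the DTD, such as &nbsp;, are now legal in the document.
echo $doc->saveXML();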


error rendering a sitemap with laravel

I have a problem rendering a sitemap with Laravel.
The generated XML seems OK, but when I try to call the URL from Chrome or Firefox I get an error:
error on line 2 at column 6: XML declaration allowed only at the start of the document
In fact, line 1 of the document is empty and the XML declaration starts on line 2.
Here is my code:
return Response::view('sitemap.index', ['agences' => $agences])->header('Content-Type', 'application/xml');
I tried this syntax too:
$xml = View::make('sitemap.index', ['agences' => $agences]);
return Response::make($xml, 200)->header('Content-Type', 'application/xml');
That way I could do
dd($xml->render());
and see that the returned string has no empty first line.
So I'm guessing Response::make is the one to blame, but I really have no idea where to look from there.
OK, I'm going to post my own answer because this was tricky and it cost me a day; the good thing is that my knowledge of Laravel has slightly increased.
So I had an XML sitemap beginning with an empty line, and that caused an error in the browser.
The XML was first generated using a Blade template.
As that didn't work, I decided to use RoumenDamianoff/laravel-sitemap.
But I had the same problem. So finally I decided to generate the XML myself again, using SimpleXMLElement, and it changed nothing.
At that point I began to dig into Laravel's internals to see where that empty line could come from in the request lifecycle.
Basically my sitemap is very simple:
$urlset = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" />');
$datas = MyModel::all();
foreach($datas as $index=>$data){
// generate sitemap
}
$dom = new DomDocument();
$dom->loadXML($urlset->asXML());
$dom->formatOutput = true;
//output xml
$xml = $dom->saveXML();
$response = Response::make($xml, 200, ['Content-Type' => 'application/xml']);
Just to test, I decided to change the model I was requesting, and my XML was then generated without the empty first line.
So I decided to investigate the model itself and find the error. The model file simply had an empty line before the PHP opening tag.
Deleting that empty line solved my problem.
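For reference, this is the kind of model file that causes the symptom; anything before the opening PHP tag, even a single newline, is echoed into every response (the class below is illustrative, not the actual model):
(line 1 of the file is blank here)
<?php
class Agence extends Eloquent {
    // the blank line above the opening tag is sent to the browser
    // before the XML declaration, producing the empty first line
}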

php: Getting the contents of a feedburner feed

I wrote this code to parse through HTML source, but for some reason it does not work for FeedBurner feeds. Any ideas?
$dom = new DOMDocument();
$dom->loadHTMLFile('http://www.killington.com/winter/mountain/conditions');
$xml = simplexml_import_dom($dom);
$snow = $xml->xpath('//td');
What I really need to do is simply get the data from the page.
Not sure what the problem is, other than the fact that this isn't a feed, it's a web page. That said, since you're using DOMDocument there's no reason to bother with SimpleXML, and that may be where the problem is coming in...
$dom = new DOMDocument();
$dom->loadHTMLFile('http://www.killington.com/winter/mountain/conditions');
$xpath = new DOMXPath($dom);
$snow = $xpath->query('//td');
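DOMXPath::query returns a DOMNodeList, so you can check what actually matched by iterating it; a quick sketch:
foreach ($snow as $td) {
    // textContent flattens all text inside the cell
    echo trim($td->textContent), "\n";
}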
First of all, open the feed page (the raw XML) and check which kind of feed it is:
<rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
Then take a look at a good tutorial like this one: http://net.tutsplus.com/articles/news/how-to-read-an-rss-feed-with-php-screencast/ and you're almost done :)
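For an RSS 2.0 feed like the one above, a minimal SimpleXML sketch (the feed URL is a placeholder):
$rss = simplexml_load_file('http://feeds.feedburner.com/example'); // placeholder URL
if ($rss !== false) {
    foreach ($rss->channel->item as $item) {
        echo $item->title, ' - ', $item->link, "\n";
    }
}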

htmlpurifier, overpurification of third party source

UPDATE 2: http://htmlpurifier.org/phorum/read.php?3,5088,5113 The author has already identified the problem.
UPDATE: The issue appears to be exclusive to version 4.2.0. I downgraded to 4.1.0 and it works. Thank you for all your help. The package author has been notified.
I am scraping some pages like:
http://form.horseracing.betfair.com/horse-racing/010108/Catterick_Bridge-GB-Cat/1215
According to W3C validation it is valid XHTML Strict.
I am then using http://htmlpurifier.org/ to purify the HTML before loading it into a DOMDocument. However, it is only returning a single line of content.
Output:
12:15 Catterick Bridge - Tuesday 1st January 2008 - Timeform | Betfair
Code:
echo $content; # all good
$purifier = new \HTMLPurifier();
$content = $purifier->purify($content);
echo $content; # all bad
BTW, it works for data sourced from another site; for pages from this domain it leaves just the title, as described above.
Related Links
HTMLPurifier dies when the following code is run through it (unanswered question on similar topic)
You should not need the HTML purifier. The DOMDocument class will take care of everything for you. However, it will trigger a warning on invalid HTML, so just do this:
$doc = new DOMDocument();
@$doc->loadHTML($content); // @ suppresses the warning from invalid markup
Then the error will not be triggered, and you can do what you wish with the HTML.
If you are scraping links, I would recommend that you use SimpleXMLElement::xpath(); that is much easier than working with the DOMDocument. Another example of that:
$xml = new SimpleXMLElement($content);
$result = $xml->xpath('//a/@href');
print_r($result);
You can get much more complex XPaths that allow you to specify class names, ids, and other attributes. This is much more powerful than DOMDocument.
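For example, to take only the links inside a particular div (the class name here is made up for illustration):
// hrefs of anchors under a div with a specific class
$result = $xml->xpath('//div[@class="racecard"]//a/@href');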

parsing xml document using cURL

I am trying to parse an XML document I created in a PHP file and output using
echo $xmlMysql->saveXML();
Using cURL I send the information over, but when I try to parse it using the following code:
$xmlDoc = download_page($url);
$dom = new DomDocument();
$dom->load($xmlDoc);
echo $dom->saveXML();
I get this error message:
<b>Warning</b>: I/O warning : failed to load external entity
^
Any help with this would be much appreciated.
If $xmlDoc is a string of XML that you're getting from an HTTP request, try using the loadXML method instead of the load method of your DomDocument object; load expects a file path or URL, not the document itself.
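In other words, something like this (download_page is the asker's own helper):
$xmlDoc = download_page($url); // returns the XML as a string
$dom = new DomDocument();
$dom->loadXML($xmlDoc);        // loadXML parses a string; load expects a path or URL
echo $dom->saveXML();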
You can do
$dom = new DomDocument();
$dom->resolveExternals = false;
//...
to prevent external entities from being resolved. Of course, you may want to investigate which external entities are not being read. See also libxml_disable_entity_loader.
Try the following code:
$dom = dom_import_simplexml(simplexml_load_string($response))->ownerDocument;
$dom->formatOutput = true;
echo '<PRE style="color:#000066;padding:10px;text-align:left">',htmlspecialchars($dom->saveXML()),'</PRE>';

working with XML in PHP

I have a URL that returns an XML result. When I use this command:
print_r(file($url));
it works, but when I use:
$doc = load($url);
and then:
print_r($doc);
it prints out nothing. I'm quite new to working with XML in PHP; can someone give me advice, please?
Thank you for your attention!
I am not really sure what you're trying to do, but for parsing an XML file in PHP there are two main ways: DOM
$doc = new DOMDocument();
$doc->loadXML(file_get_contents($url));
SimpleXML
$xml = new SimpleXMLElement(file_get_contents($url));
file_get_contents reads an entire file into a string.
@deceze and @RageZ:
I'm using load() to get an attribute like this:
$url = 'web address returning an XML result';
$xml = load($url);
$node1 = $xml->getElementsByTagName('tagname');
$value = $node1->getAttribute('attribute1');
But I get an error that $xml is not an object, and when I check with print_r I get nothing, whereas print_r(file($url)) prints out an array as I expect!
@Franz: Maybe there is a bad tag in the XML file, but I cannot fix that; I just work with the result!
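For what it's worth, plain load() is not a PHP function, which is why $xml is not an object. A minimal DOM sketch of what the snippet above seems to be attempting (tag and attribute names are the placeholders from the question):
$doc = new DomDocument();
$doc->load($url); // DOMDocument::load fetches and parses the URL itself
$node = $doc->getElementsByTagName('tagname')->item(0); // first matching element, or null
if ($node !== null) {
    $value = $node->getAttribute('attribute1');
}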
You could also unserialize the XML into a PHP array and use print_r on that array. Take a look here: http://articles.sitepoint.com/article/xml-php-pear-xml_serializer/3#
You will need a PEAR package for this.
