Sanitize and secure XML import - PHP

I am fairly new to XML and have started using simplexml_load_file to import the content of an XML file.
I have the following working code, but I know that it is potentially dangerous.
I need help securing the code, the content, and the URL, and I would also like a way to limit the number of characters in $doc->content.
<ul id="feed">
<?php
ob_start();
$xml = simplexml_load_file('wip4/xmlfeed.epl');
foreach ($xml->document as $doc) {
    if ($num++ < 10) {
        echo '<li class="jobb-entry"><h4>' . $doc->title . '</h4>';
        echo '<p>' . $doc->content . '</p>';
        echo '<p class="apply-link clearfix"><span>Apply</span></p></li>';
    }
}
ob_end_flush();
?>
</ul>
Also, if there are other methods of importing XML documents that are both faster and more secure, I'd appreciate any tips.
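A minimal hardening sketch, assuming the same local feed path: pass LIBXML_NONET so the parser cannot reach out over the network while parsing (which blunts XXE-style tricks), escape every value with htmlspecialchars() before echoing, and cap the content with mb_substr() (the 300-character limit here is an arbitrary example):

<ul id="feed">
<?php
// LIBXML_NONET forbids network access during parsing.
$xml = simplexml_load_file('wip4/xmlfeed.epl', 'SimpleXMLElement', LIBXML_NONET);
if ($xml !== false) {
    $num = 0;
    foreach ($xml->document as $doc) {
        if ($num++ >= 10) {
            break;
        }
        // Escape everything before it reaches the page, and cap the content length.
        $title   = htmlspecialchars((string) $doc->title, ENT_QUOTES, 'UTF-8');
        $content = htmlspecialchars(mb_substr((string) $doc->content, 0, 300), ENT_QUOTES, 'UTF-8');
        echo '<li class="jobb-entry"><h4>' . $title . '</h4>';
        echo '<p>' . $content . '</p>';
        echo '<p class="apply-link clearfix"><span>Apply</span></p></li>';
    }
}
?>
</ul>

If the feed ever grows large, XMLReader is a faster, lower-memory alternative to SimpleXML, since it streams the document instead of loading it all into memory at once.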

Related

Instagram URL --> JSON

How can I go about converting a public Instagram URL into JSON using PHP? Ex: https://www.instagram.com/explore/tags/brindle/
I can't use the API, as I need public hashtag content and my use case won't qualify for their app review process :-(.
Here is what I have so far, but it does not pull all images. I'd also like to be able to load the "load more" images. Any help would be much appreciated!
$instagram_source = file_get_contents("https://www.instagram.com/explore/tags/brindle/");
$instagram_data = explode("window._sharedData = ", $instagram_source);
$instagram_json = explode(';</script>', $instagram_data[1]);
$instagram_array = json_decode($instagram_json[0], TRUE);
$instagram_media = $instagram_array['entry_data']['TagPage'][0]['tag']['media']['nodes'];
if (!empty($instagram_media)) {
    echo '<ul>';
    foreach ($instagram_media as $im) {
        echo '<li>';
        echo '<a href="https://www.instagram.com/p/' . $im['code'] . '/" target="_blank">';
        echo '<img src="' . $im["display_src"] . '" alt="" width="' . $im["dimensions"]["width"] . '" height="' . $im["dimensions"]["height"] . '" />';
        echo '</a>';
        echo '</li>';
    }
    echo '</ul>';
}
Take a look at this solution: https://github.com/Bolandish/Instagram-Grabber. That's the best one I know of so far.
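If you keep the raw-scraping approach, here is a slightly more defensive version of the same extraction. It assumes the page still embeds a window._sharedData JSON blob with the same key layout; Instagram has changed that markup more than once, so treat the structure as an assumption:

$source = @file_get_contents("https://www.instagram.com/explore/tags/brindle/");
if ($source !== false &&
    preg_match('/window\._sharedData\s*=\s*(\{.+?\});\s*<\/script>/s', $source, $m)) {
    $data = json_decode($m[1], true);
    // Same path as the original code, but with a fallback if any key is missing (PHP 7+).
    $nodes = $data['entry_data']['TagPage'][0]['tag']['media']['nodes'] ?? [];
    echo '<ul>';
    foreach ($nodes as $im) {
        // Escape values before injecting them into the markup.
        $code = htmlspecialchars($im['code']);
        $src  = htmlspecialchars($im['display_src']);
        echo '<li><a href="https://www.instagram.com/p/' . $code . '/" target="_blank">';
        echo '<img src="' . $src . '" alt="" /></a></li>';
    }
    echo '</ul>';
}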

Scraping data from amazon

I'm aware that there is an Amazon API for pulling their data, but I'm just trying to learn to scrape for my own knowledge, and pulling data from Amazon seems like a good test.
<?php
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
error_reporting(-1);
include('../includes/simple_html_dom.php');
$html = file_get_html('http://www.amazon.co.uk/gp/product/B00AZYBFGY/ref=s9_simh_gw_p86_d0_i1?pf_rd_m=A3P5ROKL5A1OLE&pf_rd_s=center-2&pf_rd_r=1MP0FXRF8V70NWAN3ZWW&pf_r$');
foreach ($html->find('a-section') as $element) {
    echo $element->plaintext . '<br />';
}
echo $ret;
?>
All I'm trying to do is pull the product description from the link, but I'm not sure why it isn't working. I'm not getting any errors or any data at all, really.
The class for the product description is simply productDescriptionWrapper, so in your sample code use that CSS selector:
foreach ($html->find('.productDescriptionWrapper') as $element) {
    echo $element->plaintext . '<br />';
}
simple_html_dom uses CSS selectors very similar to jQuery's: if you want all divs, use ->find('div'); if you want all anchors with a class of 'hotProduct', use ->find('a.hotProduct'), and so on.
It doesn't work because the product description is being added by JavaScript into an iFrame.
First, check whether you are getting any HTML back from Amazon at all; it might be blocking your request.
$url = "https://www.amazon.co.uk/gp/product/B00AZYBFGY/ref=s9_simh_gw_p86_d0_i1?pf_rd_m=A3P5ROKL5A1OLE&pf_rd_s=center-2&pf_rd_r=1MP0FXRF8V70NWAN3ZWW&pf_r$";
$htmlContent = file_get_contents($url);
echo $htmlContent;
$html = str_get_html($htmlContent);
Note the https://; you have http://, and that may be why you get nothing.
Once you get HTML, you can go forward.
Try different selectors:
foreach ($html->find('div[id=productDescription]') as $element) {
    echo $element->plaintext . '<br />';
}
foreach ($html->find('div[id=content]') as $element) {
    echo $element->plaintext . '<br />';
}
foreach ($html->find('div[id=feature-bullets]') as $element) {
    echo $element->plaintext . '<br />';
}
The echo should display the page itself, maybe with some missing CSS.
If the HTML is in place, you can try the selectors above.
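If you are getting nothing back at all, Amazon may simply be refusing the request: it often blocks clients that don't send a browser-like User-Agent header. A sketch using cURL to fetch the page before handing it to simple_html_dom (the User-Agent string is just an example, and the URL is trimmed to the bare product page):

$url = "https://www.amazon.co.uk/gp/product/B00AZYBFGY/";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36');
$htmlContent = curl_exec($ch);
curl_close($ch);

if ($htmlContent !== false) {
    $html = str_get_html($htmlContent);           // from simple_html_dom
    foreach ($html->find('div[id=productDescription]') as $element) {
        echo $element->plaintext . '<br />';
    }
}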

PHP XML feed assembler works for every feed except Twitter

I'm trying to retrieve feeds from a Twitter search to display on various parts of the site. I modified this function to do this, yet it works for every feed I try it on except Twitter.
function getFeed($feed_url) {
    $content = file_get_contents($feed_url);
    $x = new SimpleXmlElement($content);
    echo "<ul>";
    foreach ($x->channel->item as $entry) {
        echo "
        <li>
            <a href='$entry->link' title='$entry->title'>" . $entry->title . "</a>
        </li>";
    }
    echo "</ul>";
}
I am only after the content of the relevant posts. Here is a sample call.
<?php getFeed("feed://search.twitter.com/search.atom?q=berkshire+golf"); ?>
Any ideas,
Marvellous
Twitter's search API is just a simple HTTP request, so your URL should be http:// and not feed:// if you're using file_get_contents
Edit:
You're also requesting .atom; use the RSS endpoint instead:
http://search.twitter.com/search.rss?q=berkshire+golf
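If you'd rather stay with the Atom feed, note that Atom documents use <entry> elements instead of <channel><item>, and they live in the Atom namespace, so the loop has to change. A sketch (the namespace URI is the standard Atom one):

function getAtomFeed($feed_url) {
    $content = file_get_contents($feed_url);
    $x = new SimpleXmlElement($content);
    echo "<ul>";
    // Atom entries live under <feed><entry>, in the Atom namespace.
    foreach ($x->children('http://www.w3.org/2005/Atom')->entry as $entry) {
        $title = htmlspecialchars((string) $entry->title);
        $href  = htmlspecialchars((string) $entry->link['href']); // first <link> element
        echo "<li><a href='$href' title='$title'>$title</a></li>";
    }
    echo "</ul>";
}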

Is there something wrong with this XML/PHP Code for Tumblr to Website?

I am trying to link a Tumblr feed to a website. I found this code (as you can see, something must be broken with it, as it doesn't even format correctly in this post):
<?php
$request_url = "http://thewalkingtree.tumblr.com/api/read?type=post&start=0&num=1";
$xml = simplexml_load_file($request_url);
$title = $xml->posts->post->{'regular-title'};
$post = $xml->posts->post->{'regular-body'};
$link = $xml->posts->post['url'];
$small_post = substr($post, 0, 320);
echo '<h1>' . $title . '</h1>';
echo '<p>' . $small_post . '</p>';
echo "…";
echo "</br><a target=frame2 href='" . $link . "'>Read More</a>";
?>
And I inserted the Tumblr link that I will be using. When I try to preview my HTML, I get a bunch of messed-up code that reads as follows:
posts->post->{'regular-title'}; $post = $xml->posts->post->{'regular-body'}; $link = $xml->posts->post['url']; $small_post = substr($post,0,320); echo '
'.$title.'
'; echo '
'.$small_post.'
'; echo "…"; echo "Read More"; ?>
Any help would be appreciated. Thank you!
That is PHP, not HTML. You need to process it with a PHP parser before delivering it to a web browser.
… it should also be rewritten so it can cache the remote data, and escape special characters before injecting the data into an HTML document.
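A sketch of that rewrite, with straight quotes throughout, a simple five-minute file cache (the cache path is just an example), and escaping before output:

<?php
$request_url = 'http://thewalkingtree.tumblr.com/api/read?type=post&start=0&num=1';

// Cache the remote XML for five minutes instead of hitting Tumblr on every page view.
$cache = __DIR__ . '/tumblr-cache.xml';               // example cache location
if (!file_exists($cache) || filemtime($cache) < time() - 300) {
    copy($request_url, $cache);
}

$xml   = simplexml_load_file($cache);
$title = htmlspecialchars((string) $xml->posts->post->{'regular-title'});
$link  = htmlspecialchars((string) $xml->posts->post['url']);
// Strip the post's own markup before truncating, then escape what is left.
$small_post = htmlspecialchars(mb_substr(strip_tags((string) $xml->posts->post->{'regular-body'}), 0, 320));

echo '<h1>' . $title . '</h1>';
echo '<p>' . $small_post . '&hellip;</p>';
echo '<br /><a target="frame2" href="' . $link . '">Read More</a>';
?>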

Parsing HTML to return CSS rules from ids and classes attributes with PHP

I hate having to write out a lot of empty CSS rules and then enter my styles into them, so I'd like to develop a tiny PHP script that parses the HTML I pass to it and returns empty CSS rules.
I decided to use PHP's DomDocument.
The question is: how can I loop through the whole structure? (I saw that DOMDocument only has getElementsByTagName or getElementById, and no getFirstElement, for example.)
I only want to get the ids and the classes in a given block of HTML code, I'd pass things like:
<div id="testId">
<div class="testClass">
<span class="message error">hello world</span>
</div>
</div>
I only want to know how I can loop through every node.
Thanks!
You can pass an asterisk (*) to getElementsByTagName to get all tags and then loop through them:
<?php
// Load the markup into a DOMDocument first ($html holds the HTML snippet to scan).
$dom = new DOMDocument();
$dom->loadHTML($html);

// "*" matches every element in the document.
$nodes = $dom->getElementsByTagName("*");
$css = "";
for ($i = 0; $i < $nodes->length; $i++) {
    $node = $nodes->item($i);
    if ($node->hasAttribute("class")) {
        // "message error" becomes the compound selector ".message.error".
        $css .= "." . str_replace(" ", ".", $node->getAttribute("class")) . " { }\n";
    }
    if ($node->hasAttribute("id")) {
        $css .= "#" . $node->getAttribute("id") . " { }\n";
    }
}
echo $css;
?>
The SimpleXML extension for PHP may also help you. It works well for navigating an HTML tree, as long as the markup is well-formed:
http://www.php.net/manual/en/simplexml.examples-basic.php
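For instance, assuming the snippet is well-formed enough to load as XML, a single XPath query can pull out every node carrying a class or id in one go:

$snippet = '<div id="testId"><div class="testClass"><span class="message error">hello world</span></div></div>';
$xml = simplexml_load_string($snippet);   // only works on well-formed markup
foreach ($xml->xpath('//*[@class or @id]') as $node) {
    if (isset($node['class'])) {
        echo '.' . str_replace(' ', '.', (string) $node['class']) . " { }\n";
    }
    if (isset($node['id'])) {
        echo '#' . (string) $node['id'] . " { }\n";
    }
}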
