How to parse RSS from a PHP page using jQuery/jFeed? - php

I'm trying to fumble my way through parsing RSS sensibly, using jQuery and jFeed.
Because of the same-origin policy, I'm pulling the BBC's health news feed into a local page (http://www.davidrhysthomas.co.uk/play/proxy.php).
Originally this was just the proxy.php script that ships with the jFeed download package, but because my host has disabled allow_url_fopen, I've amended the PHP to the following:
$url = "http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/health/rss.xml";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
echo "$data";
curl_close($ch);
This seems to produce the same (or comparable) output as the original fopen() version did on my local machine.
Now that that seems to be working, I'm trying to set up the jFeed script to work with the page and, to my embarrassment, I can't see how.
I understand that, at the least, this should work:
jQuery.getFeed({
url: 'http://www.davidrhysthomas.co.uk/play/proxy.php',
success: function(feed) {
alert(feed.title);
}
});
...but, as I'm sure you anticipate, it doesn't. What non-output there is, is available for your perusal here: http://www.davidrhysthomas.co.uk/play/exampleTest.html. And I honestly don't have a clue what to do about it.
If anyone could offer some pointers, tips, hints, or, at a pinch, a quick slap around the cheeks and a 'pull yourself together!' it'd be much appreciated...
Thanks in advance =)

On your test page, some lines of script look wrong...
<script type="text/javascript">
jQuery(function() {
url: 'http://www.davidrhysthomas.co.uk/play/proxy.php',
success: function(feed) {
alert(feed.title);
}
...
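I think that should be more like this, using jFeed's jQuery.getFeed(), which fetches the XML and passes a parsed feed object to success:
<script type="text/javascript">
jQuery(function() {
jQuery.getFeed( {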
url: 'http://www.davidrhysthomas.co.uk/play/proxy.php',
success: function(feed) {
alert(feed.title);
}
});
...

The Zend Framework has a class for consuming all kinds of feeds, called Zend_Feed:
http://framework.zend.com/manual/en/zend.feed.html
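If you'd rather parse the feed on the server instead of in jQuery, PHP's built-in SimpleXML extension is a lighter-weight alternative to Zend_Feed (a swapped-in technique, not what this answer names). The sketch below assumes an RSS 2.0 document, such as the string curl_exec() returns in proxy.php; `parseRss()` is a hypothetical helper, and it only extracts the channel title and each item's title and link.

```php
<?php
// Sketch: parse an RSS 2.0 string with SimpleXML (alternative to Zend_Feed).
// parseRss() is a hypothetical helper; only channel title, item titles and links are kept.
function parseRss(string $xml): array
{
    $rss = simplexml_load_string($xml);
    if ($rss === false || !isset($rss->channel)) {
        throw new InvalidArgumentException('not an RSS document');
    }
    $feed = ['title' => (string) $rss->channel->title, 'items' => []];
    // In RSS 2.0 the <item> elements are children of <channel>
    foreach ($rss->channel->item as $item) {
        $feed['items'][] = [
            'title' => (string) $item->title,
            'link'  => (string) $item->link,
        ];
    }
    return $feed;
}
```

proxy.php could then `echo json_encode(parseRss($data));` and the page could fetch plain JSON instead of going through jFeed; that is a design choice, not something jFeed requires.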

In your PHP code, you're missing the XML Content-Type header:
$url = "http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/health/rss.xml";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
header("content-type: text/xml");
echo "$data";
curl_close($ch);

Related

Scrape product image URLs from a website where content is loaded dynamically

I'm not able to scrape the product images. I'm using Ajax; my Ajax file is test.html, and here is my code:
$( "#click_me" ).click(function () {
$.ajax({
url: "test.php",
async: false,
success: function(result){
console.log(result);
}});
});
test.php:
$url="http://www.kohls.com/catalog/bedroom-mattresses-accessories-furniture.jsp?CN=Room:Bedroom+Category:Mattresses%20%26%20Accessories+Department:Furniture&cc=bed_bath-TN3.0-S-mattresses";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT,"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 ");
$out = curl_exec($ch);
curl_close($ch);
$out = str_replace("\n", '', $out);
echo $out;
Note: please check the $url. The images are populated dynamically and I'm not able to scrape them. I need quick guidance, please. I've also tried pythonjs to scrape them, but that didn't work either.
Thanks!
You need to parse the images out of the HTML. DOMDocument is a good choice for this.
Example code (UNTESTED, but it should work in theory):
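$url="http://www.kohls.com/catalog/bedroom-mattresses-accessories-furniture.jsp?CN=Room:Bedroom+Category:Mattresses%20%26%20Accessories+Department:Furniture&cc=bed_bath-TN3.0-S-mattresses";
$html=file_get_contents($url);
// loadHTML() is an instance method; @ hides warnings about malformed HTML
$domd=new DOMDocument();
@$domd->loadHTML($html);
foreach($domd->getElementsByTagName("img") as $img){
$src=$img->getAttribute("src");
if(empty($src)){continue;}
// make relative URLs absolute; leave absolute URLs untouched
if(strpos($src,'//')===0){
$src='http:'.$src;
}elseif($src[0]==='/'){
$src='http://www.kohls.com'.$src;
}
// drop any query string before using the path as a file name
$filename=basename(parse_url($src,PHP_URL_PATH));
echo "downloading ".$filename.PHP_EOL;
file_put_contents($filename,file_get_contents($src));
}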
Just replace file_get_contents() with your cURL functions if you want to use cURL.
(Also, this is rather memory-hungry, since each image is downloaded entirely into RAM no matter how big it is. With cURL you could optimize this by using CURLOPT_FILE to write directly to a file, which could save a lot of RAM if you want to download images from NASA or the like.)
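A sketch of the CURLOPT_FILE idea from the parenthetical above: cURL writes the response body straight to an open file handle, so the image never sits in memory as one big string. `downloadToFile()` is a hypothetical helper; `$src` and `$filename` are assumed to come from the DOMDocument loop.

```php
<?php
// Sketch: stream a download straight to disk with CURLOPT_FILE.
// downloadToFile() is a hypothetical helper name.
function downloadToFile(string $src, string $filename): bool
{
    $fp = fopen($filename, 'wb');
    if ($fp === false) {
        return false;
    }
    $ch = curl_init($src);
    curl_setopt($ch, CURLOPT_FILE, $fp);           // body goes to $fp, not into a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);   // treat HTTP status >= 400 as failure
    $ok = curl_exec($ch);
    curl_close($ch);
    fclose($fp);
    if ($ok === false) {
        unlink($filename);                         // don't leave a partial file behind
    }
    return $ok !== false;
}

// In the DOMDocument loop, instead of file_put_contents(...):
// downloadToFile($src, $filename);
```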

Decoding a base64 string? Please :(

<?php $OOO0O0O00=__FILE__;$O00O00O00=__LINE__;$OO00O0000=1132;eval((base64_decode('JE8wMDBPME8wMD1mb3BlbigkT09PME8wTzAwLCdyYicpO3doaWxlKC0tJE8wME8wME8wMClmZ2V0cygkTzAwME8wTzAwLDEwMjQpO2ZnZXRzKCRPMDAwTzBPMDAsNDA5Nik7JE9PMDBPMDBPMD0oYmFzZTY0X2RlY29kZShzdHJ0cihmcmVhZCgkTzAwME8wTzAwLDM3MiksJ0VudGVyeW91d2toUkhZS05XT1VUQWFCYkNjRGRGZkdnSWlKakxsTW1QcFFxU3NWdlh4WnowMTIzNDU2Nzg5Ky89JywnQUJDREVGR0hJSktMTU5PUFFSU1RVVldYWVphYmNkZWZnaGlqa2xtbm9wcXJzdHV2d3h5ejAxMjM0NTY3ODkrLycpKSk7ZXZhbCgkT08wME8wME8wKTs=')));return;?>
kr9NHenNHenNHe1lFMamb3klFoxiC2APk19gOLlHOa9gkZXJkZwVkr9NTznNHr8XHt4JkZwShokiF2A2Yy9LcBYvcoAPF3OZfuwPcmklCBWPkr8XHenNHr8XHtXLT08XHr8XHeEXhUXmOB50cbk5d3a3D2iUUylRTlfNaaOnCAkJW2YrcrcMO2fkDApQToxYdanXAbyTF1c2BuiDGjExHjH0YTC3KeLqRz0mRtfnWLYrOAcuUrlhU0xYTL9WAakTayaBa1icBMyJC2OlcMfPDBpqdo1Vd3nxFmY0fbc3Gul6HerZHzW1YjF4KUSvkZLphUL7cMYSd3YlhtONHeEXTznNHeEpK2a2CBXPkr9NHenNHenNHtL7RZ8IAriWTr9eU0lAT1nAwyYAWakAtJOXfbkjDoyzcBYvcoAINUnWaakeUryTOa9eT0OyKXPLfoyZc2a0b3aZdtE9wtkPfuOXKJ8vf3f3RMpvCmplcBSVC29sR2yXDU9zcByZC2IvN2pvCmplcBS9kopvCmplcBSMDB5LcBaLNUOpdMOlcBWMC2yZcBaZDMa0NUOjCbklcbkQcbWMFuaZC2iiF2ajd2OlNUOXfbkjDoyzcBYvcoAMD2a5f29Zce0LFUcSd2Yifolvdj0Ldtcjdz0LC28MF29Zfe0LF29ZftcZCBOpfbH9kukicol1FZczfe0LF3WMDmO5FoA9kop0kmY0Cbk0NUOzfoyZftcvdoW9kocZd21ic2AJKXPvR2ajDo8IkuOiFMflfy91FMX7tJO1F2aZWBfldmWINUEmO29vc2xlCM90RzwVHUEPDuO0FePvR3f3fZ5md29mdoaJd3WVC29sR2kvft5Pfo1ShUF7tIPvRZnsCBslwuOPcUnjaakHwuklFbalF3WIfo8IkuOiFMflfy91FMXhkoYPwe0IC3aZdy9pdMl0htL7tMY1FMxgF2a0d3n0htOjDtXIW1aUTr9Way9aA0aUWAfyTlWSwtO1F2aZWBfldmWpKXpjfbkSb3Ylfo9XftILC2ISwrYaALxNAyOgaakHRtO0CbkmcbOgfbkShTShC3aZdy9zcbOvFuWPkoYPRtneaakHT1nAb0cnUAxNTLaUAL9URtn0FmalhTShC3aZdy9zcbOvFuWPkoYPRtneaakHT1nAb0cNTrxNa0xNW0yAUA9KRtn0FmalhTShC3aZdy9zcbOvFuWPkoYPRtneaakHT1nAb0yaar9UOAcyALaURtn0FmalhTShC3aZdy9zcbOvFuWPkoYPRtneaakHT1nAb1kyayaUTlOUWA5TOLaURuOZfBApKXpjfbkSb3Ylfo9XftILC2ISwrYaALxNAyOgarlYOA9aatXIYTEXhTShkuisduY0FMlVcz0IC3aZdy9lGoajhtOjDtL7tIPLGo1SF3OZDB5mwe0IkuisduY0FMlVczShkopzd25gFMaXduLINUnQF29Vb2OlC29LcUILGo1SF3OZDB5mRtn0FmalhTShtI==
I can't seem to decode this base64 string, which is in the footer of a WordPress theme. I want to be able to add more to the footer.
Any help appreciated, thanks!
OK, this is the decoded piece of code with readable variable names (for educational purposes):
<?php
$the_current_file = __FILE__;
$the_line_number_of_this_line = __LINE__;
$fileResource = fopen($the_current_file, 'rb');
while (--$the_line_number_of_this_line) // skip lines 1 .. __LINE__-1
{
fgets($fileResource, 1024);
}
fgets($fileResource, 4096); // then skip line __LINE__ itself; the encoded blob starts on the next line
$codeToEvaluate = (base64_decode(strtr(fread($fileResource, 372), 'EnteryouwkhRHYKNWOUTAaBbCcDdFfGgIiJjLlMmPpQqSsVvXxZz0123456789+/=', 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/')));
eval($codeToEvaluate);
return;
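So basically, wherever this piece of code is included, it opens its own file, skips every line up to and including the line the loader sits on, and reads the next 372 bytes (the start of the gibberish block after `?>`; the `return;` stops PHP from ever outputting that block). It maps those bytes from the shuffled alphabet EnteryouwkhRHYKNWOUTAaBbCcDdFfGgIiJjLlMmPpQqSsVvXxZz0123456789+/= onto the standard base64 alphabet ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/, base64-decodes the result and runs it with eval(). I'd put a die($codeToEvaluate); before eval($codeToEvaluate) to find out what code actually gets executed.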
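To inspect the payload without running anything, you can replay the loader's decoding steps and return the text instead of eval()ing it. A minimal sketch: the two alphabets are copied from the decoded loader, while `revealPayload()` and the file-reading usage (which assumes the loader is on line 1 of a file hypothetically named footer.php) are illustrative assumptions.

```php
<?php
// Sketch: reveal the hidden payload WITHOUT eval'ing it.
// Alphabets copied from the loader; revealPayload() is a hypothetical helper.
const OBF_ALPHABET = 'EnteryouwkhRHYKNWOUTAaBbCcDdFfGgIiJjLlMmPpQqSsVvXxZz0123456789+/=';
const STD_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function revealPayload(string $blob): string
{
    // strtr() maps char-by-char; the trailing '=' in OBF_ALPHABET has no
    // partner and is ignored, so '=' padding passes through unchanged.
    $decoded = base64_decode(strtr($blob, OBF_ALPHABET, STD_ALPHABET), true);
    if ($decoded === false) {
        throw new RuntimeException('blob is not valid base64 after re-mapping');
    }
    return $decoded;
}

// Hypothetical usage: the loader reads 372 bytes from the line after itself.
// $lines = file('footer.php');
// echo revealPayload(substr($lines[1], 0, 372));
```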
But then, seriously: if the developers of this theme tried to hide something from you, either it's malicious or you're trying to get around some licensing because you don't want their attribution in the footer. Credit them or pay them.
So bottom line: Buy the goddamn theme or find another.
EDIT
This seems to be the code that's being executed:
$purchasecode = PURCHASE_CODE;
$target_url = "http://www.jobzeek.com/api/search/?jobzeek=$jobzeek&indeed=$indeed&careerjet=$careerjet&purchasecode=$purchasecode&keyword=$q&location=$l&co=$co&sort=$sort&radius=$radius&st=$st&jtype=$jt&start=$start&old=$fromage";
//echo $target_url;
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 500);
$xmlstring = curl_exec($ch);
$xmlstring = $xmlstring;
$json_reply = json_decode($xmlstring, true);
Nice try. Your base64 decodes to this:
$O000O0O00=fopen($OOO0O0O00,'rb');
while(--$O00O00O00)
fgets($O000O0O00,1024);
fgets($O000O0O00,4096);
$OO00O00O0=(base64_decode(strtr(fread($O000O0O00,372),'EnteryouwkhRHYKNWOUTAaBbCcDdFfGgIiJjLlMmPpQqSsVvXxZz0123456789+/=','ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/')));
eval($OO00O00O0);
Ahem... yeah, I'm not going to run eval on that.

HTML entity '&times' and php function

I need a function that returns the time zone of a specific location, so I'm using the Google Time Zone API.
function timezoneLookup($lat, $lng){
$url = 'https://maps.googleapis.com/maps/api/timezone/json?location='.$lat.','.$lng.'&timestamp='.time().'&sensor=false';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
curl_close($ch);
return $output;
}
The function doesn't work: if I return $url, I can see that the GET parameter "&timestamp=" has been transformed into "×tamp=".
If I run the same code outside the function, it works.
Why?
----UPDATE----
I solved the problem: cURL wasn't working with https:// URLs, so I added:
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
See this for more information PHP cURL Not Working with HTTPS
The function works fine. The reason you're seeing ×tamp= is that &times is being converted to ×. If you view the page source, you'll see the correct URL (instead of the rendered entity on the web page).
The trailing semicolon isn't required: browsers still recognize legacy named entities such as &times without it, which is why &timestamp is rendered as ×tamp.
There is no problem with this function. If you echo that URL, you get the multiplication sign because the browser renders the output as HTML and recognizes &times as an entity. This only happens when you view it in an HTML viewer (a browser); if you view the source, you'll see the original string.
To confirm that this conversion doesn't happen when the URL is passed through curl_setopt(), I ran your code on my server and got the expected result.
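A sketch of building the same URL with http_build_query() and escaping it only when it is printed into a page, so the browser shows &timestamp again. `timezoneUrl()` is a hypothetical helper and the timestamp is fixed for repeatability; current versions of the Time Zone API also expect a `key` parameter, which is not shown here.

```php
<?php
// Sketch: build the query with http_build_query(), escape only for HTML output.
// timezoneUrl() is a hypothetical helper; parameter names are the question's.
function timezoneUrl(float $lat, float $lng, int $timestamp): string
{
    $query = http_build_query([
        'location'  => $lat . ',' . $lng,
        'timestamp' => $timestamp,
        'sensor'    => 'false',
    ]);
    return 'https://maps.googleapis.com/maps/api/timezone/json?' . $query;
}

$url = timezoneUrl(52.2023913, 33.2023913, 1400000000);
// $url itself is what curl_setopt($ch, CURLOPT_URL, $url) should receive.
// When echoing into HTML, escape it so "&timestamp" isn't read as "&times".
echo htmlspecialchars($url), PHP_EOL;
```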
echo timezoneLookup(52.2023913, 33.2023913);
function timezoneLookup($lat, $lng){
$url = 'https://maps.googleapis.com/maps/api/timezone/json?location='.$lat.','.$lng.'&timestamp='.time().'&sensor=false';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
curl_close($ch);
return $output;
}
Returned...
{ "dstOffset" : 3600, "rawOffset" : 7200, "status" : "OK", "timeZoneId" : "Europe/Kiev", "timeZoneName" : "Eastern European Summer Time" }
If this code isn't working for you, it could be a networking issue. Try cURL with another web page and see what happens. Also, for a simple API call like this you could easily use file_get_contents() instead.
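Since the asker's actual problem turned out to be an HTTPS failure, it helps to make cURL report why a request failed instead of quietly returning false. A sketch of the same request with error reporting; `fetchOrExplain()` is a hypothetical name.

```php
<?php
// Sketch: like timezoneLookup(), but surfaces cURL's own error message.
// fetchOrExplain() is a hypothetical helper name.
function fetchOrExplain(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $output = curl_exec($ch);
    // curl_exec() returns false on transport errors (DNS, TLS, protocol, ...);
    // curl_error() must be read before the handle is closed.
    $error = $output === false ? curl_error($ch) : '';
    curl_close($ch);
    return [$output === false ? null : $output, $error];
}

// Usage: an SSL certificate problem now shows up in $error.
// [$body, $error] = fetchOrExplain($url);
// if ($body === null) { echo "cURL error: $error", PHP_EOL; }
```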

Need help converting PHP cURL code to C

My service provider has given me the following piece of PHP code for accessing their service. I need help converting it to C for use in my application. The code uses the cURL module to POST to a site.
Please advise.
<?php
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL, "http://api.mVaayoo.com/mvaayooapi/MessageCompose?user=myusername:mypassword&senderID=TEST%20SMS&receipientno=phonenum&msgtxt=This%20is%20a%20test%20from%20mVaayoo%20API&state=4");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "user=$user&senderID=$senderID&receipientno=$receipientno&cid=$cid&msgtxt=$msgtxt");
$buffer = curl_exec($ch);
if(empty ($buffer))
{ echo " buffer is empty "; }
else
{ echo $buffer; }
curl_close($ch);
?>
Use libcurl through its C interface. The remainder is good old C-style string handling.
Your example libcurl program in the comment looks good, except that for a POST you need to install a CURLOPT_READFUNCTION (which supplies the request body), not a CURLOPT_WRITEFUNCTION (which receives the response). But if you just want to post a static buffer, use CURLOPT_POSTFIELDS instead of a callback function.
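Whatever language makes the request, CURLOPT_POSTFIELDS just carries a URL-encoded string; in C you would pass the same bytes to curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body). A PHP sketch for checking exactly what that string should look like before porting it. The field values are hypothetical placeholders, and whether the mVaayoo API wants spaces encoded as %20 or + is an assumption.

```php
<?php
// Sketch: build the POST body with proper URL-encoding.
// buildPostBody() and all field values are hypothetical placeholders.
function buildPostBody(array $fields): string
{
    // PHP_QUERY_RFC3986 encodes spaces as %20 rather than '+'
    return http_build_query($fields, '', '&', PHP_QUERY_RFC3986);
}

$body = buildPostBody([
    'user'         => 'myusername:mypassword',
    'senderID'     => 'TEST SMS',
    'receipientno' => 'phonenum',
    'cid'          => '',
    'msgtxt'       => 'This is a test',
]);
echo $body, PHP_EOL;
```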

To get around the Ajax same-origin policy: code for a PHP Ajax request forwarder?

I want to bypass the Ajax same-origin policy by having a PHP page on my site that basically acts as a JSON proxy. E.g. I make an Ajax request like this:
mysite.com/myproxy.php?url=blah.com/api.json&a=1&b=2
It then makes a request to:
blah.com/api.json?a=1&b=2
And returns the JSON (or whatever) result to the original requester.
Now, I assume I'd be stupidly reinventing the wheel if I wrote this PHP code (plus I don't know PHP!). Is there some pre-existing code to do this? I'm sure I'm not the only one who's butted their head up against the same-origin policy before.
Oh, and JSONP isn't an option for this particular API.
Thanks all
Okay, here's something:
Slap this into a PHP script and call it like this:
script.php?url=blah
POST the contents you want forwarded to the server.
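<?php
$curlPost = http_build_query($_POST);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $_GET['url']);
curl_setopt($ch, CURLOPT_HEADER, 0); // return the body only, without response headers
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);
$data = curl_exec($ch);
curl_close($ch);
// The upstream response is already JSON, so pass it through unchanged
// (json_encode() would wrap it in a JSON string literal)
header('Content-Type: application/json');
echo $data;
?>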
Now, this script is a bit too open for my liking, so to increase security I'd recommend adding a whitelist of allowed domains.
So add this to the top:
$whitelist = array('http://www.google.com','http://www.ajax.com');
$list = array();
foreach($whitelist as $w)
$list[] = parse_url($w,PHP_URL_HOST);
$url = $_GET['url'];
// parse_url(), not pathinfo(), extracts the host name
$host = parse_url($url,PHP_URL_HOST);
if(!in_array($host, $list, true)) die('no access to that domain');
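The question's example forwards extra query parameters (a=1&b=2) rather than a POST body, and a whitelist is only as good as the URL parsing behind it. A sketch combining both, assuming the caller passes a full http:// or https:// URL in `url` (a bare blah.com/api.json has no scheme and is rejected); `buildUpstreamUrl()` and the sample host are hypothetical.

```php
<?php
// Sketch: turn ?url=http://blah.com/api.json&a=1&b=2 into
// http://blah.com/api.json?a=1&b=2, refusing anything that is not plain
// http(s) to an allowed host. buildUpstreamUrl() is a hypothetical helper.
function buildUpstreamUrl(array $query, array $allowedHosts): ?string
{
    if (!isset($query['url']) || !is_string($query['url'])) {
        return null;
    }
    $target = $query['url'];
    $scheme = parse_url($target, PHP_URL_SCHEME);
    $host   = parse_url($target, PHP_URL_HOST);
    // Blocks file://, missing schemes and any host not explicitly listed
    if (!in_array($scheme, ['http', 'https'], true)
        || !in_array($host, $allowedHosts, true)) {
        return null;
    }
    unset($query['url']);              // every other parameter is forwarded
    if ($query === []) {
        return $target;
    }
    $sep = (strpos($target, '?') === false) ? '?' : '&';
    return $target . $sep . http_build_query($query);
}

// Usage inside the proxy, before calling curl:
// $upstream = buildUpstreamUrl($_GET, ['blah.com']);
// if ($upstream === null) { http_response_code(403); exit('no access to that domain'); }
```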
