Get HTML of tumblr post via curl - php

I'm currently struggling to get the page content of a Tumblr post. This is what my script looks like:
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_ENCODING ,"");
curl_setopt($c, CURLOPT_FRESH_CONNECT, true);
curl_setopt($c, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)');
curl_setopt($c, CURLOPT_REFERER, $url);
curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($c, CURLOPT_MAXREDIRS, 2);
curl_setopt($c, CURLOPT_HEADER, true);
$html = curl_exec($c);
print_r(curl_getinfo($c));
curl_close($c);
There is no problem fetching a whole "tumblr", for example http://anyuser.tumblr.com.
But when I try to get something like http://anyuser.tumblr.com/post/1234567890/my-post, the server responds with 400 (bad request). Where is the problem? I found several solutions for posting data on tumblr via curl, but it seems like nobody had this kind of problem before.

Related

Getting string instead of xml response in curl

I have the following curl request
$url='http://test/paynetz/epi/fts?login=160&pass=Test#123&ttype=NBFundTransfer&prodid=NSE&amt=50&txncurr=INR&txnscamt=0&clientcode=TkFWSU4%3d&txnid='.urlencode($string).'&date='.urlencode($date).'&custacc=1234567890&udf1=ajeesh&udf2=sam#zz.com&udf3=940000000&udf4=arrackaparmabilhouse&ru=http://www.zwitch.co';
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_HEADER, 0);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, false);
echo $auth = curl_exec($curl);
I'm getting this:
http://test/paynetz/epi/ftsNBFundTransfer267050dHwIMJR%2FucGOZcnocTnwvISAVaeNZK93Y8veI%2Bb1DtY%3D11
instead of XML. I'm getting only the values, not the XML markup.
I had a 505 error in the response, so I used urlencode($string) instead of $string.
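Have you tried adding curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'); ? It makes the request look like it comes from a browser, and some servers respond differently to clients that do not send a browser-like user agent.
If you are trying to output the XML directly onto a web page, you might want to look up htmlentities(). The browser treats the XML tags as markup and shows only the text between them, which would explain why you see the values but not the XML.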
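To illustrate the escaping suggestion, here is a minimal sketch. The XML string is made up for the example (the element names are assumptions, not the real gateway response), and htmlspecialchars() is used as the lighter-weight sibling of htmlentities():

```php
<?php
// Made-up XML standing in for the real response; the element names are assumptions.
$xml = '<response><url>http://test/paynetz/epi/fts</url><ttype>NBFundTransfer</ttype></response>';

// Roughly what a browser shows when the XML is echoed into an HTML page:
// the tags are treated as markup, so only the text between them is visible.
$visibleText = strip_tags($xml);

// Escaping the markup first makes the tags themselves visible on the page.
$escaped = htmlspecialchars($xml, ENT_QUOTES, 'UTF-8');

echo '<pre>' . $escaped . '</pre>';
```

Alternatively, sending a header('Content-Type: text/xml') before the output lets the browser display the document as XML instead of HTML.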

cURL is retrieving encoded HTML from Pirate Bay

I'm creating a script that scrapes the site www.piratebay.se. The script was working fine two or three days ago, but now I'm having problems with it.
This is my code:
$URL = 'http://thepiratebay.se';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $URL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1");
curl_setopt($ch, CURLOPT_COOKIE, "language=pt_BR; c[thepiratebay.se][/][language]=pt_BR");
$fonte = curl_exec ($ch);
curl_close ($ch);
echo $fonte;
The response of this code is not clean HTML, but looks like this instead:
��[s۸N>��k�9��-ىmI7��$�8�.v��͕���$h���y�G�Sg:ӷ>�5����ʱ�aor&���.v)���������) d�w��8w�l����c�u""1����F*G��ِ�2$�6�C�}��z(bw�� 4Ƒz6�S��t4�K��x�6u���~�T���ACJb��T^3�USPI:Mf��n�'��4��� ��XE�QQ&�c5�`'β�T Y]D�Q�nBfS�}a�%� ���R) �Zn��̙ ��8IB�a����L�
I already tried setting the user agent in .htaccess, PHP and cURL, but without success.
Add this:
curl_setopt($ch, CURLOPT_ENCODING , "gzip");
Tested on my local environment, works fine with it.
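The garbled output is most likely a gzip-compressed body that was printed without being decoded. A minimal sketch of two ways to deal with it follows; the helper names are hypothetical and gzdecode() assumes the zlib extension is loaded:

```php
<?php
// Returns true when the body starts with the gzip magic bytes 0x1f 0x8b.
function looksGzipped(string $body): bool
{
    return strncmp($body, "\x1f\x8b", 2) === 0;
}

// Hypothetical helper: a handle that lets cURL negotiate and decode compression itself.
function newDecodingHandle(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // An empty string advertises every encoding this cURL build supports,
    // and cURL decodes the response before returning it.
    curl_setopt($ch, CURLOPT_ENCODING, '');
    return $ch;
}

// Fallback for a body that was already fetched without CURLOPT_ENCODING.
function decodeIfGzipped(string $body): string
{
    if (!looksGzipped($body)) {
        return $body;
    }
    $decoded = gzdecode($body);
    return $decoded === false ? $body : $decoded;
}
```

Usage would be along the lines of $fonte = curl_exec(newDecodingHandle($URL)); the fallback is only needed when the request options cannot be changed.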

Can't get url using JSON script parsed with file_get_contents

I have this link and I want to parse some information from it, or just save it to a file, but I can't do it with this simple code:
<?php
$myFile = 'test.txt';
$get= file_get_contents("http://www.ticketmaster.com/json/resale?command=get_resale_listings&event_id=0C004B290BF2D95F");
file_put_contents($myFile, $get); ?>
The output is:
{"version":1.1,"error":{"invalid":{"cookies":true}},"command":"get_resale_listings"}
I tried many other things, like fopen or include, but they did not work either. I don't understand why, because when I put the URL in a browser it either shows the complete JSON (Google Chrome) or, even better, asks me to save it as a file (Internet Explorer). It looks like browser cookies or something similar are missing when the request comes from my localhost.
Thanks for your tips.
You need to access that URL with cURL.
The server checks whether the client has cookies enabled. With file_get_contents() you do not send any information about the client (browser), and no cookies are stored or sent back.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.ticketmaster.com/json/resale?command=get_resale_listings&event_id=0C004B290BF2D95F');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIEJAR, "my_cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "my_cookies.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
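// With CURLOPT_RETURNTRANSFER set, curl_exec() returns the body instead of printing it.
$get = curl_exec($ch);
curl_close($ch);
file_put_contents('test.txt', $get);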

CURL: PHP : can't submit

I need to write a PHP script that logs in to my admin page and then submits an RSS form.
I'm able to log in with the code below, but I can't submit the RSS form:
<?php
function rssadd($url,$post,$post2) {
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.0.4) Gecko/2008102920 AdCentriaIM/1.7 Firefox/3.0.4');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);
$ch2 = curl_init($url);
curl_setopt($ch2, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.0.4) Gecko/2008102920 AdCentriaIM/1.7 Firefox/3.0.4');
curl_setopt($ch2, CURLOPT_POST, 1);
curl_setopt($ch2, CURLOPT_POSTFIELDS, $post2);
curl_setopt($ch2, CURLOPT_REFERER, $url);
curl_setopt($ch2, CURLOPT_RETURNTRANSFER, 1);
$result2 = curl_exec($ch2);
return $result . $result2;
}
$page2 = rssadd('http://site.com/admin.php?mod=rss&action=news&id=4','subaction=dologin&username=admin&password=pass','subaction=doit');
echo $page2;
?>
This is the HTML on "http://site.com/admin.php?mod=rss&action=news&id=4" that I'm not able to submit:
<input type="submit" name="subaction" value="doit" class="buttons">
Perhaps you need to retain cookies between requests? Try setting CURLOPT_COOKIEFILE to '' (the empty string). You'll also need to use the same curl handle on both requests - instead of closing the first handle and initializing a new one, just change the options on the first one and run it again.
I originally learned how to do this from this Stack Overflow answer: https://stackoverflow.com/a/5758471/638544
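A minimal sketch of that suggestion, assuming the same URL accepts both POSTs as in the question. The function name is hypothetical, and the cookies live only in memory for the lifetime of the handle:

```php
<?php
// Hypothetical helper: sends the login POST and then the submit POST on the
// SAME handle, so cookies set by the login response accompany the second request.
function postTwiceWithCookies(string $url, string $loginFields, string $submitFields): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_REFERER, $url);
    // An empty file name loads no cookies but switches on cURL's cookie engine.
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');

    // First request: log in.
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $loginFields);
    $loginResponse = curl_exec($ch);

    // Second request: only the POST body changes; the handle and cookies are reused.
    curl_setopt($ch, CURLOPT_POSTFIELDS, $submitFields);
    $submitResponse = curl_exec($ch);

    return [$loginResponse, $submitResponse];
}
```

With the values from the question, the call would look like postTwiceWithCookies('http://site.com/admin.php?mod=rss&action=news&id=4', 'subaction=dologin&username=admin&password=pass', 'subaction=doit'). Each element of the result is false if the corresponding request failed.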

Can't get user_id who share post in facebook using FQL Query or CURL

I tried using an FQL query to fetch the share count of a post on a Facebook page, but I don't know which table provides it, so I ended up using cURL to get the data.
With cURL I get 2 people who shared the post, although the actual number is 27. The Ajax function on the page does not load any more data, so the result stays at 2 people.
If the Ajax function on the page worked correctly, I could fetch all users who shared the post.
Another method using an FQL query or the Facebook SDK would be really appreciated.
Can I fetch all users who shared a post on Facebook?
Thanks in advance.
$login_email = 'myemail';
$login_pass = 'mypassword';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://login.facebook.com/login.php?m&next=http%3A%2F%2Fm.facebook.com%2Fhome.php');
curl_setopt($ch, CURLOPT_POSTFIELDS,'email='.urlencode($login_email).'&pass='.urlencode($login_pass).'&login=Login');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIEJAR, "my_cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "my_cookies.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_exec($ch);
$url= 'http://www.facebook.com/shares/view?id=390010811012261';
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 300);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1');
curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_REFERER, 'http://www.facebook.com');
curl_setopt($ch, CURLOPT_URL, $url);
$curl_dt= curl_exec($ch);
Scraping Facebook pages like this is a violation of its site policies (3.2 of the SRR).
Please file a feature request in the bug tracker to get the full list of shares in the API for Post objects (currently we only serve the number of shares, not the full info about them).
