PHP Twitter script doesn't always work

I have a PHP script on the site that pulls a Twitter feed and displays it. Strangely, most of the time it works just fine, but sometimes (quite a lot, actually) it doesn't work at all and just displays the follow button.
The code is as follows (USERNAME stands in for the actual Twitter account username):
$widget = true;
$twitterid = "USERNAME";
$doc = new DOMDocument();
# load the RSS document, edit this line to include your username or user id
if ($doc->load('http://twitter.com/statuses/user_timeline/USERNAME.rss')) {
    # specify the number of tweets to display, max is 20
    $max_tweets = 4;
    $i = 1;
    foreach ($doc->getElementsByTagName('item') as $node) {
        # fetch the title from the RSS feed.
        # Note: 'pubDate' and 'link' are also useful (I use them in the sidebar of this blog)
        $tweet = $node->getElementsByTagName('title')->item(0)->nodeValue;
        # the title of each tweet starts with "username: " which I want to remove
        $tweet = substr($tweet, stripos($tweet, ':') + 1);
        # OPTIONAL: turn URLs into links
        $tweet = preg_replace('#(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)#', '<a href="$1">$1</a>', $tweet);
        # OPTIONAL: turn #hashtags into links
        $tweet = preg_replace("/#([0-9a-zA-Z]+)/", '<a href="http://twitter.com/search?q=%23$1">#$1</a>', $tweet);
        echo "<p>" . $tweet . "</p><hr />\n";
        if ($i++ >= $max_tweets)
            break;
    }
}
// Here's the Twitter Follow Button widget
if ($widget) {
    echo '<a href="https://twitter.com/' . $twitterid . '" class="twitter-follow-button">Follow @' . $twitterid . '</a><script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>';
}

Sadly, Twitter has removed the URL https://twitter.com/statuses/user_timeline/USERNAME.rss; as of Oct 12 2012 it returns "Sorry, that page does not exist". There is a JSON equivalent, although that may stop working as well after March 2013. For the time being, try https://api.twitter.com/1/statuses/user_timeline.json?screen_name=USERNAME&count=4.
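As an illustration, reading the JSON feed could look roughly like this. The parse_timeline() helper is hypothetical, the 'text' field name follows API v1 as it existed at the time, and the request itself will fail once the endpoint is retired:

```php
<?php
// Hypothetical sketch: fetch the JSON timeline and print the tweet texts.
// parse_timeline() is an illustrative helper, not from the original post.
function parse_timeline($json, $max_tweets = 4) {
    $tweets = array();
    $statuses = json_decode($json, true);
    if (!is_array($statuses)) {
        return $tweets; // bad or empty response: show nothing
    }
    foreach ($statuses as $status) {
        $tweets[] = $status['text'];
        if (count($tweets) >= $max_tweets) {
            break;
        }
    }
    return $tweets;
}

$json = @file_get_contents('https://api.twitter.com/1/statuses/user_timeline.json?screen_name=USERNAME&count=4');
if ($json !== false) {
    foreach (parse_timeline($json) as $tweet) {
        echo '<p>' . htmlspecialchars($tweet) . "</p>\n";
    }
}
```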
HTH

Twitter enforces rate limiting on unauthenticated calls (calls made to the API that haven't been authenticated using OAuth).
"Unauthenticated calls are permitted 150 requests per hour. Unauthenticated calls are measured against the public facing IP of the server or device making the request."
If you are using shared hosting, you are more likely to get rate-limited: someone else on the same host's IP could also be querying the Twitter API, and their requests count towards the hourly limit for that IP.
You can read more on Twitter's Rate Limiting page as well as in the Rate Limiting FAQ.
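One common mitigation (a sketch, not part of the original answer) is to cache the feed on disk so repeated page views reuse one request per interval instead of hitting the hourly limit. The fetch_cached() helper, the cache path, and the five-minute TTL below are arbitrary example choices:

```php
<?php
// Hypothetical caching helper: serve a fresh-enough cached copy, otherwise
// hit the API and refresh the cache; fall back to a stale copy on failure.
function fetch_cached($url, $cache_file, $ttl = 300) {
    if (file_exists($cache_file) && (time() - filemtime($cache_file)) < $ttl) {
        return file_get_contents($cache_file); // cache is fresh: skip the API
    }
    $data = @file_get_contents($url);
    if ($data !== false) {
        file_put_contents($cache_file, $data); // refresh the cache
        return $data;
    }
    // Request failed (possibly rate-limited): fall back to a stale cache.
    return file_exists($cache_file) ? file_get_contents($cache_file) : false;
}
```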

<?php
$timeline = "http://api.twitter.com/1/statuses/user_timeline.xml?screen_name=arvizard";
$xml = new SimpleXMLElement(file_get_contents($timeline));
$i = 0;
print "<ul class=\"tweet_list\">";
foreach ($xml->children() as $tstatus)
{
    $stat = $tstatus->text;
    $split = preg_split('/\s/', $stat);
    print "<li class=\"tweet\"><p class=\"tweet_text\">";
    foreach ($split as $word)
    {
        if (preg_match('/^#/', $word)) {
            print ' <a href="http://twitter.com/search?q=' . urlencode($word) . '">' . $word . '</a>';
        }
        else if (preg_match('/^http:\/\//', $word)) {
            print ' <a href="' . $word . '">' . $word . '</a>';
        }
        else
        {
            print " " . $word;
        }
    }
    print "</p>";
    print "<span class=\"date\">" . substr($tstatus->created_at, 0, strlen($tstatus->created_at) - 14) . "</span>";
    print "</li>";
    $i++;
    if ($i == 5)
    {
        break;
    }
}
print "</ul>";
?>
This may help you; please check it.

Related

get all <a> tags href in page with php

I am trying to get all the external links on one web page and store them in a database.
I put the whole page's contents in a variable:
$pageContent = file_get_contents("http://sample-site.org");
How can I save all the external links? For example, if the web page contains code such as:
<a href="http://othersite.com">other site</a>
I want to save http://othersite.com in the database. In other words, I want to make a crawler that stores all the external links found on a web page.
How can I do this?
You could use PHP Simple HTML DOM Parser's find method:
require_once("simple_html_dom.php");
$pageContent = file_get_html("http://sample-site.org");
foreach ($pageContent->find("a") as $anchor)
    echo $anchor->href . "<br>";
I would suggest using DOMDocument() and DOMXPath(). This allows the result to contain only external links, as you've requested.
As a note: if you're going to crawl websites, you will more likely want to use cURL, but I will continue with file_get_contents() since that's what you're using in this example. cURL would allow you to set a user agent, headers, cookies, etc., and appear more like a real user; some websites will attempt to block robots.
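For illustration, a minimal cURL setup along those lines might look like this; the user-agent string and cookie-jar path are arbitrary example values, not requirements of any particular site:

```php
<?php
// Illustrative cURL fetch that looks more like a browser request.
$ch = curl_init('http://example.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects like a browser
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Linux x86_64)');
curl_setopt($ch, CURLOPT_COOKIEJAR, sys_get_temp_dir() . '/cookies.txt'); // persist cookies
$html = curl_exec($ch);  // false on failure, HTML string on success
curl_close($ch);
```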
$html = file_get_contents("http://example.com");
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xp = new DOMXPath($doc);
// Only pull back A tags with an href attribute starting with "http".
$res = $xp->query('//a[starts-with(@href, "http")]/@href');
if ($res->length > 0)
{
    foreach ($res as $node)
    {
        echo "External Link: " . $node->nodeValue . "\n";
    }
}
else
{
    echo "There were no external links found.";
}
/*
 * Output:
 * External Link: http://www.iana.org/domains/example
 */

PHP Crawler not crawling all elements

So I'm trying to make a PHP crawler (for personal use).
The code displays "found" for each eBay auction item that ends in less than 1 hour, but there seems to be a problem: the crawler can't get all the span elements, and the "remaining time" span is among the ones it never finds.
The simple_html_dom.php is downloaded and not edited.
<?php
include_once('simple_html_dom.php');
// url which I want to crawl - contains GET data -
$url = 'http://www.ebay.de/sch/Apple-Notebooks/111422/i.html?LH_Auction=1&Produktfamilie=MacBook%7CMacBook%2520Air%7CMacBook%2520Pro%7C%21&LH_ItemCondition=1000%7C1500%7C2500%7C3000&_dcat=111422&rt=nc&_mPrRngCbx=1&_udlo&_udhi=20';
$html = new simple_html_dom();
$html->load_file($url);
foreach ($html->find('span') as $part) {
    echo $part;
    // when I echo $part it does display many span elements, but not the remaining-time ones
    $cur_class = $part->class;
    // the class attribute of an auction item that ends in less than an hour equals "MINUTES timeMs alert60Red"
    if ($cur_class == 'MINUTES timeMs alert60Red') {
        echo 'found';
    }
}
?>
Any answers would be useful, thanks in advance
Looking at the fetched HTML, it seems the class alert60Red is set through JavaScript, so you can't find it: the JavaScript is never executed when you fetch the page.
Just searching for MINUTES timeMs looks stable enough.
<?php
include_once('simple_html_dom.php');
$url = 'http://www.ebay.de/sch/Apple-Notebooks/111422/i.html?LH_Auction=1&Produktfamilie=MacBook%7CMacBook%2520Air%7CMacBook%2520Pro%7C%21&LH_ItemCondition=1000%7C1500%7C2500%7C3000&_dcat=111422&rt=nc&_mPrRngCbx=1&_udlo&_udhi=20';
$html = new simple_html_dom();
$html->load_file($url);
foreach ($html->find('span') as $part) {
    $cur_class = $part->class;
    if (strpos($cur_class, 'MINUTES timeMs') !== false) {
        echo 'found';
    }
}
If a snippet of code is included from another PHP file, or HTML is embedded in PHP, your browser cannot see it, so no web-crawl API can detect it. I think your best bet is to find the location of simple_html_dom.php and try to crawl that file somehow. You may not even be able to get access to it; it's tricky.
You could also try finding by id, if your API has that function.

How to get Google search results page into a DOMDocument object in PHP?

I'm not looking to scrape Google; this is just a one-time thing to fetch about 300 URLs a bit faster than doing it manually.
I can't seem to get a DOMDocument created, though. It always ends up as an empty object.
search_list.txt contains my list of search terms. Right now I'm testing it with just one term, "legos".
The script correctly downloads the search results page. I viewed it in a web browser and it looked fine.
search_list.txt
legos
getresults.php
<?php
$search_list = 'search_list.txt'; // file containing search terms
$results = 'results.txt';
$handle = fopen($search_list, 'r');
while ($line = fgets($handle)) {
    $fp = fopen($results, 'w');
    $ch = curl_init('http://www.google.com/'
        . 'search?q=' . urlencode($line));
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
    unset($ch, $fp);
}
fclose($handle);
$dom = DOMDocument::loadHTML(file_get_contents($results));
echo print_r($dom, true); // EMPTY
$search_div = $dom->getElementById('search');
if (is_null($search_div)) { // ALWAYS NULL
    echo 'Search_div is null';
} else {
    echo print_r($search_div, true);
}
?>
I made some changes:
Instead of fopen - fgets, file().
Instead of cURL, simple_html_dom::load_file().
require_once('simple_html_dom.php');
$search_list = 'search_list.txt'; // file containing search terms
$result_list = 'results.txt';     // file to store the results
$searching_list = file($search_list);
$html = new simple_html_dom();
foreach ($searching_list as $key => $searching_word) {
    $html->load_file('http://www.google.com/' . 'search?q=' . urlencode($searching_word));
    $search_div = $html->find("div[id='search']");
    echo $search_div[0]; // See content of the search div.
    file_put_contents($result_list, $search_div[0]);
}
You can see the results with echo $search_div[0]; — it shows the whole content of the search div.
I searched for 'asd' =) ...
Based on my results, it starts like
<div id="search"><div id="ires"><ol><li class="g"><h3 class="r"><b>Atrial septal defect</b> - Wikipedia, the free encyclopedia</h3><div class="s"><div class="kv" style="margin-bottom:2px"><cite>en.wikipedia.org/wiki/<b>Atrial_septal_defect</b></cite><span class="flc"> - Cached - Similar</span></div><span class="st"><b>Atrial septal defect</b> (<b>ASD</b>)
And it ends like
</span><br></div></li><li class="g"><h3 class="r"><b>Achievement School District</b></h3><div class="s"><div class="kv" style="margin-bottom:2px"><cite>achievementschooldistrict.org/</cite><span class="flc"> - Cached</span></div><span class="st"><b>Achievement School District</b> · The <b>ASD</b> · Driving Results · Campuses · Join Our <br> Team · Enroll A Student · <b>ASD</b> News · Contact Us <b>...</b></span><br></div></li></ol></div></div>
UPDATE
This part is based on a comment by Buttle Butk.
If the first result of the Google search doesn't change, you can use this code to build a link straight to it for each search term:
<?php
$search_list = 'search_list.txt'; // file containing search terms
$result_list = 'results.txt';     // file to store the results
$order_language = "en";
$searching_list = file($search_list);
foreach ($searching_list as $key => $searching_word) {
    $link = 'https://www.google.com.tr/search?hl=' . $order_language . '&q=' . urlencode($searching_word) . '&btnI=1';
    echo $link;
    file_put_contents($result_list, $link);
}
?>
I searched for 'asd' =) again ...
The result:
https://www.google.com.tr/search?hl=en&q=asd&btnI=1
When I copied and pasted this into Chrome, the link redirected me to the first result of the 'asd' search:
http://www.asd-europe.org/
I hope this helps. Have a good day.

Rate limit. Twitter API

I'm working on a small and simple piece of code which basically does some tweet filtering. The problem is that I'm hitting the request limit of the Twitter API, and I would like to know if there is a workaround or if what I want to do just cannot be done.
First, I type a twitter username to retrieve the ID's of people this user follows.
$user_id = $_GET["username"];
$url_post = "http://api.twitter.com/1/friends/ids.json?cursor=-1&screen_name=" . urlencode($user_id);
$following = file_get_contents($url_post, true);
$json = json_decode($following);
$ids = $json->ids;
Twitter API responds with a list of ID's.
Here comes the problem. The next step is to make a request to find out username, profile picture and description for each one of those ID's.
$following = array();
foreach ($ids as $value)
{
    $build_url = 'http://api.twitter.com/1/users/lookup.json?user_id=' . $value;
    $following[] = $build_url;
}
foreach ($following as $url)
{
    $data_names = file_get_contents($url, true); // getting the file content
    $json_names = json_decode($data_names);
    foreach ($json_names as $tweet) {
        $name = $tweet->name;
        $description = $tweet->description;
        echo '<p>';
        echo $name . '<br>';
        echo $description;
        echo '</p>';
    }
}
If the user follows 50 people it works. But if he follows, say, 600, that would be 600 requests (for username, description and profile pic) to the Twitter API, which exceeds the limit.
Is there any way to work around this, or can it just not be done?
Thank you!
You can and should request the users/lookup API endpoint with 100 user IDs at a time, instead of doing one request per Twitter ID. Cf. https://dev.twitter.com/docs/api/1.1/get/users/lookup
Replace your foreach loop (foreach ($following as $url)) with a recursive function.
At the end of the function, check the number of hits remaining before calling it again (cf. this link to see how to find the time remaining until you get rate-limited).
If there are no hits left, sleep 15 minutes before calling the function again; otherwise do the call again.
There is plenty of information on how to do this; use Google and search existing Stack Overflow questions.
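As a sketch of the batching idea (build_lookup_urls is a hypothetical helper; the endpoint URL follows the v1.1 docs linked above): with up to 100 comma-separated IDs per user_id parameter, 600 followees need only 6 requests instead of 600.

```php
<?php
// Hypothetical helper: split the ID list into chunks of 100 and build one
// users/lookup URL per chunk, instead of one URL per single ID.
function build_lookup_urls(array $ids, $per_request = 100) {
    $urls = array();
    foreach (array_chunk($ids, $per_request) as $chunk) {
        $urls[] = 'https://api.twitter.com/1.1/users/lookup.json?user_id=' . implode(',', $chunk);
    }
    return $urls;
}

// Usage sketch: one request per URL, i.e. per 100 users.
// foreach (build_lookup_urls($ids) as $url) { /* fetch and decode $url */ }
```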

PHP - Codeigniter twitter loops

New here!!
I'm trying to get tweets to display on a site (framework is CodeIgniter). I am using the Twitter API (for example: https://api.twitter.com/1/statuses/user_timeline/ddarrko.xml) to get the tweets and then insert them into a database, from which the site will display them. The actual code runs fine; the issue is it only ever processes one tweet. My code is:
// get twitter address
$this->load->model('admin_model');
$getadd = $this->admin_model->get_settings("twitter_address");
$twitter_user = $getadd->item_value;
// define twitter xml file
$xmlpath = "https://api.twitter.com/1/statuses/user_timeline/" . $twitter_user . ".xml";
$xml = simplexml_load_file($xmlpath);
foreach ($xml->status as $tweet);
{
    echo "<pre>"; print_r($xml); echo "</pre>";
    $this->data->username = $twitter_user;
    $this->data->twitter_status = $tweet->text;
    $this->data->pub_date = $tweet->created_at;
    // load model and insert tweets
    $this->load->model('tweet_model');
    $this->tweet_model->insert_tweets($this->data);
}
As you can see, I loop over each status in the XML file. The echo "<pre>" line is me testing: when printing $tweet only one tweet comes up, and even if I loop through just $xml, still only one tweet is processed despite there being loads in the file.
Any help/advice would be greatly appreciated!
The code below will give you all the available tweets from the XML; you can adapt the logic to your needs.
$xmlpath = "https://api.twitter.com/1/statuses/user_timeline/" . $twitter_user . ".xml";
$xml = simplexml_load_file($xmlpath);
$count_tweet = sizeof($xml);
for ($i = 0; $i < $count_tweet; $i++)
{
    echo "<br>Tweet: " . $xml->status[$i]->text;
    echo "<br>date: " . $xml->status[$i]->created_at . "<br>";
}
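A foreach over $xml->status would also visit every tweet, provided no stray semicolon follows the loop header (a semicolon there empties the loop body, so the braces run only once). A small sketch with a hypothetical collect_tweets helper:

```php
<?php
// Hypothetical helper: collect every <status> from a timeline XML string.
// Note there is no semicolon after the foreach(...) header.
function collect_tweets($xml_string) {
    $tweets = array();
    $xml = @simplexml_load_string($xml_string);
    if ($xml === false) {
        return $tweets; // unparsable XML
    }
    foreach ($xml->status as $tweet) {
        $tweets[] = array(
            'text' => (string) $tweet->text,
            'date' => (string) $tweet->created_at,
        );
    }
    return $tweets;
}
```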
Thanks.
