I'm trying to extract the first occurrence of a link that starts like this:
https://encrypted-tbn3.gstatic.com/images?...
from the source code of a page. The link starts and ends with a ". Here is what I've got so far:
$search_query = $array[0]['Name'];
$search_query = urlencode($search_query);
$context = stream_context_create(array('http' => array('header' => 'User-Agent: Mozilla compatible')));
$response = file_get_contents( "https://www.google.com/search?q=$search_query&tbm=isch", false, $context);
$html = str_get_html($response);
$url = explode('"',strstr($html, 'https://encrypted-tbn3.gstatic.com/images?'[0]))
However, the output of $url is not the link I am trying to extract, but something very different. I have added an image.
Could anyone explain the output to me and how I would get the desired link? Thanks
It seems that you're using PHP Simple HTML DOM Parser.
I normally use DOMDocument, which is one of PHP's built-in classes.
Here's a working example of what you need:
$search_query = $array[0]['Name'];
$search_query = urlencode($search_query);
$context = stream_context_create(array('http' => array('header' => 'User-Agent: Mozilla compatible')));
$response = file_get_contents( "https://www.google.com/search?q=$search_query&tbm=isch", false, $context);
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($response);
foreach ($dom->getElementsByTagName('img') as $item) {
$img_src = $item->getAttribute('src');
if (strpos($img_src, 'https://encrypted') !== false) {
print $img_src."\n";
}
}
Output:
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcSumjp6e37O_86nc36mlktuWpbFuCI4nkkkocoBCYW3qCOicqdu_KEK-MY
https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcR_ttK8NlBgui_JndBj349UxZx0kHn0Z-Essswci-_5UQCmUOruY1PNl3M
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcSydaTpSDw2mvU2JRBGEYUOstTUl4R1VhRevv1Sdinf0fxRvU26l3pTuqo
...
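Since the question asks for only the first matching link, here is a minimal sketch that stops at the first hit instead of printing every image. The $response markup is a made-up stand-in for the downloaded page, and first_image_src() is a hypothetical helper name, not something from the answer above.

```php
<?php
// Hypothetical stand-in for the downloaded page; the real markup may differ.
$response = '<html><body>'
    . '<img src="/logo.png">'
    . '<img src="https://encrypted-tbn2.gstatic.com/images?q=tbn:AAA">'
    . '<img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:BBB">'
    . '</body></html>';

// Return the first <img> src that starts with $prefix, or null if none matches.
function first_image_src($html, $prefix) {
    libxml_use_internal_errors(true);   // silence warnings about sloppy HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if (strpos($src, $prefix) === 0) {
            return $src;                // stop at the first hit
        }
    }
    return null;
}

$first = first_image_src($response, 'https://encrypted-tbn3.gstatic.com/images?');
echo $first, "\n";
```

Checking strpos() against 0 means the src has to begin with the prefix rather than merely contain it somewhere.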
$url_beginning = 'https://encrypted-tbn3.gstatic.com/images?';
if(preg_match('/\"(https\:\/\/encrypted\-tbn3\.gstatic\.com\/images\?.+?)\"/ui',$html, $matches))
$url = $matches[1];
else
$url = '';
Try using preg_match; it is more suitable for this kind of extraction.
In this example I assumed that the URL in your HTML is enclosed in double quotes.
UPD
A slightly tuned version that works for any URL beginning:
$url_beginning = 'https://encrypted-tbn3.gstatic.com/images?';
$url_beginning = preg_replace('/([^а-яА-Яa-zA-Z0-9_#%\s])/ui', '\\\\$1', $url_beginning);
if(preg_match('/\"('.$url_beginning.'.+?)\"/ui',$html, $matches))
$url = $matches[1];
else
$url = '';
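As an alternative to escaping the prefix by hand, PHP's preg_quote() can do it. This is only a sketch: the $html fragment below is invented for illustration, and the [^"]+ part assumes the URL is wrapped in double quotes, as above.

```php
<?php
// Hypothetical HTML fragment standing in for the fetched page.
$html = '<a href="x"><img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:BBB" alt=""></a>';

$url_beginning = 'https://encrypted-tbn3.gstatic.com/images?';
// preg_quote() escapes regex metacharacters; the second argument also escapes the delimiter.
$pattern = '/"(' . preg_quote($url_beginning, '/') . '[^"]+)"/i';

// $matches[1] is the quoted URL without the surrounding quotes.
$url = preg_match($pattern, $html, $matches) ? $matches[1] : '';
echo $url, "\n";
```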
I'm trying to get these messages (METAR) from a page and show them all in another PHP file, without the styles and extra info.
At the moment I'm using this code:
<?php
$options = array('http' => array(
'method' => 'GET',
));
$config= stream_context_create($options);
$config_final=file_get_contents('http://www.smn.gov.ar/mensajes/index.php?observacion=metar&operacion=consultar&87582=on&87641=on&87750=on&87765=on&87222=on&87761=on&87860=on&87395=on&87344=on&87166=on&87904=on&87571=on&87347=on&87803=on&87576=on&87162=on&87532=on&87497=on&87097=on&87046=on&87548=on&87217=on&87506=on&87692=on&87418=on&87574=on&87715=on&87374=on&87289=on&87852=on&87178=on&87896=on&87823=on&87270=on&87155=on&87453=on&87925=on&87934=on&87480=on&87047=on&87553=on&87311=on&87909=on&87436=on&87509=on&87912=on&87623=on&87444=on&87129=on&87371=on&87645=on&87022=on&87127=on&87828=on&87121=on&87938=on&87791=on&87448=on',false, $config);
preg_match_all("|<td width=\"100%\">METAR (.*)</td>|sU", $config_final, $tiempo);
echo $tiempo[1][0];
?>
</div>
Using that code I can only get the first METAR. What I need is to see all of them on separate lines, as separate results.
Any ideas?
Thanks in advance
I suggest you use PHP's built-in HTML parser for this, in particular DOMDocument, and use DOMXPath to search for the cells you need.
Example:
$url = 'http://www.smn.gov.ar/mensajes/index.php?observacion=metar&operacion=consultar&87582=on&87641=on&87750=on&87765=on&87222=on&87761=on&87860=on&87395=on&87344=on&87166=on&87904=on&87571=on&87347=on&87803=on&87576=on&87162=on&87532=on&87497=on&87097=on&87046=on&87548=on&87217=on&87506=on&87692=on&87418=on&87574=on&87715=on&87374=on&87289=on&87852=on&87178=on&87896=on&87823=on&87270=on&87155=on&87453=on&87925=on&87934=on&87480=on&87047=on&87553=on&87311=on&87909=on&87436=on&87509=on&87912=on&87623=on&87444=on&87129=on&87371=on&87645=on&87022=on&87127=on&87828=on&87121=on&87938=on&87791=on&87448=on';
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTMLFile($url);
libxml_clear_errors();
$xpath = new DOMXpath($dom);
// search for td elements whose text contains METAR
$metars = $xpath->query('//td[contains(text(), "METAR")]');
if($metars->length <= 0) {
echo 'no metars found';
exit;
}
$data = array();
foreach($metars as $metar) {
$data[] = $metar->nodeValue;
}
echo '<pre>';
print_r($data);
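If you would rather keep the regex from the question, note that preg_match_all() already collects every match and the original code only echoes index 0. A rough sketch follows; the two METAR strings are invented sample data, not real reports from the page.

```php
<?php
// Hypothetical excerpt of the page; the station reports are made up.
$config_final = '<table>'
    . '<tr><td width="100%">METAR SABE 121500Z 09010KT 9999 FEW030 22/15 Q1015</td></tr>'
    . '<tr><td width="100%">METAR SAEZ 121500Z 08008KT CAVOK 23/14 Q1015</td></tr>'
    . '</table>';

// $tiempo[1] holds the first capture group of every match, not just the first one.
preg_match_all('|<td width="100%">METAR (.*)</td>|sU', $config_final, $tiempo);

foreach ($tiempo[1] as $metar) {
    echo $metar, "\n";   // use "<br>" instead when the output is rendered as HTML
}
```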
I'm looking to display only certain things from pages like this: http://sc2ranks.com/api/psearch/am/MxGPezz/1t/division/Felanis%20Sierra?appKey=sentinelgaming.net . So far I am able to display something, but it's not even the correct number, using the PHP below. Can someone show me how I would display the "achievement-points" of this player from this XML web page?
$url = 'http://sc2ranks.com/api/psearch/am/MxGPezz/1t/division/Felanis%20Sierra?appKey=sentinelgaming.net';
$xml = file_get_contents($url);
echo $xml->achievement-points;
Thanks
The content-type of this file varies depending on the Accept header or the format query parameter. It seems you can retrieve at least XML or JSON.
The default you get from file_get_contents() will be JSON because it does not include an Accept request header, but the default from a browser will be XML because browsers usually include an XML mime type in their Accept request header.
To get JSON:
$url = 'http://sc2ranks.com/api/psearch/am/MxGPezz/1t/division/Felanis%20Sierra?appKey=sentinelgaming.net';
// &format=json is not strictly necessary,
// but it will give you fewer surprises
$json = file_get_contents($url.'&format=json');
$records = json_decode($json);
echo $records[0]->achievement_points, "\n";
To get XML:
$sxe = simplexml_load_file($url.'&format=xml');
echo (string) $sxe->record->{'achievement-points'}, "\n";
To use the $sxe object see this SimpleXML cheat sheet.
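To see why the original echo $xml->achievement-points printed the wrong number, it helps to try the curly-brace syntax on a small inline document. The XML below is a made-up sample; apart from achievement-points, the element names are assumptions about the response shape.

```php
<?php
// Hypothetical XML shaped like the API response; element names other than
// achievement-points are assumptions.
$xml = '<?xml version="1.0"?>'
    . '<records>'
    . '<record><name>MxGPezz</name><achievement-points type="integer">1234</achievement-points></record>'
    . '</records>';

$sxe = simplexml_load_string($xml);

// A hyphen is not valid in a PHP property name, so wrap the name in {'...'}.
// Without the braces PHP reads it as ($sxe->achievement) minus (points).
$points = (int) $sxe->record[0]->{'achievement-points'};
echo $points, "\n";
```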
Instead of using the format param you could set the Accept header. You can also add some abstraction to getting a url so that you can retrieve the content type and encoding as well. See example below.
function get_url($url, $context=null) {
$response = file_get_contents($url, false, $context);
$ctypeheaders = preg_grep('/^Content-Type:\s/i', $http_response_header);
$ctype = NULL;
if ($ctypeheaders) {
$ctype = end($ctypeheaders);
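// end() takes its argument by reference, so store the exploded parts in a variable first
$ctypeparts = explode(':', $ctype, 2);
$ctype = end($ctypeparts);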
$ctype = explode(';', $ctype, 2);
$charset = isset($ctype[1]) ? $ctype[1] : '';
if ($charset && preg_match('/charset\s*=\s*([^\s]+)/i', $charset, $matches)) {
$charset = $matches[1];
}
$ctype[1] = $charset;
$ctype = array_map('trim', $ctype);
}
return array($response, $ctype);
}
You can then use get_url() like so:
// With no accept header, just see what we get:
list($content, $contenttype) = get_url($url);
list($type, $encoding) = $contenttype;
// $type will be 'application/xml' or 'application/json'
// $encoding is very handy to know too
// Or we can specify an accept header:
$opt_accept_xml = stream_context_create(array(
'http' => array(
'header' => "Accept: application/xml\r\n"
)
));
list($content, $contenttype) = get_url($url, $opt_accept_xml);
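The header-splitting step can also be tried on its own, without a network request. This is a sketch under assumptions: parse_content_type() is a hypothetical helper (not part of PHP or of get_url() above), and it expects a raw header line of the form "Content-Type: type; charset=...".

```php
<?php
// Split a raw Content-Type header line into array(type, charset).
// parse_content_type() is a hypothetical helper, not part of PHP.
function parse_content_type($header) {
    $parts = explode(':', $header, 2);
    $value = isset($parts[1]) ? $parts[1] : '';
    $pieces = explode(';', $value, 2);
    $type = strtolower(trim($pieces[0]));
    $charset = '';
    // The charset value may or may not be wrapped in double quotes.
    if (isset($pieces[1]) && preg_match('/charset\s*=\s*"?([^\s;"]+)/i', $pieces[1], $m)) {
        $charset = $m[1];
    }
    return array($type, $charset);
}

list($type, $encoding) = parse_content_type('Content-Type: application/json; charset=utf-8');
echo $type, ' / ', $encoding, "\n";
```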
Maybe:
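echo $xml->record[0]->{'achievement-points'}; // assumes $xml was loaded with simplexml_load_file(); the braces are needed because of the hyphen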
I have a small problem with my tweets script, and I don't know what the cause could be. This is the error it gives me:
$url = "http://www.twitter.com/statuses/user_timeline/{$username}.xml?count={$number}";
$tweets = file_get_contents($url);
$feed = new SimpleXMLElement($tweets);
function time_stamp($date){
if (empty($date)){
return "No date provided";
}
and on the index.php page, it'll show this code:
<?php
$username = "user";//your twitter username
$number = 3;//number of tweets
include ("{$dir}/php/tweets.php");
?>
Do you guys know what it is that I'm doing wrong?
You don't need file_get_contents()
Try:
$url = "http://www.twitter.com/statuses/user_timeline/{$username}.xml?count={$number}";
$feed = simplexml_load_file($url);
Also, twitter made some changes not too long ago so your URL needs to look like this:
$url = "http://api.twitter.com/1/statuses/user_timeline/{$username}.xml?count={$number}";
Check this discussion.
You can use JSON, which is easier and faster than XML.
To fetch the content you can use cURL (faster) or file_get_contents().
URL:
https://api.twitter.com/1/statuses/user_timeline.json?include_entities=true&include_rts=true&screen_name={screenname}&count={count}
like this
<?php
$url = 'https://api.twitter.com/1/statuses/user_timeline.json?include_entities=true&include_rts=true&screen_name=abdullaheid&count=3';
$x = file_get_contents( $url ) ; // Using file get contents
$object = json_decode( $x ) ;
$array = (array) $object ;
print_r( $array ) ;
?>
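Once the JSON is decoded you will usually want to loop over the tweets instead of dumping the whole structure. A small sketch: the payload below is a trimmed, invented sample, and the text and created_at keys are assumptions about which fields you need.

```php
<?php
// Hypothetical, trimmed-down timeline payload; real responses carry many more fields.
$x = '[{"text":"First tweet","created_at":"Mon Jan 02 10:00:00 +0000 2012"},'
   . '{"text":"Second tweet","created_at":"Tue Jan 03 11:30:00 +0000 2012"}]';

// Passing true returns nested associative arrays instead of stdClass objects.
$tweets = json_decode($x, true);
if (!is_array($tweets)) {
    die('Could not decode the response: ' . json_last_error_msg());
}

foreach ($tweets as $tweet) {
    echo $tweet['created_at'], ': ', $tweet['text'], "\n";
}
```

Passing true as the second argument avoids the (array) cast, which only converts the outermost level.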
I am using a bit.ly shortener for my custom domain. It outputs http://shrt.dmn/abc123; however, I'd like it to just output shrt.dmn/abc123.
Here is my code.
//automatically create bit.ly url for wordpress widgets
function bitly()
{
//login information
$url = get_permalink(); //for wordpress permalink
$login = 'UserName'; //your bit.ly login
$apikey = 'API_KEY'; //add your bit.ly APIkey
$format = 'json'; //choose between json or xml
$version = '2.0.1';
//generate the URL
$bitly = 'http://api.bit.ly/shorten?version='.$version.'&longUrl='.urlencode($url).'&login='.$login.'&apiKey='.$apikey.'&format='.$format;
//fetch url
$response = file_get_contents($bitly);
//for json formating
if(strtolower($format) == 'json')
{
$json = @json_decode($response,true);
echo $json['results'][$url]['shortUrl'];
}
else //for xml formatting
{
$xml = simplexml_load_string($response);
echo 'http://bit.ly/'.$xml->results->nodeKeyVal->hash;
}
}
As long as the value is a URL and starts with http://, this is the simplest possible solution:
$url = str_replace('http://', '', $url);
Change your following line:
echo $json['results'][$url]['shortUrl'];
for this one:
echo substr( $json['results'][$url]['shortUrl'], 7);
You want to do a preg_replace.
$variable = preg_replace( '/http:\/\//', '', $variable ); (this is untested; the : character does not need escaping in the pattern).
You can also achieve the same effect with $variable = str_replace('http://', '', $variable );
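One caveat with the approaches above: str_replace() removes http:// anywhere in the string, and substr(..., 7) assumes the prefix is always exactly http://. A minimal sketch that strips only a leading scheme, and also copes with https://, could look like this. strip_scheme() is a hypothetical helper name.

```php
<?php
// Remove a leading http:// or https:// and leave everything else untouched.
// strip_scheme() is a hypothetical helper name.
function strip_scheme($url) {
    // ^ anchors the match to the start of the string; # is used as the
    // delimiter so the slashes need no escaping.
    return preg_replace('#^https?://#i', '', $url);
}

echo strip_scheme('http://shrt.dmn/abc123'), "\n";
```

In the bitly() function this would be used as echo strip_scheme($json['results'][$url]['shortUrl']);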
I am trying to read a weather feed from Yahoo to my site.
Using the code below I was able to print the xml.
What I really want to achieve now is to put the temperature and image in two different variables
$zipCode = "44418";
$url = "http://weather.yahooapis.com/forecastrss";
$zip = "?w=$zipCode";
$fullUrl = $url . $zip.'&u=c';
$curlObject = curl_init();
curl_setopt($curlObject,CURLOPT_URL,$fullUrl);
curl_setopt($curlObject,CURLOPT_HEADER,false);
curl_setopt($curlObject,CURLOPT_RETURNTRANSFER,true);
$returnYahooWeather = curl_exec($curlObject);
curl_close($curlObject);
print "yahooWeather". $returnYahooWeather;
//$temperature
//$image
You should go ahead and use simplexml or DOM to parse the XML and then you can iterate over the results. With SimpleXML this looks like this:
$zipCode = "44418";
$url = "http://weather.yahooapis.com/forecastrss";
$zip = "?w=$zipCode";
$fullUrl = $url . $zip.'&u=c';
$curlObject = curl_init();
curl_setopt($curlObject,CURLOPT_URL,$fullUrl);
curl_setopt($curlObject,CURLOPT_HEADER,false);
curl_setopt($curlObject,CURLOPT_RETURNTRANSFER,true);
$returnYahooWeather = curl_exec($curlObject);
curl_close($curlObject);
//print "here". $returnYahooWeather;
$xmlobj=simplexml_load_string($returnYahooWeather);
$res = $xmlobj->xpath("//yweather:condition");
$tmp = false;
// each() was removed in PHP 8; foreach does the same job
foreach ($res as $node) {
$tmp = $node;
}
$attribs = $tmp->attributes();
print "Temperature [".$attribs['temp']."]";
I find it easiest to use SimpleXML with PHP.
$xml = simplexml_load_string($returnYahooWeather);
echo $xml->Path->To->Temperature;
It's easy enough, and you can use XPath with SimpleXML :). There are other ways of parsing XML too; as previously mentioned, DOMDocument is one of them.
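To make that concrete, here is a sketch that puts the temperature and the image URL into two variables. The feed below is a heavily trimmed, invented sample: the namespace URI, the example.com image address and the idea that the image lives inside the item's description are all assumptions, so check them against the real response.

```php
<?php
// Hypothetical, heavily trimmed feed; the namespace URI and values are assumptions.
$returnYahooWeather = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
  <channel>
    <item>
      <yweather:condition text="Fair" code="34" temp="21"/>
      <description><![CDATA[<img src="http://example.com/34.gif"/><br />Fair, 21 C]]></description>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($returnYahooWeather);

// Namespaced elements are reached through xpath(); attributes are read like array keys.
$conditions = $xml->xpath('//yweather:condition');
$temperature = null;
if ($conditions) {
    $attribs = $conditions[0]->attributes();
    $temperature = (string) $attribs['temp'];
}

// The image sits inside the item's HTML description, so pull out the first src value.
$image = null;
$description = (string) $xml->channel->item->description;
if (preg_match('/<img[^>]+src="([^"]+)"/i', $description, $m)) {
    $image = $m[1];
}

echo "Temperature [$temperature]\nImage [$image]\n";
```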