PHP Get URL Contents And Search For String [duplicate] - php

This question already has answers here:
Get content from a url using php
(3 answers)
Closed 7 years ago.
In PHP I need to get the contents (source) of a URL, search it for the string "maybe baby love you", and, if it does not contain that string, do X.

Just read the contents of the page as you would read a file; PHP handles the connection for you. Then look for the string with a regex or a simple string search.
$url = 'http://my.url.com/';
$data = file_get_contents( $url );
if ( strpos( $data, 'maybe baby love you' ) === false ) // strpos(haystack, needle)
{
// do something
}

Correction: strpos() takes the haystack first and the needle second, so the page contents must be the first argument:
$url = 'http://my.url.com/';
$data = file_get_contents( $url );
if ( strpos($data,'maybe baby love you' ) === false )
{
// do something
}
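If you are on PHP 8.0 or later, str_contains() avoids the argument-order trap entirely. A minimal sketch, assuming PHP 8.0+; pageLacksPhrase() is a hypothetical helper name, and the optional case-insensitive mode uses stripos():

```php
<?php
// Hypothetical helper: returns true when $phrase does NOT occur in $html.
// str_contains() requires PHP 8.0+; stripos() gives a case-insensitive search.
function pageLacksPhrase(string $html, string $phrase, bool $caseInsensitive = false): bool
{
    if ($caseInsensitive) {
        // stripos() returns 0 for a match at the very start and false when the
        // needle is absent, so the strict comparison matters.
        return stripos($html, $phrase) === false;
    }
    return !str_contains($html, $phrase);
}

// Usage: fetch the page first, then decide whether to "do X".
// $data = file_get_contents('http://my.url.com/');
// if ($data !== false && pageLacksPhrase($data, 'maybe baby love you')) {
//     // do X
// }
```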

Assuming fopen URL wrappers are enabled (allow_url_fopen = On):
$string = file_get_contents('http://example.com/file.html');
if (strpos($string, 'maybe baby love you') === false) { // haystack first, then needle
//do X
}

If fopen URL wrappers are not enabled, you may be able to use the cURL extension instead (see http://www.php.net/curl ).
cURL also lets you deal with authenticated pages, redirects, etc.
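A minimal sketch of that fallback, assuming PHP 8.0+ (for the string|false return type). fetchPage() is a hypothetical helper: it uses file_get_contents() when allow_url_fopen is on and the cURL extension otherwise, and returns false on failure so the caller can check before searching.

```php
<?php
// Hypothetical helper: fetch a URL with file_get_contents() when
// allow_url_fopen is enabled, otherwise fall back to the cURL extension.
// Returns the body as a string, or false on failure. Requires PHP 8.0+.
function fetchPage(string $url): string|false
{
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return @file_get_contents($url); // @ hides the warning; check for false
    }
    if (!function_exists('curl_init')) {
        return false; // neither URL wrappers nor cURL are available
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_CONNECTTIMEOUT => 10,   // seconds
        CURLOPT_TIMEOUT => 30,          // seconds
    ]);
    $body = curl_exec($ch);
    // In PHP 8+ the handle is released when $ch goes out of scope.
    return is_string($body) ? $body : false;
}

// Usage: search the page only if it was actually fetched.
// $data = fetchPage('http://example.com/file.html');
// if ($data !== false && strpos($data, 'maybe baby love you') === false) {
//     // do X
// }
```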

Related

Scrape link in web page with specific class php [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 3 years ago.
As the title suggests, I would like to retrieve links that have a specific class.
I have the code to connect to the pages, and with preg_match() I would like to extract only the URL inside href="url".
The link I want is the one in href="". It sits in a table and may have other attributes, but no id; only the view class identifies it:
<a title="viwe" class="view" href="link">blablabla</a>
This is the code I wrote:
$curl = curl_init('http://prove/prove/pag/test.php');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$page = curl_exec($curl);
if(curl_errno($curl)) // check for execution errors
{
echo 'Scraper error: ' . curl_error($curl);
exit;
}
curl_close($curl);
$regex = '/<a.*?>(.*?)<\/a>/';
if ( preg_match($regex, $page, $list) )
echo $list[0];
else
print "Not found";
Well, the method you are trying to implement (parsing HTML with regular expressions) is not recommended, but if you have to, I'm guessing this expression might be closer to what you have in mind:
<a\s.*?\sclass="\s*view\s*"[^>]*>.*?<\/a>
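Since the question was closed as a duplicate of the HTML-parsing question, here is a sketch of the parser-based alternative using PHP's DOM extension (DOMDocument and DOMXPath, assumed to be enabled, as it usually is). extractViewLinks() is a hypothetical name; it returns the href of every <a> whose class list contains the token view, regardless of other attributes or their order.

```php
<?php
// Hypothetical helper: return the href of every <a> whose class attribute
// contains the whole token "view" (so "btn view" matches, "preview" does not).
function extractViewLinks(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect real-world HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        '//a[contains(concat(" ", normalize-space(@class), " "), " view ")]'
    );

    $links = [];
    foreach ($nodes as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

// Usage with the page fetched by the cURL code in the question:
// print_r(extractViewLinks($page));
```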

How to create a bitly shortened URL from user-input text?

Beginner here. Could anybody suggest any kind of solution? I have some user-input text.
First, I check whether the text contains any URLs:
$post = preg_replace('/https?:\/\/[\w\-\.!~?&+\*\'"(),\/]+/','<a class="post_link" href="$0">$0</a>',$post);
Then I need to take that URL and pass it as a variable ($url) to this function:
$short=make_bitly_url($url,'o_6sgltp5sq4as','R_f5212f1asdads1cee780eed00d2f1bd2fd794f','xml');
Finally, I want to echo both the URL and the user's text. Thanks in advance for ideas and critiques.
I've tried something like this:
$post = preg_replace('/https?:\/\/[\w\-\.!~?&+\*\'"(),\/]+/e',$url,$post){
$shorten = make_bitly_url($url,'o_6sgltpmm5sq4','R_f5212f11cee780ekked00d2f1bd2fd794f','json');
return '<a class="post_link" href="$shorten">$shorten</a>';
};
But even to me it looks like nonsense.
Bitly does have an API available for use. You should check out the API documentation.
Here's how to use the bit.ly API from PHP:
/* make a URL small */
function make_bitly_url($url,$login,$appkey,$format = 'xml',$version = '2.0.1')
{
//create the URL
$bitly = 'http://api.bit.ly/shorten?version='.$version.'&longUrl='.urlencode($url).'&login='.$login.'&apiKey='.$appkey.'&format='.$format;
//get the url
//could also use cURL here
$response = file_get_contents($bitly);
//parse depending on desired format
if(strtolower($format) == 'json')
{
$json = json_decode($response,true); // decode into an associative array
return $json['results'][$url]['shortUrl'];
}
else //xml
{
$xml = simplexml_load_string($response);
return 'http://bit.ly/'.$xml->results->nodeKeyVal->hash;
}
}
/* usage */
$short = make_bitly_url('http://davidwalsh.name','davidwalshblog','R_96acc320c5c423e4f5192e006ff24980','json');
echo 'The short URL is: '.$short;
// returns: http://bit.ly/11Owun
Source: David Walsh article
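To tie the question's two steps together: the /e modifier tried in the question was deprecated in PHP 5.5 and removed in PHP 7.0; preg_replace_callback() is the supported way to compute each replacement. A sketch, assuming the question's URL regex; linkifyWithShortener() is hypothetical, and the shortener is passed in as a callable (for example a closure around make_bitly_url() with your own login and API key) so it can be swapped or tested without network access.

```php
<?php
// Hypothetical helper: replace every URL in $post with a link to the URL
// returned by $shorten. The regex is the one from the question.
function linkifyWithShortener(string $post, callable $shorten): string
{
    $result = preg_replace_callback(
        '/https?:\/\/[\w\-\.!~?&+\*\'"(),\/]+/',
        function (array $m) use ($shorten): string {
            // $m[0] is the whole matched URL; escape the shortened one for HTML.
            $short = htmlspecialchars($shorten($m[0]), ENT_QUOTES);
            return '<a class="post_link" href="' . $short . '">' . $short . '</a>';
        },
        $post
    );
    return $result ?? $post; // preg_replace_callback() returns null on error
}

// Usage (login and API key are placeholders):
// $html = linkifyWithShortener($post, function (string $url): string {
//     return make_bitly_url($url, 'YOUR_LOGIN', 'YOUR_API_KEY', 'json');
// });
// echo $html;
```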
However, if you want to create your own URL-shortening system (similar to bit.ly, and surprisingly easy to do), here is an 8-part tutorial from PHPacademy on how to do it:
Difficulty level: beginner / intermediate
Each video is approximately ten minutes long.
Part 1
Part 2
Part 3
Part 4
Part 5
Part 6
Part 7
Part 8
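For the do-it-yourself route, one common approach (an assumption here, not necessarily what the tutorial does) is to store each long URL in a table with an auto-increment integer ID and use that ID, written in base 62, as the short code. A sketch of the encoding step; the constant, alphabet order, and function names are illustrative:

```php
<?php
// Illustrative alphabet: digits, then lowercase, then uppercase (62 symbols).
const BASE62_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Turn a non-negative database ID into a short code.
function base62Encode(int $id): string
{
    if ($id < 0) {
        throw new InvalidArgumentException('ID must be non-negative');
    }
    $alphabet = BASE62_ALPHABET;
    if ($id === 0) {
        return $alphabet[0];
    }
    $code = '';
    while ($id > 0) {
        $code = $alphabet[$id % 62] . $code; // least significant digit first
        $id = intdiv($id, 62);
    }
    return $code;
}

// Turn a short code back into the database ID to look up the long URL.
function base62Decode(string $code): int
{
    $alphabet = BASE62_ALPHABET;
    $id = 0;
    $len = strlen($code);
    for ($i = 0; $i < $len; $i++) {
        $pos = strpos($alphabet, $code[$i]);
        if ($pos === false) {
            throw new InvalidArgumentException("Invalid character: {$code[$i]}");
        }
        $id = $id * 62 + $pos;
    }
    return $id;
}
```

A redirect script would then decode the code from the request, fetch the stored URL for that ID, and send a Location header.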

preg_match_all matches unexpectedly [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 9 years ago.
I've just started with PHP and I want to scrape a small page, but I can't. I tried preg_match_all(), but it doesn't give me the result I want. Basically, I want to scrape only the YouTube video links from here: https://gdata.youtube.com/feeds/api/standardfeeds/most_shared, collect all of them, and use them later.
I tried the following code, which failed:
<?php
$data = file_get_contents('https://gdata.youtube.com/feeds/api/standardfeeds/most_shared');
preg_match_all("/src='(.+?)'>/", $data, $links);
$link_out = $links[0][0];
echo $link_out;
?>
I'm new to PHP, so a little help, please.
Thanks
As the feed is XML, you can use PHP's SimpleXMLElement to obtain the data.
<?php
$xml = new SimpleXMLElement(
'https://gdata.youtube.com/feeds/api/standardfeeds/most_shared',
0, // no extra libxml options (passing null here is deprecated as of PHP 8.1)
true
);
foreach($xml->entry as $entry) {
echo $entry->content['src'], PHP_EOL;
}
/*
https://www.youtube.com/v/IjWc43FCYlg?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/Xw1C5T-fH2Y?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/Kq0_dGKx4Os?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/gbcBYs0ljI0?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/78juOpTM3tE?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/OOiZ-5DqwYI?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/zjz614QVyfQ?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/h15m87WsCHQ?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/SXKOTdyOUBg?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/BRAM8MpqIeA?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/5yB3n9fu-rM?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/NAOo9SnzRH8?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/0KtILkzC-1g?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/kWSIFh8ICaA?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/Mi6AhogZCeg?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/kWuIGAZ1x2I?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/lKY5fmDGVLs?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/C94PaCtqOk4?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/V-fL8zopddI?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/UWlzMIl7E48?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/mcw6j-QWGMo?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/-RSDaRttpzk?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/8_RDx4skTp4?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/7YDWdv9kR0M?version=3&f=standard&app=youtube_gdata
https://www.youtube.com/v/m96tYpEk1Ao?version=3&f=standard&app=youtube_gdata
*/
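The same SimpleXML approach also works on a string, which lets you check the download before parsing and makes the extraction easy to test. A sketch; extractContentSrcs() is a hypothetical name, and entries without a src attribute on <content> are skipped:

```php
<?php
// Hypothetical helper: collect the src attribute of each <entry><content>
// element in an Atom feed that has already been downloaded into a string.
function extractContentSrcs(string $atomXml): array
{
    $xml = new SimpleXMLElement($atomXml); // throws Exception on malformed XML
    $srcs = [];
    foreach ($xml->entry as $entry) {
        if (isset($entry->content['src'])) {
            $srcs[] = (string) $entry->content['src'];
        }
    }
    return $srcs;
}

// Usage:
// $data = file_get_contents('https://gdata.youtube.com/feeds/api/standardfeeds/most_shared');
// if ($data !== false) {
//     foreach (extractContentSrcs($data) as $src) {
//         echo $src, PHP_EOL;
//     }
// }
```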
Try this preg_match_all() pattern:
preg_match_all("/src='([^']+)'/si", $data, $links);
and print the results:
echo "<pre>";
print_r($links);
<?php
$data = file_get_contents('https://gdata.youtube.com/feeds/api/standardfeeds/most_shared');
preg_match_all("/src='(.+?)'\/>/", $data, $links);
print_r($links[1]);
You forgot to match the closing / of the self-closing tags.

HEAD request in PHP to validate (network not syntax) URLs?

Here is the php.net listing for http_head.
The function prototype is
string http_head ( string $url [, array $options [, array &$info ]] )
A list of $options is here.
I want to use this to check that each URL in a set is valid (i.e. reachable over the network):
[url1, url2, url3]
Are there any options that should be set? Is any of the $info relevant, or should I just make sure that false is not returned instead of a string?
You probably don't need to set any of the $options, unless you're behind a proxy server or need to do something unusual. You probably should look through them just in case.
You're unlikely to need $info unless you're debugging; it gives you more complete visibility into the request and response.
Sample code:
foreach ($urls as $url) {
$response = http_head($url);
if ($response !== false) {
# FIXME do something cool
} else {
# FIXME hey that url is broken!
}
}
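Note that http_head() is provided by the PECL pecl_http extension (its older 1.x procedural API), which may not be available on your server, and a non-false return may still carry an error status such as 404. A core-PHP sketch instead sends a HEAD request through get_headers() with a stream context (the context argument needs PHP 7.1+) and checks the final status code. finalStatusCode() and urlIsReachable() are hypothetical names, and treating any status below 400 as valid is an assumption you may want to tighten.

```php
<?php
// Returns the status code from the LAST "HTTP/..." line, because
// get_headers() includes one status line per redirect hop.
function finalStatusCode(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// True if the URL answers with a final status below 400.
function urlIsReachable(string $url, int $timeout = 10): bool
{
    $context = stream_context_create([
        'http' => [
            'method' => 'HEAD',       // headers only, no body
            'timeout' => $timeout,    // seconds
            'ignore_errors' => true,  // fetch even on 4xx/5xx status codes
        ],
    ]);
    $headers = @get_headers($url, false, $context);
    if ($headers === false) {
        return false; // DNS failure, refused connection, timeout, ...
    }
    $status = finalStatusCode($headers);
    return $status !== null && $status < 400;
}

// Usage:
// foreach ($urls as $url) {
//     echo $url, ': ', urlIsReachable($url) ? 'ok' : 'broken', PHP_EOL;
// }
```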

How to write a PHP script to find the number of indexed pages in Google?

I need to find the number of pages Google has indexed for a specific domain name. How can I do that from a PHP script?
So, I build the results like this:
foreach ($allresponseresults as $responseresult)
{
$result[] = array(
'url' => $responseresult['url'],
'title' => $responseresult['title'],
'abstract' => $responseresult['content'],
);
}
What do I add for the estimated number of results, and how do I do that?
I know it is estimatedResultCount, but how do I add it? I access the title, for example, as $result['title'], so how do I get the number and how do I print it?
Thank you :)
I think it would be nicer to Google to use their RESTful Search API. See this URL for an example call:
http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=site:stackoverflow.com&filter=0
(You're interested in the estimatedResultCount value)
In PHP you can use file_get_contents to get the data and json_decode to parse it.
You can find documentation here:
http://code.google.com/apis/ajaxsearch/documentation/#fonje
Example
Warning: The following code does not have any kind of error checking on the response!
function getGoogleCount($domain) {
$content = file_get_contents('http://ajax.googleapis.com/ajax/services/' .
'search/web?v=1.0&filter=0&q=site:' . urlencode($domain));
$data = json_decode($content);
return intval($data->responseData->cursor->estimatedResultCount);
}
echo getGoogleCount('stackoverflow.com');
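To answer the follow-up directly: json_decode() the response and read responseData.cursor.estimatedResultCount alongside the results loop. A sketch, assuming the response has the shape quoted on this page (responseData, results, and cursor.estimatedResultCount as a string); parseSearchResponse() is a hypothetical name, and whether the endpoint still responds is not assumed:

```php
<?php
// Hypothetical helper: turn one search API response (JSON text) into the
// asker's result rows plus the estimated result count.
function parseSearchResponse(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['responseData'])) {
        throw new RuntimeException('Unexpected response format');
    }
    $results = [];
    foreach ($data['responseData']['results'] ?? [] as $r) {
        $results[] = [
            'url' => $r['url'] ?? '',
            'title' => $r['title'] ?? '',
            'abstract' => $r['content'] ?? '',
        ];
    }
    // The API returns the count as a string, e.g. "111000000".
    $count = (int) ($data['responseData']['cursor']['estimatedResultCount'] ?? 0);
    return ['results' => $results, 'estimatedResultCount' => $count];
}
```

Usage: $parsed = parseSearchResponse($content); then echo $parsed['estimatedResultCount'];. Because $result[] appends a new row on each iteration, the first title is $result[0]['title'] (here $parsed['results'][0]['title']), not $result['title'].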
You'd load http://www.google.com/search?q=domaingoeshere.com with cURL and then parse the HTML, looking for the results element that starts with <p id="resultStats".
You'd have the resulting HTML stored in a variable $html and then write something like:
$arr = explode('<p id="resultStats">', $html);
$bottom = $arr[1];
$middle = explode('</p>', $bottom);
Please note that this is untested and a very rough example. You'd be better off parsing the HTML with a dedicated parser or matching the line with regular expressions.
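Following that advice, a sketch that reads the element with id="resultStats" through DOMXPath instead of explode(), then pulls the first number out of its text. The id comes from the answers on this page and may not match Google's current markup; readResultStats() and firstNumber() are hypothetical names. firstNumber() stops at the first number so that any trailing figures, such as a search time, are not mixed in.

```php
<?php
// Hypothetical helper: text of the first element with id="resultStats",
// or null if there is no such element.
function readResultStats(string $html): ?string
{
    if ($html === '') {
        return null;
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate messy HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = (new DOMXPath($doc))->query('//*[@id="resultStats"]')->item(0);
    return $node === null ? null : trim($node->textContent);
}

// Hypothetical helper: first number in a text such as "About 8,130 results",
// with separators such as "," and "." removed; null if there are no digits.
function firstNumber(string $text): ?int
{
    if (!preg_match('/\d[\d,.]*/', $text, $m)) {
        return null;
    }
    return (int) preg_replace('/\D/', '', $m[0]);
}

// Usage:
// $stats = readResultStats($html);
// echo $stats === null ? 'not found' : firstNumber($stats);
```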
The estimatedResultCount value from the Google AJAX API doesn't give the right count.
And parsing the HTML results is not a good approach either, because Google blocks you after several searches.
Count the number of results for site:yourdomainhere.com (stackoverflow.com has about 830k):
// This gives you the count you see on the search results web page.
// file_get_contents() returns the raw HTML of the results page.
header('Content-Type: text/plain');
$url = 'https://www.google.com/search?q=' . urlencode('site:yourdomainhere.com');
$html = file_get_contents($url);
if (FALSE === $html) {
throw new Exception(sprintf('Failed to open HTTP URL "%s".', $url));
}
$arr = explode('<div class="sd" id="resultStats">', $html);
if (!isset($arr[1])) {
throw new Exception('resultStats element not found; the markup may have changed.');
}
$bottom = $arr[1];
$middle = explode('</div>', $bottom);
echo $middle[0];
Output:
About 8,130 results
Case 2: you can also use the Google API, but its count is different:
https://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=ursitename&callback=processResults
https://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=site:google.com
"cursor":{"resultCount":"111,000,000","estimatedResultCount":"111000000",
