Find email address from website url [closed] - php

I want to find a website's email address (like name@example.com) from its URL. Is it possible to find an email address from a website URL? If yes, please share how to implement it; the language doesn't matter.
My idea: read the page content from the URL using cURL, then find email addresses in it with a regular expression. Is that possible?
Below is code that reads page content from a URL using cURL:
<?php
$url = 'yoururl'; // the page to scan
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($curl, CURLOPT_HEADER, false);        // keep response headers out of $data
$data = curl_exec($curl);
curl_close($curl);
Then find email addresses in the $data string using a regular expression. Is this possible?
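For illustration, a minimal sketch of that second step, assuming $data holds the HTML fetched by the snippet above; the pattern is a rough approximation, not a full RFC 5322 validator:
<?php
// Sketch only: pull anything that looks like an email address out of $data.
preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $data, $matches);
$emails = array_unique($matches[0]); // drop duplicate addresses
print_r($emails);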

Technically you could get an email address for a domain by querying the public WHOIS information (which can be done through an API), but the addresses publicised there are rarely a company's real contact address; they are usually reporting mailboxes for abuse or technical requests.
http://network-tools.com/default.asp?prog=network&host=www.google.com
Some example code of how it could be done, returning JSON output:
<?php
function getIP() {
    if (!empty($_SERVER['HTTP_CLIENT_IP'])) {
        $ip = $_SERVER['HTTP_CLIENT_IP'];
    } elseif (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
        $ip = $_SERVER['HTTP_X_FORWARDED_FOR'];
    } else {
        $ip = $_SERVER['REMOTE_ADDR'];
    }
    return chkIP($ip);
}

function chkIP($ip) {
    // Resolve the IP to a host name and extract the bare domain from it.
    $dirtydomain = gethostbyaddr($ip);
    preg_match("/((\w*)\.+(\w{2})\.+(\w{2})$)|((\w*)\.+(\w{3})$)/", $dirtydomain, $output_array);
    $cmd = 'whois ' . escapeshellarg($output_array[0]);
    $data = shell_exec($cmd);
    return getEmail($data, $output_array[0]);
}

function getEmail($data, $domain) {
    $emailArray = array(); // initialise so count() below never sees an undefined variable
    $array = preg_split('/( )|(\n)/', $data); // DATA from WHOIS
    foreach ($array as $value) {
        // strpos() must be compared against false; a position of 0 is falsy.
        if (strpos($value, '@') !== false) {
            $emailArray[] = $value;
        }
    }
    return outputArray($emailArray, $domain);
}

function outputArray($emailArray, $domain) {
    if (count($emailArray) < 1) {
        return json_encode("No Email Address Found for " . $domain);
    } else {
        return json_encode($emailArray);
    }
}

echo getIP(); // Will print JSON output
?>

An easy regexp off the top of my head:
preg_match_all("/([a-z0-9\.]{1,50}@[a-z0-9]{1,50}\.[a-z]{1,5})/ims", $data, $matches);


Validation of the youtube address

I have to build a plugin that lets you embed YouTube videos in a website. I've run into a problem: I want to validate the correctness of a YouTube URL. Specifically, I want to check:
- whether the video ID is included in the address
- whether the address contains youtube.com or youtu.be
My code only checks whether the URL contains youtu.be or youtube.com. I do not know how to check whether the address has the 11-character video ID. Do you have any idea?
<?php
$url = 'https://www.youtube.com/watch?v=knfrxj0T5NY';
// Compare against false: strpos() returns 0 when the needle sits at the start.
if (strpos($url, 'youtube.com') !== false || strpos($url, 'youtu.be') !== false) {
    echo 'ok';
} else {
    echo 'no';
}
?>
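A minimal sketch of the missing ID check, assuming (as current YouTube URLs suggest, though it is not an official guarantee) that video IDs are exactly 11 characters from [A-Za-z0-9_-]:
<?php
// Hypothetical helper: returns the 11-character video ID, or false if none is found.
function getYoutubeId($url) {
    $pattern = '~(?:youtube\.com/watch\?\S*v=|youtu\.be/)([A-Za-z0-9_-]{11})~';
    return preg_match($pattern, $url, $m) ? $m[1] : false;
}

var_dump(getYoutubeId('https://www.youtube.com/watch?v=knfrxj0T5NY')); // "knfrxj0T5NY"
var_dump(getYoutubeId('https://www.youtube.com/'));                    // false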
Method using cURL:
function isValidYoutubeURL($url) {
    // Let's check the host first
    $host = parse_url($url, PHP_URL_HOST);
    if (!in_array($host, array('youtube.com', 'www.youtube.com'))) {
        return false;
    }
    // Ask YouTube's oEmbed endpoint; it answers 404 for unknown videos.
    // Note: curl_init needs a full URL including the scheme.
    $ch = curl_init('https://www.youtube.com/oembed?url=' . urlencode($url) . '&format=json');
    curl_setopt($ch, CURLOPT_NOBODY, 1); // HEAD-style request, we only need the status code
    curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return ($status !== 404);
}
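Example usage (note that the host whitelist above rejects youtu.be links as written):
var_dump(isValidYoutubeURL('https://www.youtube.com/watch?v=knfrxj0T5NY')); // true if the video exists
var_dump(isValidYoutubeURL('https://example.com/watch?v=knfrxj0T5NY'));     // false: wrong host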

Google Sitemap Ping Success [closed]

I have a PHP script that creates an XML sitemap. At the end, I use
shell_exec('ping -c1 www.google.com/webmasters/tools/ping?sitemap=sitemapurl');
to submit the updated sitemap to Google Webmaster Tools.
Having read the Google documentation, I'm unsure whether I need to do this each time or not. Entering the link manually results in a success page from Google, but using the ping command I receive no confirmation. I would also like to know if there is any way of checking whether the command has actually worked.
Here is a script to automatically submit your sitemap to Google, Bing/MSN and Ask:
/*
 * Sitemap Submitter
 * Use this script to submit your sitemaps automatically to Google, Bing/MSN and Ask.
 * Trigger this script on a schedule of your choosing or after your sitemap gets updated.
 */
//Set this to be your sitemap URL
$sitemapUrl = "http://www.example.com/sitemap.xml";
// cURL handler to ping the sitemap submission URLs for search engines…
function myCurl($url){
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // keep the response body out of the output
    curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $httpCode;
}
//Google
$url = "http://www.google.com/webmasters/sitemaps/ping?sitemap=".$sitemapUrl;
$returnCode = myCurl($url);
echo "<p>Google Sitemaps has been pinged (return code: $returnCode).</p>";
//Bing / MSN
$url = " https://www.bing.com/webmaster/ping.aspx?siteMap=".$sitemapUrl;
$returnCode = myCurl($url);
echo "<p>Bing / MSN Sitemaps has been pinged (return code: $returnCode).</p>";
//ASK
$url = "http://submissions.ask.com/ping?sitemap=".$sitemapUrl;
$returnCode = myCurl($url);
echo "<p>ASK.com Sitemaps has been pinged (return code: $returnCode).</p>";
You can also send yourself an email if the submission fails:
function return_code_check($pingedURL, $returnedCode) {
    $to      = "webmaster@yoursite.com";
    $subject = "Sitemap ping fail: " . $pingedURL;
    $message = "Error code " . $returnedCode . ". Go check it out!";
    $headers = "From: hello@yoursite.com";
    if ($returnedCode != "200") {
        mail($to, $subject, $message, $headers);
    }
}
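To wire it into the script above, feed each pinged URL and its return code through the check (the mailbox addresses in the snippet are placeholders):
$url = "http://www.google.com/webmasters/sitemaps/ping?sitemap=" . $sitemapUrl;
$returnCode = myCurl($url);
return_code_check($url, $returnCode); // mails you unless the code is 200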
Hope that helps
Since commands like shell_exec(), exec(), passthru() etc. are blocked by many hosters (and ping is an ICMP tool anyway, so it never makes an HTTP request to the URL path), you should use curl and check for a response code of 200.
You could also use fsockopen if curl is not available. I'm going to look for the code snippet and update the answer when I find it.
UPDATE:
Found it. I knew I used it somewhere. The funny coincidence: it was in my Sitemap class xD
You can find it here on GitHub: https://github.com/func0der/Sitemap. It is in the Sitemap\SitemapOrg class.
There is also an example of the curl call implemented.
Either way, here is the code for a standalone implementation.
/**
 * Call url with fsockopen and return the response status.
 *
 * @param string $url
 *   The url to call.
 *
 * @return mixed(boolean|int)
 *   The http status code of the response. FALSE if something went wrong.
 */
function _callWithFSockOpen($url) {
    $result = FALSE;
    // Parse url.
    $url = parse_url($url);
    // Append query to path.
    $url['path'] .= '?' . $url['query'];
    // Set up fsockopen.
    $port = 80;
    $timeout = 10;
    $fso = fsockopen($url['host'], $port, $errno, $errstr, $timeout);
    // Proceed if connection was successfully opened.
    if ($fso) {
        // Create headers. Note the space before HTTP/1.0 and the standard
        // "Connection: close" token.
        $headers = 'GET ' . $url['path'] . ' HTTP/1.0' . "\r\n";
        $headers .= 'Host: ' . $url['host'] . "\r\n";
        $headers .= 'Connection: close' . "\r\n";
        $headers .= "\r\n";
        // Write headers to socket.
        fwrite($fso, $headers);
        // Set timeout for stream read/write.
        stream_set_timeout($fso, $timeout);
        // Use a loop in case something unexpected happens.
        // I do not know what, but that is why it is unexpected.
        while (!feof($fso)) {
            // 128 bytes is enough to get the header line with the http response code in it.
            $buffer = fread($fso, 128);
            // Keep only the http status line (first line) and break the loop on success.
            if (!empty($buffer) && ($buffer = substr($buffer, 0, strpos($buffer, "\r\n")))) {
                break;
            }
        }
        // Match and extract the status, guarding against a non-matching buffer.
        if (preg_match('/^HTTP.+\s(\d{3})/', $buffer, $match)) {
            list(, $status) = $match;
            $result = $status;
        }
    }
    else {
        // @XXX: Throw exception here??
    }
    return (int) $result;
}
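Standalone usage, for example against the Google ping URL from earlier (a sketch; note the helper speaks plain HTTP on port 80, which matches the http:// URL used above):
$pingUrl = 'http://www.google.com/webmasters/sitemaps/ping?sitemap=' . urlencode($sitemapUrl);
$status = _callWithFSockOpen($pingUrl);
echo ($status === 200) ? "Ping OK\n" : "Ping failed (status $status)\n";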
If you guys find any harm or improvement in this code, do not hesitate to open up a ticket/pull request on GitHub, please. ;)
Simplest solution: file_get_contents("https://www.google.com/webmasters/tools/ping?sitemap={$sitemap}");
That will work on every major hosting provider. If you want optional error reporting, here's a start:
$data = file_get_contents("https://www.google.com/webmasters/tools/ping?sitemap={$sitemap}");
$status = ( strpos($data,"Sitemap Notification Received") !== false ) ? "OK" : "ERROR";
echo "Submitting Google Sitemap: {$status}\n";
As for how often you should do it, as long as your site can handle the extra traffic from Google's bots without slowing down, you should do this every time a change has been made.

Website Username Check [closed]

I want to create a function that takes a URL parameter and a username parameter, goes to that URL with the username appended, and returns false if the page says the username doesn't exist (error), otherwise true. I tried something like the following, but it didn't work.
function ($url, $username) {
    $main = file_get_contents($url.$username);
    if (@$main) { return false; }
    else { return true; }
}
So if you have any ideas on how to make this actually work, please help me.
$site = 'https://twitter.com/';
$username = 'SteveMartinToGo';
$url = $site.$username;

function urlExists($url = NULL) {
    if ($url == NULL) return false;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    // Treat any 2xx/3xx response as "exists".
    if ($httpcode >= 200 && $httpcode < 400) {
        return true;
    } else {
        return false;
    }
}

if (urlExists($url)) {
    echo 'url exists';
} else {
    echo 'url does not exist';
}
The problem with this is that some sites (like Facebook) will return a 200 instead of a 404, so the URL will show as existing even though it's not. Also, I got this function from someplace else (can't remember where), so I don't want to take credit for the code. Hope it helps...
Edit: updated because of fred-ii's eagle eye and suggestions. :)
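Since some sites answer 200 even for missing profiles, a complementary sketch is to fetch the body and search for the site's own "not found" wording; the marker string below is purely illustrative and must be tailored per site:
// Hypothetical sketch: $notFoundMarker is site-specific, not a real API.
function usernameExists($url, $username, $notFoundMarker = "page doesn't exist") {
    $html = @file_get_contents($url . $username);
    if ($html === false) {
        return false; // request failed outright (404, DNS error, ...)
    }
    // If the error wording is absent, assume the profile page rendered.
    return stripos($html, $notFoundMarker) === false;
}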
It is important to look at the actual structure of the URL. You can print it and view the page source (HTML) to check whether special characters in the URL are being masked.

PHP - Parse_url only get pages

I'm working on a little web crawler as a side project at the moment, basically having it collect all hrefs on a page and then subsequently parse those. My problem is:
How can I only get the actual page results? At the moment I'm using the following:
foreach ($page->getElementsByTagName('a') as $link)
{
    $compare_url = parse_url($link->getAttribute('href'));
    if (@$compare_url['host'] == "")
    {
        $links[] = 'http://' . @$base_url['host'] . '/' . $link->getAttribute('href');
    }
    elseif ( @$base_url['host'] == @$compare_url['host'] )
    {
        $links[] = $link->getAttribute('href');
    }
}
As you can see, this will bring in JPEGs, EXE files, etc. I only need to pick up web pages like .php, .html, .asp and so on.
I'm not sure if there is some function able to work this out, or if it will need to be a regex against some sort of master list?
Thanks
Since the URL string alone isn't connected with the resource behind it in any way, you will have to go out and ask the web server about it. For this there's an HTTP method called HEAD, so you won't have to download everything.
You can implement this with curl in PHP like this:
// Defined at top level (a nested definition would fatally redeclare on a second call).
function curl_head($url) {
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_NOBODY, true);   // HEAD request, no body
    curl_setopt($curl, CURLOPT_HEADER, true);
    curl_setopt($curl, CURLOPT_MAXREDIRS, 5);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    $content = curl_exec($curl);
    curl_close($curl);
    // redirected heads just pile up one after another
    $parts = explode("\r\n\r\n", trim($content));
    // return only the last one
    return end($parts);
}

function is_html($url) {
    $header = curl_head($url); // use the argument, not a hardcoded URL
    // look for the content-type part of the header response
    return preg_match('/content-type\s*:\s*text\/html/i', $header);
}

var_dump(is_html('http://github.com'));
This version only accepts text/html responses and doesn't check whether the response is a 404 or another error (however, it follows redirects up to 5 jumps). You can tweak the regexp or add some error handling, either from the curl response or by matching against the header string's first line.
Note: web servers will run scripts behind these URLs to give you responses. Be careful not to overload hosts with probing, or to grab "delete" or "unsubscribe" type links.
To check whether a page has a valid extension (html, php, ...), use this function:
function check($url){
    $extensions = array("php", "html"); //Add extensions here
    foreach ($extensions as $ext) {
        if (substr($url, -(strlen($ext) + 1)) == "." . $ext) {
            return 1;
        }
    }
    return 0;
}
foreach ($page->getElementsByTagName('a') as $link) {
    $compare_url = parse_url($link->getAttribute('href'));
    if (@$compare_url['host'] == "") {
        if (check($link->getAttribute('href'))) {
            $links[] = 'http://' . @$base_url['host'] . '/' . $link->getAttribute('href');
        }
    }
    elseif (@$base_url['host'] == @$compare_url['host']) {
        if (check($link->getAttribute('href'))) {
            $links[] = $link->getAttribute('href');
        }
    }
}
Consider using preg_match to check the type of the link (application, picture, HTML file) and decide what to do depending on the result.
Another (simpler) option is to use explode and look at the last part of the URL, which comes after a "." (the extension).
For instance:
//If the URL has any one of the following extensions, ignore it.
$forbid_ext = array('jpg', 'gif', 'exe');

foreach ($page->getElementsByTagName('a') as $link) {
    $compare_url = parse_url($link->getAttribute('href'));
    if (@$compare_url['host'] == "")
    {
        if (check_link_type($link->getAttribute('href')))
            $links[] = 'http://' . @$base_url['host'] . '/' . $link->getAttribute('href');
    }
    elseif ( @$base_url['host'] == @$compare_url['host'] )
    {
        if (check_link_type($link->getAttribute('href')))
            $links[] = $link->getAttribute('href');
    }
}

function check_link_type($url)
{
    global $forbid_ext;
    $parts = explode(".", $url); // end() needs a real variable, not a function result
    $ext = end($parts);
    if (in_array($ext, $forbid_ext))
        return false;
    return true;
}
UPDATE (instead of checking 'forbidden' extensions, let's look for good ones):
$good_ext = array('html', 'php', 'asp');

function check_link_type($url)
{
    global $good_ext;
    $parts = explode(".", $url);
    $ext = end($parts);
    // Accept extension-less URLs and the whitelisted extensions.
    if ($ext == "" || in_array($ext, $good_ext))
        return true;
    return false;
}

how to find the total no.of inbound and outbound links of a website using php?

To count outbound links:
- parse the HTML of the web page
- extract all links using a regex
- filter out links that start with your domain or "/" (those are internal)
For inbound links:
- grab a Google results page, e.g. http://www.google.ca/search?sourceid=chrome&ie=UTF-8&q=site:
- parse it similarly
For outbound links, you will have to parse the HTML code of the website, as some here have suggested.
For inbound links, I suggest using the Google Custom Search API, as sending direct requests to Google can get your IP banned. You can view the search API here. Here is a function I use in my code for this API:
function doGoogleSearch($searchTerm)
{
    $referer = 'http://your-site.com';
    $args['q'] = $searchTerm;
    $endpoint = 'web';
    $url = "http://ajax.googleapis.com/ajax/services/search/" . $endpoint;
    $args['v'] = '1.0';
    $key = 'your-api-key';
    $url .= '?' . http_build_query($args, '', '&');

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_REFERER, $referer);
    $body = curl_exec($ch);
    curl_close($ch);
    //decode and return the response
    return json_decode($body);
}
After calling this function as: $result = doGoogleSearch('link:site.com'), the variable $result->cursor->estimatedResultCount will have the number of results returned.
PHP can't determine the inbound links of a page through some trivial action. You either have to monitor all incoming visitors and check their referrers, or parse the entire internet for links that point to that site. The first method will miss links that are never clicked, and the second is best left to Google.
On the other hand, counting the outbound links of a site is doable. You can read in a page and analyze the text for links, counting up the total, as sketched below.
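A minimal sketch of that outbound count, using DOMDocument instead of a bare regex (the crawl URL is a placeholder):
<?php
// Count links on a page whose host differs from the page's own host.
function countOutboundLinks($pageUrl) {
    $host = parse_url($pageUrl, PHP_URL_HOST);
    $doc = new DOMDocument();
    @$doc->loadHTML(file_get_contents($pageUrl)); // @ silences warnings from messy HTML
    $count = 0;
    foreach ($doc->getElementsByTagName('a') as $a) {
        $linkHost = parse_url($a->getAttribute('href'), PHP_URL_HOST);
        if ($linkHost !== null && $linkHost !== $host) {
            $count++; // absolute link to a different host = outbound
        }
    }
    return $count;
}

echo countOutboundLinks('http://www.example.com/');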
// $host is the domain name; getPageData() is assumed to be a helper
// that fetches the URL's HTML (e.g. via cURL or file_get_contents).
function getGoogleLinks($host)
{
    $request = "http://www.google.com/search?q=" . urlencode("link:" . $host) . "&hl=en";
    $data = getPageData($request);
    preg_match('/<div id=resultStats>(About )?([\d,]+) result/si', $data, $l);
    $value = ($l[2]) ? $l[2] : "n/a";
    return $value;
}
