I want to create a function that takes a URL parameter and a username parameter, goes to that URL with the username appended, and returns false if the page says the username doesn't exist (an error), otherwise true. I tried something like this, but it didn't work.
function ($url, $username) {
    $main = file_get_contents($url.$username);
    if (@$main) { return false; }
    else { return true; }
}
So if you have any ideas on how to make this work, please help me.
$site = 'https://twitter.com/';
$username = 'SteveMartinToGo';
$url = $site.$username;
function urlExists($url=NULL) {
if($url == NULL) return false;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if($httpcode >= 200 && $httpcode < 400){
return true;
} else {
return false;
}
}
if(urlExists($url)) {
echo 'url exists';
} else {
echo 'url does not exist';
}
The problem with this is that some sites (like Facebook) will return a 200 instead of a 404, so the URL will show as existing even though it isn't. Also, I got this function from somewhere else (can't remember where), so I don't want to take credit for the code. Hope it helps...
Edit: updated because of fred-ii's eagle eye and suggestions. :)
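A workaround for such sites is to also check the response body for the site's "not found" wording. A minimal sketch, assuming the profile page contains a phrase like "doesn't exist" when the account is missing (the exact phrase is an assumption and varies by site):
function profileExists($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    $body = curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    // Hard failure or error status code: the profile is definitely not there
    if ($body === false || $httpcode >= 400) {
        return false;
    }
    // Otherwise scan the body for an error phrase (site-specific assumption)
    return stripos($body, "doesn't exist") === false;
}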
It is also important to look at the actual structure of the URL: print the response and view the source code (HTML) to check whether the URL contains special characters that may be masked/encoded.
I'm trying to create a simple script that'll let me know if a website is based off WordPress.
The idea is to check whether I'm getting a 404 from a URL when trying to access its wp-admin like so:
https://www.audi.co.il/wp-admin (which returns "true" because it exists)
When I try to input a URL that does not exist, like "https://www.audi.co.il/wp-blablabla", PHP still returns "true", even though Chrome shows a 404 in the network tab when I paste that link into the address bar.
Why is it so and how can it be fixed?
This is the code (based on another user's answer):
<?php
$file = 'https://www.audi.co.il/wp-blabla';
$file_headers = @get_headers($file);
if(!$file_headers || strpos($file_headers[0], '404 Not Found')) {
$exists = "false";
}
else {
$exists = "true";
}
echo $exists;
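One likely cause is that the server first answers with a redirect (or with a status line that doesn't contain the literal text "404 Not Found"), so the check on $file_headers[0] never matches. A hedged sketch that instead reads the numeric status code from the last status line returned by get_headers() (which follows redirects by default):
$file = 'https://www.audi.co.il/wp-blabla';
$headers = @get_headers($file);
$exists = "false";
if ($headers) {
    // get_headers() follows redirects, so the final status is the last "HTTP/..." line
    $statusLines = preg_grep('/^HTTP\//', $headers);
    $lastStatus = end($statusLines);
    if (preg_match('/\s(\d{3})(\s|$)/', $lastStatus, $m) && (int)$m[1] < 400) {
        $exists = "true";
    }
}
echo $exists;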
You can try to find the wp-admin page, and if it is not there then there's a good chance it's not WordPress.
function isWordPress($url)
{
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER , 1 );
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
// fetch the URL (the response is returned rather than printed, because of CURLOPT_RETURNTRANSFER)
curl_exec($ch);
$httpStatus = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
// close cURL resource, and free up system resources
curl_close($ch);
if ( $httpStatus == 200 ) {
return true;
}
return false;
}
if ( isWordPress("http://www.example.com/wp-admin") ) {
// This is WordPress
} else {
// Not WordPress
}
This may not be one hundred percent accurate as some WordPress installations protect the wp-admin URL.
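As a rough extra heuristic (my own suggestion, not part of the answer above), most WordPress themes reference /wp-content/ or /wp-includes/ in the homepage HTML, so you could also scan the front page:
function looksLikeWordPress($url) {
    // Heuristic only: WordPress themes usually load assets from wp-content/ or wp-includes/
    $html = @file_get_contents($url);
    if ($html === false) {
        return false;
    }
    return strpos($html, 'wp-content') !== false || strpos($html, 'wp-includes') !== false;
}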
I'm probably late to the party, but another way you can easily detect a WordPress site is by crawling /wp-json. If you're using Guzzle in PHP, you can do this:
function isWordPress($url) {
try {
$http = new \GuzzleHttp\Client();
$response = $http->get(rtrim($url, "/")."/wp-json");
$contents = json_decode($response->getBody()->getContents());
if($contents) {
return true;
}
} catch (\Exception $exception) {
//...
}
return false;
}
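Usage could look like this (example.com is just a placeholder):
if (isWordPress('https://example.com')) {
    echo 'Looks like WordPress';
} else {
    echo 'Probably not WordPress';
}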
I want to find a website's email address (like name@example.com) from the website URL. Is it possible to find an email address from a website URL? If yes, please share how to implement it. The language doesn't matter.
My idea: read the page content from the website URL using cURL, then find email addresses in it using a regular expression. Is that possible?
Below is code to read page content from a website URL using cURL:
<?php
$url = 'yoururl';
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, false);
$data = curl_exec($curl);
curl_close($curl);
Then find email addresses in the $data string using a regular expression. Is it possible?
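Yes, that approach works for addresses that actually appear in the page source (it won't catch obfuscated or JavaScript-rendered addresses). A minimal sketch applying a deliberately simple email regex to the $data string fetched above:
preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $data, $matches);
$emails = array_unique($matches[0]); // de-duplicate repeated addresses
print_r($emails);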
Technically you could get an email address for a domain by querying the public WHOIS information (which could be done via an API), but the addresses published there are rarely a company's real contact addresses; they are usually reporting mailboxes for spam or technical requests.
http://network-tools.com/default.asp?prog=network&host=www.google.com
Some example code of how it could be done returning JSON output:
<?php
function getIP() {
if (!empty($_SERVER['HTTP_CLIENT_IP'])) {
$ip = $_SERVER['HTTP_CLIENT_IP'];
} elseif (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
$ip = $_SERVER['HTTP_X_FORWARDED_FOR'];
} else {
$ip = $_SERVER['REMOTE_ADDR'];
}
return chkIP($ip);
}
function chkIP($ip) {
$dirtydomain = gethostbyaddr($ip);
preg_match("/((\w*)\.+(\w{2})\.+(\w{2})$)|((\w*)\.+(\w{3})$)/", $dirtydomain, $output_array);
$cmd = 'whois ' . $output_array[0];
$data = shell_exec($cmd);
return getEmail($data,$output_array[0]);
}
function getEmail($data,$domain) {
$emailArray = array(); // avoid an undefined variable when no address is found
$array = preg_split('/( )|(\n)/',$data); //DATA from WHOIS
foreach ($array as $value) {
if (strpos($value, '@') !== false) {
$emailArray[] = $value;
}
}
return outputArray($emailArray,$domain);
}
function outputArray($emailArray, $domain) {
if (count($emailArray) < 1) {
return json_encode("No Email Address Found for " . $domain);
} else {
return json_encode($emailArray);
}
}
echo getIP(); // will output JSON
?>
An easy regexp off the top of my head:
preg_match_all("/([a-z0-9\.]{1,50}@[a-z0-9]{1,50}\.[a-z]{1,5})/ims",$data,$matches)
I have a cURL-based function in PHP to check if a website is online. It works fine, but I have noticed that it only works for http links and not https links.
Does anyone know how this function can be updated to also support https links?
function isDomainAvailible($domain){
// Check, if a valid url is provided
if(!filter_var($domain, FILTER_VALIDATE_URL)){
return false;
}
// Initialize curl
$curlInit = curl_init($domain);
curl_setopt($curlInit,CURLOPT_CONNECTTIMEOUT,10);
curl_setopt($curlInit,CURLOPT_HEADER,true);
curl_setopt($curlInit,CURLOPT_NOBODY,true);
curl_setopt($curlInit,CURLOPT_RETURNTRANSFER,true);
// Get answer
$response = curl_exec($curlInit);
curl_close($curlInit);
if ($response){
return true;
} else {
return false;
}
}
If you want to use your function and you need https support, you can download the cacert.pem file and add its path in your function like this:
function isDomainAvailible($domain){
// Check, if a valid url is provided
if(!filter_var($domain, FILTER_VALIDATE_URL)){
return false;
}
// Initialize curl
$curlInit = curl_init($domain);
curl_setopt($curlInit,CURLOPT_CONNECTTIMEOUT,10);
curl_setopt($curlInit,CURLOPT_HEADER,true);
curl_setopt($curlInit,CURLOPT_NOBODY,true);
curl_setopt($curlInit,CURLOPT_RETURNTRANSFER,true);
curl_setopt($curlInit, CURLOPT_PROTOCOLS, CURLPROTO_HTTP | CURLPROTO_HTTPS);
curl_setopt ($curlInit, CURLOPT_CAINFO, dirname(__FILE__)."/cacert.pem");
// Get answer
$response = curl_exec($curlInit);
curl_close($curlInit);
if ($response){
return 1;
} else {
return 0;
}
}
$domain = 'https://www.facebook.com';
$avi = isDomainAvailible($domain);
echo $avi;
Try adding this to the cURL setup (note that it disables certificate verification):
curl_setopt($curlInit, CURLOPT_SSL_VERIFYPEER, false);
When you say, "i have noticed that it only works for http links and not https links", what exactly do you mean? cURL works the same either way, you just pass it a url with the https protocol instead of http. The problem might be something else. Does this sound like what you're having trouble with? How to send HTTPS posts using php
I'm working on a little webcrawler as a side project at the moment. It collects all hrefs on a page and then parses them. My problem is:
How can I only get the actual page results? At the moment I'm using the following:
foreach($page->getElementsByTagName('a') as $link)
{
$compare_url = parse_url($link->getAttribute('href'));
if (#$compare_url['host'] == "")
{
$links[] = 'http://'.#$base_url['host'].'/'.$link->getAttribute('href');
}
elseif ( #$base_url['host'] == #$compare_url['host'] )
{
$links[] = $link->getAttribute('href');
}
}
As you can see, this will bring in jpegs, exe files, etc. I only need to pick up web pages like .php, .html, .asp, etc.
I'm not sure if there is a function able to work this out, or if it will need to be a regex against some sort of master list?
Thanks
Since the URL string alone isn't connected in any way with the resource behind it, you will have to go out and ask the webserver about it. For this there's an HTTP method called HEAD, so you won't have to download everything.
You can implement this with curl in php like this:
function is_html($url) {
function curl_head($url) {
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_NOBODY, true);
curl_setopt($curl, CURLOPT_HEADER, true);
curl_setopt($curl, CURLOPT_MAXREDIRS, 5);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true );
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HTTP_VERSION , CURL_HTTP_VERSION_1_1);
$content = curl_exec($curl);
curl_close($curl);
// redirected heads just pile up one after another
$parts = explode("\r\n\r\n", trim($content));
// return only the last one
return end($parts);
}
$header = curl_head($url);
// look for the content-type part of the header response
return preg_match('/content-type\s*:\s*text\/html/i', $header);
}
var_dump(is_html('http://github.com'));
This version only accepts text/html responses and doesn't check whether the response is a 404 or another error (it does follow redirects, up to 5 hops). You can tweak the regexp or add some error handling, either from the curl response or by matching against the header string's first line.
Note: webservers will run scripts behind these URLs to give you responses. Be careful not to overload hosts with probing, or to follow "delete" or "unsubscribe" type links.
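For example, a small status check against the first line of the header string that curl_head() returns (a sketch; head_is_ok is just a name I made up, and anything below 400 is treated as success):
function head_is_ok($header) {
    // The first line looks like "HTTP/1.1 200 OK"
    if (preg_match('/^HTTP\/[\d.]+\s+(\d{3})/', $header, $m)) {
        return (int)$m[1] < 400;
    }
    return false;
}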
To check if a page has a valid extension (html, php, ...), use this function:
function check($url){
$extensions=array("php","html"); //Add extensions here
foreach($extensions as $ext){
if(substr($url,-(strlen($ext)+1))==".".$ext){
return 1;
}
}
return 0;
}
foreach($page->getElementsByTagName('a') as $link) {
$compare_url = parse_url($link->getAttribute('href'));
if (@$compare_url['host'] == "") { if(check($link->getAttribute('href'))){ $links[] = 'http://'.@$base_url['host'].'/'.$link->getAttribute('href');} }
elseif ( @$base_url['host'] == @$compare_url['host'] ) {
if(check($link->getAttribute('href'))){ $links[] = $link->getAttribute('href'); }
}
}
Consider using preg_match to check the type of the link (application, picture, html file) and decide what to do based on the result.
Another (simpler) option is to use explode and take the last part of the URL, which comes after a "." (the extension).
For instance:
// If the URL has any one of the following extensions, ignore it.
$forbid_ext = array('jpg','gif','exe');
foreach($page->getElementsByTagName('a') as $link) {
$compare_url = parse_url($link->getAttribute('href'));
if (#$compare_url['host'] == "")
{
if(check_link_type($link->getAttribute('href')))
$links[] = 'http://'.#$base_url['host'].'/'.$link->getAttribute('href');
}
elseif ( #$base_url['host'] == #$compare_url['host'] )
{
if(check_link_type($link->getAttribute('href')))
$links[] = $link->getAttribute('href');
}
}
function check_link_type($url)
{
global $forbid_ext;
$parts = explode("." , $url);
$ext = end($parts); // end() needs a real variable, not a function result
if(in_array($ext , $forbid_ext))
return false;
return true;
}
UPDATE (instead of checking 'forbidden' extensions , let's look for good ones)
$good_ext = array('html','php','asp');
function check_link_type($url)
{
global $good_ext;
$parts = explode("." , $url);
$ext = end($parts);
if($ext == "" || in_array($ext , $good_ext))
return true;
return false;
}
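Usage stays the same in the crawler loop; for example:
var_dump(check_link_type('http://example.com/page.php'));  // true
var_dump(check_link_type('http://example.com/image.jpg')); // false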
I'm starting to help a friend who runs a website with small bits of coding work, and all the code required will be PHP. I am a C# developer, so this will be a new direction.
My first stand-alone task is as follows:
The website is informed of a new species of fish. The scientific name is entered into, say, two input controls, one for the genus (X) and another for the species (Y). These names will need to be sent to a website in the format:
http://www.fishbase.org/Summary/speciesSummary.php?genusname=X&speciesname=Y&lang=English
Once on the resulting page, there are further links for common names and synonyms.
What I would like to be able to do is to find these links, and call the URL (as this will contain all the necessary parameters to get the particular data) and store some of it.
I want to save data from both calls and, once completed, convert it all into xml which can then be uploaded to the website's database.
All I'd like to know is (a) can this be done, and (b) how difficult is it?
Thanks in advance
Martin
If I understand you correctly you want your script to download a page and process the downloaded data. If so, the answers are:
a) yes
b) not difficult
:)
OK... here is some more information: I would use the cURL extension, see:
http://php.net/manual/en/book.curl.php
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "example.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it
$output = curl_exec($ch);
curl_close($ch);
?>
I used a tool called Snoopy (http://sourceforge.net/projects/snoopy/) four years ago.
I pulled about 500 customer profiles from a website that published them, in a few hours.
a) Yes
b) Not difficult when you have experience.
Google for CURL first, or allow_url_fopen.
file_get_contents() will do the job:
$data = file_get_contents('http://www.fishbase.org/Summary/speciesSummary.php?genusname=X&speciesname=Y&lang=English');
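From there you could load the returned HTML into DOMDocument and pull out the follow-up links you're after (a sketch; the link-text filter is an assumption about how the fishbase page labels those links):
$doc = new DOMDocument();
@$doc->loadHTML($data); // suppress warnings from imperfect real-world HTML
$links = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    $text = $a->textContent;
    // Keep only the links we care about (assumed wording of the link text)
    if (stripos($text, 'common name') !== false || stripos($text, 'synonym') !== false) {
        $links[] = $a->getAttribute('href');
    }
}
print_r($links);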
// Send a URL request
function send_url($url, $type = false, $debug = false) { // $type = 'json' or 'xml'
$result = '';
if (function_exists('curl_init')) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);
} else {
if (($content = @file_get_contents($url)) !== false) $result = $content;
}
if ($type == 'json') {
$result = json_decode($result, true);
} elseif ($type == 'xml') {
if (($xml = @simplexml_load_string($result)) !== false) $result = $xml; // $result holds XML content, not a filename
}
if ($debug) echo '<pre>' . print_r($result, true) . '</pre>';
return $result;
}
$data = send_url('http://ip-api.com/json/212.76.17.140', 'json', true);