I am trying to make a script to check if a webpage has a back link to my page. I have found this script but the problem is that it returns the error message "No back link found" even if there is a backlink. Could someone tell me what is wrong with this script?
Here is the script I am using:
require('simple_html_dom.php');
function CheckReciprocal( $targetUrl, $checkLinkUrl, $checkNofollow = true )
{
$html = file_get_html($targetUrl);
if (empty($html))
{
//# Could not load file
return false;
}
$link = $html->find('a[href^='.$checkLinkUrl.']',0);
if (empty($link))
{
//# Link not found
return false;
}
if ( $checkNofollow && $link->hasAttribute('rel') )
{
$attr = $link->getAttribute('rel');
return (preg_match("/\bnofollow\b/is", $attr) ? false : true);
}
return true;
}
$targetUrl = 'http://example.com/test.html';
$checkLinkUrl = 'http://mysite.com';
if ( CheckReciprocal($test, $checkLinkUrl) )
{
echo 'Link found';
}
else { echo 'Link not found or marked as nofollow'; }
Thank you!
I don't know how simple_html_dom.php's $html->find() works because I have never used it, but the problem seems to be there. I would trust good old DOMDocument plus a regex.
I just wrote and tested a function. For $url, pass the plain domain plus whatever path you want; there is no need to include http(s) or www:
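function checkBackLink($link, $url, $checkNoFollow = true){
    $dom = new DOMDocument();
    // @ suppresses the warnings libxml raises on malformed real-world HTML
    if (!@$dom->loadHTMLFile($link)) return false;
    // preg_quote keeps the dots in the domain from matching any character
    $pattern = '#^(https?://)?(www\.)?' . preg_quote($url, '#') . '#i';
    foreach($dom->getElementsByTagName('a') as $item){
        if($item->hasAttribute('href') === false) continue;
        if($checkNoFollow && preg_match('/\bnofollow\b/i', $item->getAttribute('rel'))) continue;
        if(preg_match($pattern, $item->getAttribute('href'))) return true;
    }
    return false; // no matching link found
}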
if(checkBacklink('the link', 'example.com')){
echo "link found";
} else {
echo "Link not found or marked as nofollow";
}
If you don't like it and still want to use simple_html_dom, make sure you understand how find() matches, because if it only matches exact values that could be troublesome.
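It may also be worth checking the call in the question: it passes $test, which the script never defines, where $targetUrl was presumably intended. Separately, here is a minimal sketch of the DOMDocument approach that compares hosts with parse_url() instead of a regex. The function name html_has_backlink and its parameters are hypothetical, and it works on an HTML string you have already fetched (for example with file_get_contents) so that it can be tried without a network call.

```php
<?php
// Hypothetical helper: true if $html contains a link whose host is $domain
// (ignoring a leading "www."), optionally skipping rel="nofollow" links.
function html_has_backlink(string $html, string $domain, bool $checkNoFollow = true): bool
{
    if (trim($html) === '') {
        return false; // nothing to parse
    }
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $domain = strtolower(preg_replace('/^www\./i', '', $domain));
    foreach ($dom->getElementsByTagName('a') as $a) {
        $host = parse_url($a->getAttribute('href'), PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // relative link, missing href, or unparsable URL
        }
        $host = strtolower(preg_replace('/^www\./i', '', $host));
        if ($host !== $domain) {
            continue;
        }
        if ($checkNoFollow && preg_match('/\bnofollow\b/i', $a->getAttribute('rel'))) {
            continue; // link exists but is marked nofollow
        }
        return true;
    }
    return false;
}
```

Comparing parsed hosts avoids false positives from URLs that merely start with the same characters, at the cost of ignoring the path entirely; add a path comparison if you need to check for a specific page.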
I'm trying to make a PHP script that loops, fetching the contents of my site/server each time. If the text response is, for example, "false", it should fetch again, and it should keep looping until the site's response is "true".
This is what I tried:
$getcontents = file_get_contents("http://example.com/script.php"); // it will echo false
if (strpos($getcontents , 'false')) {
$getcontents = file_get_contents("http://example.com/script.php");
else if (strpos($getcontents , 'false')) {
$getcontents = file_get_contents("http://example.com/script.php");
}
else if (strpos($getcontents , 'true')) {
echo "finished".;
}
I'm not sure if this is the right way, or even if it is possible, and I apologize in advance if I did not explain myself very well. Thank you for your attention!
You could use a regular while loop.
$getcontents = 'false'; //set value to allow loop to start
while(strpos($getcontents , 'false') !== false) {
$getcontents = file_get_contents("http://example.com/script.php");
}
echo "finished";
This will loop until $getcontents does not contain false.
You could also use a recursive function like this.
function check_for_false() {
$getcontents = file_get_contents("http://example.com/script.php");
if(strpos($getcontents , 'false') !== false) {
check_for_false();
} else if(strpos($getcontents , 'true') !== false) {
echo "finished";
} else {
echo "response didn't contain \"true\" or \"false\"";
}
}
This function keeps calling itself for as long as $getcontents contains the word false. It prints "finished" once the response contains true and no longer contains false.
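Both versions above will run forever if the remote script never answers "true", and they hit the server as fast as they can. A minimal sketch of a bounded variant follows; the name poll_until_true, the attempt limit, and the delay are all assumptions, and the fetch step is passed in as a callable so the loop can be tried without a real server.

```php
<?php
// Hypothetical helper: calls $fetch until its response contains "true",
// giving up after $maxAttempts tries. $fetch is any callable that returns
// a string (or false on failure, as file_get_contents does).
function poll_until_true(callable $fetch, int $maxAttempts = 10, int $delaySeconds = 0): bool
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $response = $fetch();
        if (is_string($response) && strpos($response, 'true') !== false) {
            return true;
        }
        // Pause between attempts, but not after the last one
        if ($attempt < $maxAttempts && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }
    return false; // never saw "true" within the allowed attempts
}
```

With the URL from the question, a call could look like poll_until_true(function () { return @file_get_contents("http://example.com/script.php"); }, 20, 2), which tries at most 20 times with two seconds between tries.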
How can I add a statement so that when I search and nothing is found at the URL, it shows nothing.html?
$url1 = "http://www.pengadaan.net/tend_src_cont2.php?src_nm=";
$url2 = $_GET['src_nm']."&src_prop=";
$url3 = $_GET['src_prop'];
$url = $url1.$url2.$url3;
$html = file_get_html($url);
if (method_exists($html,"find")) {
echo "<ul>";
foreach($html->find('div[class=pengadaan-item] h1[] a[]') as $element ) {
echo ("<li>".$element."</li>");
}
echo "</ul>";
echo $url;
}
else {
}
There are two ways to move to another page in PHP. You can call header("Location: http://www.yourwebsite.com/nothing.php"); or, if the headers have already been sent, you can have PHP echo JavaScript to do the redirect:
if (method_exists($html,"find")) { // If 'find exist'
...
} else { // Otherwise it does not exist
header("Location: http://www.pengadaan.net/nothing.php"); // redirect here
exit; // stop the script so nothing else runs after the redirect
}
Or, if you have already sent your headers, you can get around it using JavaScript:
...
} else {
echo '<script>window.location.replace("http://www.pengadaan.net/nothing.php")</script>';
}
I am using PHP DOMNode hasAttributes to retrieve all of an element's attributes. So far, everything is working great. My code is below. However, the line else if($imageTags->hasAttributes() == false) is where I can't get it to work: it displays an error on my page instead of redirecting the user to index.php when the code fails. What I really want is this: if $imageTags->hasAttributes() is not true, redirect the user to index.php and don't display the errors.
function get_iframe_attr($string){
$doc = new DOMDocument();
$doc->loadHTML("$string");
$imageTags = $doc->getElementsByTagName('iframe')->item(0);
if ($imageTags->hasAttributes() == true) {
foreach ($imageTags->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
$attr_val[] = "$name=\"$value\" ";
}
echo $implode_str = implode(" ", $attr_val);
}else if($imageTags->hasAttributes() == false){
header("Location: index.php");
}
}
get_iframe_attr('<iframe src="" scrolling="no" width="600" height="438" marginheight="0" marginwidth="0" frameborder="0"></iframe>');
Note: if you remove the '<iframe' tags from the string, you will get the error.
...
} else if($imageTags->hasAttributes() == false){
header("Location: index.php");
die; // needed here to prevent code below, if any from executing
}
...
Also, regarding your edit note about removing the iframe and getting an error: you need to check whether $doc->getElementsByTagName('iframe') has actually grabbed anything. Example:
$imageTags = $doc->getElementsByTagName('iframe');
if ($imageTags->length > 0) {
$imageTags = $imageTags->item(0);
...
Instead of using $imageTags->hasAttributes() I solved it by using $imageTags instanceof DOMNode
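Putting the two suggestions together, a minimal sketch might look like the following. The function name first_iframe_attributes is hypothetical; it returns the attributes instead of echoing them, and returns null when there is no iframe, so the caller decides whether to redirect.

```php
<?php
// Hypothetical helper: returns the first iframe's attributes as name => value,
// or null when the markup contains no iframe at all.
function first_iframe_attributes(string $markup): ?array
{
    if (trim($markup) === '') {
        return null; // nothing to parse
    }
    $previous = libxml_use_internal_errors(true); // keep parser warnings off the page
    $doc = new DOMDocument();
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $iframe = $doc->getElementsByTagName('iframe')->item(0);
    if (!$iframe instanceof DOMElement) {
        return null; // item(0) is null when nothing matched
    }
    $attributes = [];
    foreach ($iframe->attributes as $attr) {
        $attributes[$attr->nodeName] = $attr->nodeValue;
    }
    return $attributes;
}
```

The caller can then test the result, for example: if (first_iframe_attributes($string) === null) { header("Location: index.php"); exit; }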
Here we go again, I need some help with this.
The preg_match is not working as I want: it is not validating any of the site links. I need a second pair of eyes to help me see what is wrong with my code.
if (!empty($_POST["url"]))
{
if (filter_var($_POST["url"], FILTER_VALIDATE_URL))
{
if (!preg_match('/^http(s)?:\/\/(?:[a-z\d-]+\.)*mysite.com(?:(?=\/)|$)/i', $url))
{
echo "<strong>Error</strong>: Not a valid Mysite.com link or could shorten link";
} else {
$result = $sql->query("SELECT `id` FROM `shortcuts` WHERE `url`='{$_POST["url"]}'");
$id = $result[0]["id"];
if (empty($id))
{
$result = $sql->query("INSERT INTO `shortcuts` (`url`) VALUES ('{$_POST["url"]}')");
if ($result)
{
$id = $sql->get_increment();
if (empty($id))
{
echo "FAILED ENCODE";
exit(1);
}
}
$shorturl = "http://mysite.com/".encode($id);
}
}
}
}
Since you've already validated the URL, why not just use parse_url() on it to extract the host name?
if (false === stristr(parse_url($_POST['url'], PHP_URL_HOST), 'mysite.com')) {
// not valid url
}
Or, if 'mysite.com' must be the last bit of the hostname:
if (0 !== substr_compare(parse_url($_POST['url'], PHP_URL_HOST), 'mysite.com', -10, 10, true)) {
// invalid url
}
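Note that a substring test also accepts hosts such as mysite.com.evil.example, and a plain suffix test accepts notmysite.com. If that matters, a stricter check could look like this minimal sketch; the function name url_belongs_to_domain is hypothetical, and it treats only the domain itself and its subdomains as valid.

```php
<?php
// Hypothetical helper: true when the URL's host is exactly $domain
// or a subdomain of it (i.e. ends with "." . $domain).
function url_belongs_to_domain(string $url, string $domain): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no host component, or the URL could not be parsed
    }
    $host = strtolower($host);
    $domain = strtolower($domain);
    if ($host === $domain) {
        return true;
    }
    $suffix = '.' . $domain;
    return substr($host, -strlen($suffix)) === $suffix;
}
```

Requiring the leading dot on the suffix is what separates blog.mysite.com from notmysite.com.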
Make the code less complicated so that it is easier to debug. If I understood you correctly, you are concerned about one specific problem here:
if (matches_shorturl_pattern($url)) {
// go on ...
}
Create a function for it. The good thing about this is that you can test it in isolation, without submitting anything or hitting the database.
/**
 * is URL a valid shorturl by syntax?
 *
 * @param string $url
 * @return bool
 */
function matches_shorturl_pattern($url) {
    // the dot in "mysite\.com" is escaped so it only matches a literal dot
    $pattern = '/^http(s)?:\/\/(?:[a-z\d-]+\.)*mysite\.com(?:(?=\/)|$)/i';
    $result = preg_match($pattern, $url);
    if ($result === FALSE) {
        throw new Exception('The regex did a poop.');
    }
    return (bool) $result;
}
It's much easier to test this function on its own with the inputs you want to test it against. You're also properly checking for error conditions that way. Good luck.
I found it! I was matching against $url, which was never set, instead of $_POST["url"]. Old:
if (!preg_match('/^http(s)?:\/\/(?:[a-z\d-]+\.)*mysite.com(?:(?=\/)|$)/i', $url))
New:
if (!preg_match('/^http(s)?:\/\/(?:[a-z\d-]+\.)*mysite.com(?:(?=\/)|$)/i', $_POST["url"]))
I need help crafting a PHP code that accomplishes the following:
Access a website (www.example.com)
Download its source code into a string variable
Search this specific string for a specific content such as
<div class="news" title="news alert">Click to get news alert</div>
Basically I need to search the source code for title="news alert"
Thank you all,
You could use PHP DOM:
$text = file_get_contents('http://example.com/path/to/file.html');
$doc = new DOMDocument('1.0');
$doc->loadHTML($text);
foreach($doc->getElementsByTagName('div') AS $div) {
$class = $div->getAttribute('class');
if(strpos($class, 'news') !== FALSE) {
if($div->getAttribute('title') == 'news alert') {
echo 'title found';
}
else {
echo 'title not found';
}
}
}
Or perhaps QueryPath, which tries to emulate jQuery on the server side:
$text = file_get_contents('http://example.com/path/to/file.html');
if(qp($text)->find('div.news[title="news alert"]')->is('*')) {
echo('title found');
}
else {
echo('title not found');
}
You could use DOMXPath to find it:
$dcmnt = new DOMDocument(); $dcmnt->loadHTML( $cntnt );
$xpath = new DOMXPath( $dcmnt );
$match = $xpath->query("//div[@title='news alert']");
echo $match->length ? "Found" : "Not Found" ;
Demo: http://codepad.org/CLdE8XCQ
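For reference, here is a self-contained sketch of the XPath approach wrapped in a function. The name html_has_div_title is hypothetical, and it takes the HTML as a string (for example, the result of file_get_contents) so it can be tried on inline markup first.

```php
<?php
// Hypothetical helper: true when $html contains a div whose title
// attribute equals $title exactly.
function html_has_div_title(string $html, string $title): bool
{
    if (trim($html) === '') {
        return false; // nothing to parse
    }
    // XPath 1.0 has no escape syntax for quotes inside a string literal,
    // so this sketch simply refuses titles that contain a double quote.
    if (strpos($title, '"') !== false) {
        return false;
    }
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@title="' . $title . '"]');
    return $nodes !== false && $nodes->length > 0;
}
```

Unlike a plain strpos() on the source, this only matches a real title attribute on a div, regardless of which quote style or attribute order the page uses.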
That's pretty simple:
$html = file_get_contents('http://site.com/page.html');
if (strpos($html,'title="news alert"')!==false)
echo 'title found';
$page = file_get_contents('http://www.example.com/');
if(strpos($page, "title=\"news alert\"")!==false){
echo 'title found';
}
$url = 'http://www.example.com/';
$page = file_get_contents($url);
if(strpos($page, 'title="news alert"') !==false || strpos($page, 'title=\'news alert\'') !==false)
{
echo 'website with news alert found';
}
else
{
echo 'website not found';
}