Check for invalid external links on a website - PHP

I wrote a PHP script to check a website for invalid external links.
This is the script:
<?php
// It may take a while to spider a website ...
set_time_limit(10000);
// Include the phpcrawl main class
include_once('../PHPCrawl_083/PHPCrawl_083/libs/PHPCrawler.class.php');
include('check.php');
// Extend the class and override the handleDocumentInfo() method
class MyCrawler extends PHPCrawler
{
    function handleDocumentInfo(PHPCrawlerDocumentInfo $DocInfo) {
        if (PHP_SAPI == "cli") $lb = "\n";
        else {
            $lb = "<br />";
            // Print the URL and the HTTP status code
            // Print the referring URL
            $file = file_get_contents($DocInfo->url);
            preg_match_all('/<a[^>]+href="([^"]+)/i', $file, $urls);
            echo '<br/>';
            $home_url = parse_url($_SERVER['HTTP_HOST']);
            foreach ($urls as $url) {
                for ($i = 0; $i < sizeof($url); $i++) {
                    $link_url = parse_url($url[$i]);
                    if ($link_url['host'] != $home_url['host']) {
                        if (check_url($url[$i]) === false) {
                            echo " Page requested: " . $DocInfo->url . " (" . $DocInfo->http_status_code . ")" . $lb;
                            echo '<br/>';
                            echo "<font color=green >" . "invalid external link: " . $url[$i] . $lb . " </font>";
                            echo '<br/>';
                        }
                    }
                }
            }
        }
    }
}
$crawler = new MyCrawler();
$crawler->setURL("http://www.tunisie-web.org ");
$crawler->addURLFilterRule("#\.(jpg|gif|png|pdf|jpeg|css|js)$# i");
$crawler->setWorkingDirectory("C:/Users/mayss/Documents/travailcrawl/");
$crawler->go();
?>
But instead of listing only invalid external links, it also lists external links that are not invalid. It even reports "http://www.tunisie-web.org" itself as an external link, and I don't know where the problem is.
Please help.
And this is check.php:
<?php
function check_url($url) {
    if (!filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_QUERY_REQUIRED) === false) {
        return true;
    }
    else {
        return false;
    }
}
?>
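Reading the code suggests three likely causes, though the question does not confirm any of them.

- `$_SERVER['HTTP_HOST']` is the host the crawler script itself was requested on (for example `localhost`), not the crawled site. Because it has no scheme, `parse_url()` returns it under `path`. So `$home_url['host']` is never set, and every link looks external, including `http://www.tunisie-web.org`.
- `FILTER_FLAG_QUERY_REQUIRED` makes `filter_var()` reject any URL without a query string, so ordinary working links are reported as invalid. A syntax check cannot tell whether a link is reachable anyway.
- `preg_match_all()` stores the whole matched text in `$urls[0]` and the captured href in `$urls[1]`. The outer `foreach` therefore also passes fragments like `<a href="http://...` to `parse_url()`.

The sketch below uses hypothetical helper names. It compares hosts against the crawled page's URL, validates syntax without the query flag, and treats a link as reachable when the first status line from `get_headers()` is 1xx to 3xx.

```php
<?php
// Hypothetical helper: does $link point to a different host than $siteUrl?
// Links without a host part (relative links such as "/contact.php") count as internal.
function is_external_link(string $link, string $siteUrl): bool
{
    $siteHost = parse_url($siteUrl, PHP_URL_HOST);
    $linkHost = parse_url($link, PHP_URL_HOST);
    if (!is_string($linkHost) || $linkHost === '') {
        return false;
    }
    // Host names are case-insensitive.
    return strcasecmp($linkHost, (string) $siteHost) !== 0;
}

// Hypothetical replacement for check_url(): syntax check only, no query string required.
function check_url_syntax(string $url): bool
{
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

// Hypothetical reachability check (needs network access). It looks only at the
// first status line, so a redirect (3xx) counts as reachable.
function is_reachable(string $url): bool
{
    $headers = @get_headers($url);
    if ($headers === false || !isset($headers[0])) {
        return false;
    }
    // The first line looks like "HTTP/1.1 200 OK".
    return (bool) preg_match('#^HTTP/\S+\s+[1-3]\d\d\b#', $headers[0]);
}
```

Inside handleDocumentInfo() you might loop over `$urls[1]` only. Skip links for which `is_external_link($link, $DocInfo->url)` is false, and report the rest when `check_url_syntax()` or `is_reachable()` fails.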

Related

Update XML element text with PHP

I'm trying to update XML element text based on a form submission. It is a user database, and I'm using the user's password as a reference to update their user ID. The passwords are all unique, so I thought they would be easy elements to reference. However, whenever I try to edit a UID, it fails and sends me to the error page I created for when the function fails. I'm not sure where I went wrong; any assistance would be great.
Update UID Function
function updateUID($pass, $file, $new)
{
    $xml = new DOMDocument();
    $xml->load($file);
    $record = $xml->getElementsByTagName('UniqueLogin');
    foreach ($record as $person) {
        $password_id = $person->getElementsByTagName('Password')->item(0)->nodeValue;
        //$person_name=$person->getElementsByTagName('name')->item(0)->nodeValue;
        if ($password_id == $password) {
            $id_matched = true;
            $updated = $xml->createTextNode($new);
            $person->parentNode->replaceChild($person, $updated);
            break;
        }
    }
    if ($id_matched == true) {
        if ($xml->save($file)) {
            return true;
        }
    }
}
Code that calls the function
session_start();
include_once "includes/functions.inc.php";
include_once "includes/jdbh.inc.php";
include_once "includes/dbh.inc.php";
include_once "includes/ftpconn2.inc.php";
$file = $_SESSION['fileNameXML'];
if (file_exists($file)) {
    if (isset($_POST['submit'])) {
        $pass = $_POST['id'];
        //$uid = $_SESSION['userid'];
        $new = $_POST['uid'];
        //$entry = getUsername($jconn, $uid)." deleted a server ban for".$name;
        //if (isset($_GET['confirm'])) {
        if (updateUID($pass, $file, $new)) {
            //createLogEntry($conn, $uid, $entry);
            if (1 < 2) { //This is intentional to get around the $message variable below that is not required.
                $message = $affectedRow . " records inserted";
                try {
                    $ftp_connection = ftp_connect($ftp_server);
                    if (false === $ftp_connection) {
                        throw new Exception("Unable to connect");
                    }
                    $loggedIn = ftp_login($ftp_connection, $ftp_user, $ftp_password);
                    if (true === $loggedIn) {
                        //echo "Success!";
                    } else {
                        throw new Exception('unable to log in');
                    }
                    $local_file1 = "HostSecurity.xml";
                    $remote_file1 = "HostSecurity.xml";
                    if (ftp_put($ftp_connection, $local_file1, $remote_file1, FTP_BINARY)) {
                        //echo "Successfully written to $local_file\n";
                    } else {
                        echo "There was a problem";
                    }
                    ftp_close($ftp_connection);
                    header("location: ../serverPasswords.php");
                }
                catch (Exception $e) {
                    echo "Failure:" . $e->getMessage();
                }
            }
            header("location: ../serverPasswords.php");
        } else {
            header("location: ../serverPasswords.php?e=UIDNPD");
        }
    } else {
        echo "id missing";
    }
} else {
    echo "$file missing";
}
<Unique_Logins>
    <UniqueLogin>
        <UID>AA23GHRDS657FGGRSF126</UID>
        <Password>iMs0Az2Zqh</Password>
    </UniqueLogin>
    <UniqueLogin>
        <UID>AA23GSDGFHJKDS483FGGRSF126</UID>
        <Password>Ab7wz77kM</Password>
    </UniqueLogin>
</Unique_Logins>
I believe the issue was caused by the undefined variable $password in the comparison (the function's parameter is named $pass), and by the fact that the function never returns an alternative value if things go wrong.
As per the comment regarding XPath, perhaps the following might be of interest:
<?php
$pass='xiMs0Az2Zqh';
$file='logins.xml';
$new='banana';
function updateUID( $pass=false, $file=false, $new=false ){
    # logical AND (&&), not bitwise AND (&)
    if( $pass && $file && $new ){
        $dom = new DOMDocument();
        $dom->load( $file );
        # attempt to match the password exactly with this XPath expression
        # (a password containing a double quote would break this expression)
        $expr=sprintf( '//Unique_Logins/UniqueLogin/Password[ .="%s" ]', $pass );
        $xp=new DOMXPath( $dom );
        $col=$xp->query( $expr );
        # We have a match, change the UID ( & return a Truthy value )
        if( $col && $col->length===1 ){
            $xp->query('UID', $col->item(0)->parentNode )->item(0)->nodeValue=$new;
            return $dom->save( $file );
        }
    }
    # otherwise return false
    return false;
}
$res=updateUID( $pass, $file, $new );
if( $res ){
    echo 'excellent';
}else{
    echo 'bogus';
}
?>
I'm still not clear on exactly what's wrong, but if I understand you correctly, try making these changes in your code and see if it works:
#just some dummy values
$file = "logins.xml";
$oldPass = "Ab7wz77kM";
$newUid = "whatever";
$xml = new DOMDocument();
$xml->load($file);
$record = $xml->getElementsByTagName('UniqueLogin');
foreach ($record as $person) {
    $password_id = $person->getElementsByTagName('Password');
    $user_id = $person->getElementsByTagName('UID');
    if ($password_id[0]->nodeValue == $oldPass) {
        $user_id[0]->nodeValue = $newUid;
        break;
    }
}
$xml->save($file);
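The undefined `$password` is not the only problem in the question's updateUID(). Two other details would stop it working even with a matching password:

- `replaceChild()` takes the new node first and the node to replace second. The call `replaceChild($person, $updated)` passes a text node that is not in the document as the "old" child, so it fails. Even with the arguments swapped, calling it on the parent of `<UniqueLogin>` would replace the whole login entry rather than the UID text.
- `$id_matched` is never initialised, so it is undefined when no password matches.

Here is a minimal sketch that replaces the text inside `<UID>` itself. The function name is hypothetical, and it works on an XML string so it can be tried without a file.

```php
<?php
// Hypothetical helper: set the <UID> of the <UniqueLogin> whose <Password> exactly equals $pass.
// Returns the updated XML, or null if the XML cannot be parsed or no login matches.
function update_uid_in_xml(string $xmlString, string $pass, string $newUid): ?string
{
    if (trim($xmlString) === '') {
        return null;
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $loaded = $dom->loadXML($xmlString);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    foreach ($dom->getElementsByTagName('UniqueLogin') as $login) {
        $password = $login->getElementsByTagName('Password')->item(0);
        $uid = $login->getElementsByTagName('UID')->item(0);
        if ($password === null || $uid === null || $password->textContent !== $pass) {
            continue;
        }
        $text = $dom->createTextNode($newUid);
        if ($uid->firstChild === null) {
            $uid->appendChild($text); // empty <UID/>: just add the text
        } else {
            // replaceChild() takes the NEW node first, then the node being replaced.
            $uid->replaceChild($text, $uid->firstChild);
        }
        $result = $dom->saveXML();
        return $result === false ? null : $result;
    }
    return null;
}
```

To persist the change, read the file with `file_get_contents($file)` and write a non-null result back with `file_put_contents($file, $result)`. Alternatively, keep `$dom->load()`/`$dom->save()` as in the question.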

Problem reading RSS feed programmatically using PHP

I have a newsfeed link from an Indian newspaper as follows:
https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml
I am trying to extract some information from it using PHP and SimpleXML:
$feedURL="https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml";
$array = get_headers($feedURL);
$statusCode = $array[0];
echo('<br>'.$statusCode.'<br>');
if (strpos($statusCode, "404") === false) {
    echo('Reading ' . $feedURL . '<br>');
    // Search the raw feed: after htmlspecialchars(), "<feed" becomes "&lt;feed" and is never found
    $raw = file_get_contents($feedURL);
    $out = htmlspecialchars($raw, ENT_QUOTES);
    echo($out);
    if (stripos($raw, "<feed ") !== false) {
        $feedType = 'ATOM';
        $countATOM += 1;
    } else if (stripos($raw, "<rss") !== false) {
        $feedType = 'RSS';
        $countRSS += 1;
    } else {
        $feedType = 'UNREADABLE';
        $countUNREADABLE += 1;
    }
    echo('<br>' . $feedType . '<br>');
    echo('<br>-------------------------------------------------------------------------<br>');
    if ($feedType == 'ATOM') {
        $xmlOut = simplexml_load_string($raw);
        echo($xmlOut.'<br>-------------------------------------------------------------------------<br>');
        if ($xmlOut === false) {
            echo("Failed loading XML: ");
            foreach (libxml_get_errors() as $error) {
                echo ("<br>" . $error->message);
            }
        } else {
            foreach ($xmlOut->entry as $entry) {
                if (isset($entry->title) && isset($entry->link) && isset($entry->updated) && isset($entry->summary)) {
                    $title = $entry->title;
                    $link = $entry->link['href'];
                    $updated = $entry->updated;
                    $summary = $entry->summary;
                    if (isImportantNews($title) || isImportantNews($summary)) {
                        $insertNewsCmd=$insertNewsCmd
                            ."('".$link."',"
                            ."'".stripSpecialChars($title)."',"
                            ."'".setDate($updated)."'),";
                    }
                }
                echo($entry->updated . "<br>");
            }
        }
    } elseif ($feedType == 'RSS') {
        $xmlOut = simplexml_load_string($raw);
        print_r($xmlOut);
        echo('<br>-------------------------------------------------------------------------<br>');
        if ($xmlOut === false) {
            echo("Failed loading XML: ");
            foreach (libxml_get_errors() as $error) {
                echo ("<br>" . $error->message);
            }
        } else {
            foreach ($xmlOut->channel->item as $item) {
                if (isset($item->title) && isset($item->link) && isset($item->description) && isset($item->pubDate)) {
                    $title = $item->title;
                    $link = $item->link;
                    $descr = $item->description;
                    $pubDate = $item->pubDate;
                    echo($title.'<br>'.$link.'<br>'.$descr.'<br>');
                    echo('<br>-------------------------------------------------------------------------<br>');
                    if (isImportantNews($title) || isImportantNews($descr)) {
                        $insertNewsCmd=$insertNewsCmd
                            ."('".$link."',"
                            ."'".stripSpecialChars($title)."',"
                            ."'".setDate($pubDate)."'),";
                    }
                    echo($item->pubDate . "<br>");
                }
            }
        }
    } else {
        continue;
    }
    break;
} else {
    echo($feedURL . ' encountered problems being read...' . '<br>');
}
Basically, the program reads the above link, determines whether it is an ATOM or RSS feed, and extracts each item's summary or description. It then uses the isImportantNews() method to decide whether the item is important news. If so, I store it in a database.
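Separately from the 403 problem, searching the text for `<feed ` or `<rss` only works on the raw, unescaped feed. It can also be fooled if those strings appear inside item content. A sturdier option is to parse the feed once and look at the root element's name. This is a minimal sketch with a hypothetical function name, returning the same 'ATOM'/'RSS'/'UNREADABLE' labels the question uses:

```php
<?php
// Hypothetical helper: classify a feed by the name of its parsed root element
// ("feed" for Atom, "rss" for RSS 2.0) instead of searching the text.
function detect_feed_type(string $raw): string
{
    $previous = libxml_use_internal_errors(true); // collect parse errors instead of warnings
    $xml = simplexml_load_string($raw);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($xml === false) {
        return 'UNREADABLE';
    }
    switch (strtolower($xml->getName())) {
        case 'feed':
            return 'ATOM';
        case 'rss':
            return 'RSS';
        default:
            return 'UNREADABLE'; // e.g. RSS 1.0 (<rdf:RDF>) or an HTML error page
    }
}
```

A variant could also return the parsed SimpleXMLElement, so the `entry`/`channel->item` loops don't need to parse the feed a second time.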
My problem is this: if I open the above link directly in a browser, I can see the information without any issues. But reading it with the above code returns an HTTP 403 Forbidden status code.
Why is this happening, and is there a way to get around it? Since I can open it directly, I suspect the 403 may be caused by the programmatic access attempt, but I am not certain. I also tried the following ways to read it, with the same failure:
echo('read file ####################################################################################################');
echo readfile("https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml"); // needs allow_url_fopen enabled
echo('<br>include ####################################################################################################');
echo include("https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml"); // needs allow_url_include (and allow_url_fopen) enabled
echo('<br>file get contents ####################################################################################################');
echo file_get_contents("https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml");
echo('<br>stream get contents####################################################################################################');
echo stream_get_contents(fopen('https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml', "r")); // "rb" works too; needs allow_url_fopen enabled
echo('<br>get remote data ####################################################################################################');
echo get_remote_data('https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml');
$feedURL = "https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml";
$out = htmlspecialchars(file_get_contents($feedURL), ENT_QUOTES);
echo($out);
Any help or insight would be most appreciated.
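This is only a guess, since the question doesn't show the server's response. A 403 that a browser never sees often means the server, or a CDN in front of it, rejects requests that don't look like they come from a browser. PHP's http:// wrapper sends no User-Agent header unless the `user_agent` ini setting or context option is set. A reasonable first experiment is therefore to send one through a stream context.

The helper names and the User-Agent string below are hypothetical. If you want the status check to send the same headers, `get_headers()` also accepts a context as its third argument (PHP 7.1+).

```php
<?php
// Hypothetical helper: an HTTP stream context that identifies the client.
// Assumption: the 403 is triggered by the missing User-Agent header.
function make_feed_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent'    => $userAgent,
            'header'        => "Accept: application/rss+xml, application/xml;q=0.9, */*;q=0.8\r\n",
            'ignore_errors' => true, // return the body even for 4xx/5xx responses
            'timeout'       => 15,   // seconds
        ],
    ]);
}

// Hypothetical helper: fetch a URL with that context.
// Returns the body when the final response status is 2xx, otherwise null.
function fetch_feed(string $url, string $userAgent): ?string
{
    $body = @file_get_contents($url, false, make_feed_context($userAgent));
    $status = '';
    // The http:// wrapper fills $http_response_header in this local scope;
    // after redirects it holds every response, so keep the last status line.
    foreach ($http_response_header ?? [] as $line) {
        if (stripos($line, 'HTTP/') === 0) {
            $status = $line;
        }
    }
    if ($body === false || !preg_match('#^HTTP/\S+\s+2\d\d\b#', $status)) {
        return null;
    }
    return $body;
}

// Example (network access required):
// $xml = fetch_feed('https://www.hindustantimes.com/rss/cities/delhi/rssfeed.xml',
//                   'MyFeedReader/1.0 (+https://example.com/contact)');
```

If the server still answers 403 with a descriptive User-Agent, it is probably blocking automated clients deliberately, and changing headers is unlikely to be a reliable fix.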

Preventing str_replace from commenting out the PHP code in the string

So I'm trying to make a website, and I'm using a PHP file (index.php) as the index and, pretty much, as the page that controls the whole website.
It receives all the requests and returns the pages using str_replace. Everything works as it should, in the sense that the web template behaves as intended. The problem is that I can't have PHP code inside the files that are part of the template, only in index.php.
So my question is: is there any way to prevent str_replace from turning the PHP code into comments?
Index.php:
<?php
//dirs
$pagesDir = "pages/";
$templatesDir = "templates/";
$errorsDir = "errors/";
if (isset($_REQUEST['page'])) {
    if ($_REQUEST['page'] != "")
        if (file_exists($pagesDir . $_REQUEST['page'] . ".html"))
            $page_content = file_get_contents($pagesDir . $_REQUEST['page'] . ".html");
        else
            if (file_exists($_REQUEST['page'] . ".html"))
                $page_content = file_get_contents($_REQUEST['page'] . ".html");
            else
                echo "<h1>Page:" . $_REQUEST['page'] . " does not exist! Please check the url and try again!</h1>";
} else {
    $page_content = file_get_contents($pagesDir . "home.html");
}
//PLACEHOLDER REPLACEMENT
$page_content = str_replace("!!HEAD!!", file_get_contents($templatesDir . "head.html"), $page_content);
$page_content = str_replace("!!BODY!!", file_get_contents($templatesDir . "body.html"), $page_content);
$page_content = str_replace("!!FOOT!!", file_get_contents($templatesDir . "eofScripts.html"), $page_content);
//RETURN THE CONTENT OF THE PAGE
echo $page_content;
New dispatcher after the changes (this one works):
<?php
$templatesDir = "templates/";
$pagesDir = "pages/";
$loggedPagesDir = "templates/logged";
$pageExists = false;
$pageContent = null;
require_once('scripts/php/db_conn.php');
if (isset($_REQUEST['page'])) {
    $page = $_REQUEST['page'] . ".php";
}
if (isset($_SESSION['redirect_reason'])) {
    $dialogs->alertDialog("warningDialog", $_SESSION['redirect_reason']);
    unset($_SESSION['redirect_reason']);
}
if (isset($_SESSION['user_action'])) {
    $dialogs->alertDialog("infoDialog", $_SESSION['user_action']);
    unset($_SESSION['user_action']);
}
if ($user->is_logged()) { //Only runs beyond this point if the user is logged in; if not, the else branch runs.
    if (isset($_POST['logout_btn'])) {
        $user->logout();
        $user->redirect("pageDispatcher.php");
    }
    if (isset($page)) {
        if ($page != "") {
            if (file_exists($pagesDir . $page)) {
                $pageExists = true;
                $pageContent = ($pagesDir . $page);
            } else {
                echo "<h1>Page: " . $page . " does not exist! Please check the url and try again</h1>";
            }
        } else {
            $pageExists = true;
            $pageContent = ($pagesDir . "loggedhome.php");
        }
    } else {
        $pageExists = true;
        $pageContent = ($pagesDir . "loggedhome.php");
    }
} else { //Only runs beyond this point if the user isn't logged in.
    if (isset($_POST['login_btn'])) {
        if ($user->login($_POST['email'], $_POST['password']) == false) {
            $dialogs->loginFailed();
        } else {
            $_SESSION['user_action'] = "Welcome back " . $_SESSION['user_name'];
            $user->redirect("pageDispatcher.php");
        }
    }
    if (isset($page)) {
        if ($page != "") {
            if (file_exists($pagesDir . $page)) {
                $pageExists = true;
                $pageContent = ($pagesDir . $page);
            } else {
                echo "<h1>Page: " . $page . " does not exist! Please check the url and try again!</h1>";
            }
        } else {
            $pageExists = true;
            $pageContent = ($pagesDir . "home.php");
        }
    } else {
        $pageExists = true;
        $pageContent = ($pagesDir . "home.php");
    }
}
?>
<html>
<?php include($templatesDir . "head.html"); ?>
<body>
<?php
if ($user->is_logged()) {
    include($templatesDir . "loggedBody.html");
} else {
    include($templatesDir . "body.html");
}
include($pageContent);
?>
</body>
</html>
NOTE: Do not use this method unless it's for learning purposes. It's bad, it can turn out to be quite hard to maintain, and it will probably end up being slow, since I do many things server-side that could be done client-side.
You read the contents of the page and echo them. Don't do that. Use include('file.html') instead, so that any PHP code in the file is executed. Just for the sake of explanation, if you have to, do something like this:
$pages = ['head.html', 'body.html', 'eofScripts.html'];
$page = isset($_REQUEST['page']) ? $_REQUEST['page'] : '';
if (in_array($page, $pages, true)) include($page);
else echo "<h1>Page: " . htmlspecialchars($page) . " does not exist!</h1>";
But in general this is bad programming practice. As suggested in the comments, use a template engine instead.
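Why the PHP disappears: `file_get_contents()` returns the template's source as plain text, so a `<?php ... ?>` block inside it is never executed. It is sent to the browser unchanged, and the HTML parser treats `<?...>` as a bogus comment, which is why it shows up commented out.

If you want to keep the `!!HEAD!!`-style placeholders, one option is to run each template with `include` inside an output buffer (`ob_start()`/`ob_get_clean()`). You then feed the captured output to `str_replace()`. Here is a minimal sketch with hypothetical function names. Passing variables via `extract()` is a choice of this sketch, not something the question requires.

```php
<?php
// Hypothetical helper: execute a template file with PHP and return its output,
// so PHP code inside the template runs instead of being copied as text.
function render_template(string $path, array $vars = []): string
{
    extract($vars, EXTR_SKIP); // expose $vars as local variables inside the template
    ob_start();
    try {
        include $path;
        return ob_get_clean();
    } catch (Throwable $e) {
        ob_end_clean(); // discard partial output before rethrowing
        throw $e;
    }
}

// Hypothetical helper: the question's placeholder replacement, done in one call.
function fill_placeholders(string $page, array $placeholders): string
{
    return str_replace(array_keys($placeholders), array_values($placeholders), $page);
}
```

In index.php this could look like `$page_content = str_replace("!!BODY!!", render_template($templatesDir . "body.html"), $page_content);`. The page itself would also be loaded through `render_template()` instead of `file_get_contents()`. Files named by `$_REQUEST['page']` should still be checked against a whitelist, like the one in the answer above.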

How to use an elseif statement with PHP Simple HTML DOM find()

How do I add a statement so that, when I search and the result doesn't exist at the URL, it shows nothing.html?
$url1 = "http://www.pengadaan.net/tend_src_cont2.php?src_nm=";
$url2 = $_GET['src_nm']."&src_prop=";
$url3 = $_GET['src_prop'];
$url = $url1.$url2.$url3;
$html = file_get_html($url);
if (method_exists($html,"find")) {
    echo "<ul>";
    foreach($html->find('div[class=pengadaan-item] h1[] a[]') as $element ) {
        echo ("<li>".$element."</li>");
    }
    echo "</ul>";
    echo $url;
}
else {
}
There are two ways to move to another page in PHP. You can call header("Location: http://www.yourwebsite.com/nothing.php");, or, if the headers have already been sent, you can have PHP echo JavaScript that performs the redirect. With header():
if (method_exists($html,"find")) { // If 'find' exists
    ...
} else { // Otherwise it does not exist
    header("Location: http://www.pengadaan.net/nothing.php"); // redirect here
    exit; // stop the script so nothing else is sent after the redirect
}
Or, if you have already sent your headers, you can get around it using JavaScript:
...
} else {
    echo '<script>window.location.replace("http://www.pengadaan.net/nothing.php")</script>';
}
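`method_exists($html, "find")` only tells you whether `file_get_html()` produced a parser object. A search that loads fine but matches nothing still takes the first branch and prints an empty `<ul>`. To show nothing.html for an empty result, check whether the list of matches is empty. With Simple HTML DOM itself, that means checking whether the array returned by `find()` is empty.

The sketch below swaps in PHP's built-in DOMDocument/DOMXPath for the Simple HTML DOM library so it runs without extra files. The function names and the XPath translation of `div[class=pengadaan-item] h1 a` are assumptions.

```php
<?php
// Hypothetical helper (built-in DOM extension, not Simple HTML DOM):
// returns the text of every <a> inside an <h1> inside <div class="pengadaan-item">.
function find_item_links(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' pengadaan-item ')]//h1//a"
    );
    $links = [];
    foreach ($nodes as $a) {
        $links[] = trim($a->textContent);
    }
    return $links;
}

// Hypothetical helper: build the result list, or fall back to the contents of nothing.html.
function render_results(array $links, string $fallbackFile = 'nothing.html'): string
{
    if (count($links) === 0) {
        return is_readable($fallbackFile) ? file_get_contents($fallbackFile) : '';
    }
    $out = "<ul>";
    foreach ($links as $text) {
        $out .= "<li>" . htmlspecialchars($text) . "</li>";
    }
    return $out . "</ul>";
}
```

For example: `echo render_results(find_item_links((string) file_get_contents($url)));`. Wrapping the `$_GET` values in `urlencode()` when building `$url` also keeps searches containing spaces or `&` intact.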

Error returning information using PHP

I am steps away from finishing this project, but I seem to have a problem with the logic in my fetchMessages() function. I am passing in the session_id, but nothing is returned. Here is my function code:
function fetchMessages($session)
{
    $get = ("SELECT * FROM chatRoom WHERE session_id = '$session'");
    $hold = mysql_query($get, $con);
    if($hold)
    {
        return mysql_fetch_array($hold);
    }
}
The code calling this function is:
<?php
include 'core/conection.php';
include 'function.php';
if(isset($_POST['method']) === true && empty($_POST['method']) === false)
{
    $method = trim($_POST['method']);
    $session = trim($_POST['session']);
    if($method === 'fetch')
    {
        $messages = fetchMessages($session);
        if(empty($messages) === true)
        {
            echo 'A representative will be with you shortly';
            echo '<br />';
            echo $session;
        }else
        {
            foreach($messages as $message)
            {
                $ts = $message['timestamp'];
                ?>
                <div class = "message">
                    <?php echo date('n-j-Y h:i:s a', $ts); ?>
                    <?php echo $message['username']; ?>
                    says:<p><?php echo nl2br($message['message']); ?></p>
                </div>
                <?php
            }
        }
    }
}
?>
I know it gets at least this far, because this block
if(empty($messages) === true)
{
    echo 'A representative will be with you shortly';
    echo '<br />';
    echo $session;
}
displays the correct session_id. I have a feeling that either my fetchMessages() function or the HTML code that displays the results is wrong.
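Two things in fetchMessages() stand out. This is a reading of the code rather than a confirmed diagnosis.

- `$con` is not defined inside the function, because PHP functions don't see global variables unless they are declared `global` or passed in. `mysql_query()` therefore has no valid link, the function returns nothing, and the caller shows the "representative" message, which matches the symptom.
- `mysql_fetch_array()` returns only one row, so the caller's `foreach` would walk over that row's columns rather than over messages.

The `mysql_*` functions were also removed in PHP 7. Below is a sketch using PDO with a prepared statement instead. It assumes a PDO connection object is created in `core/conection.php`, which the question doesn't show.

```php
<?php
// Sketch: fetch *all* messages for a session with PDO and a prepared statement.
// The connection is passed in as a parameter, so the function does not depend on a global $con.
function fetchMessages(PDO $pdo, string $session): array
{
    $stmt = $pdo->prepare(
        'SELECT username, message, timestamp FROM chatRoom WHERE session_id = :session ORDER BY timestamp'
    );
    // The bound parameter keeps $session out of the SQL text (no injection).
    $stmt->execute([':session' => $session]);
    // fetchAll() returns every matching row; mysql_fetch_array() returned only one.
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

The caller would then do `$messages = fetchMessages($pdo, $session);` and could keep its existing `foreach`. Each element is now a row with `timestamp`, `username` and `message` keys, and an empty array still triggers the `empty($messages)` branch.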
