I want to have an application that displays all of my website's external links and outputs a diagram. For example, www.example.com/articles/some-title.html is linked to my homepage.
Home
- www.example.com/some-text
- www.another-site.com/my-title
- www.example.com/articles/some-title.html
Products
- www.buy-now.com/product-reviews/231/098989
- www.sales.com/432/title-page.html
Categories
- www.ezinearticles.com/blah-blah-blah
Something like SlickMap, but not done in CSS.
I have set up a table in my DB so this will be dynamic, with more links to come. I'm using CakePHP for this. Any ideas or suggestions?
Thanks for your time.
Have a look at SlickMap, a CSS implementation for site diagrams:
http://astuteo.com/slickmap/
You can use PHP to retrieve the results from the database and you can use jQuery's treeView to display them.
Also, raphaël.js might be of interest, especially its diagram plugin. It's fully customizable and worth checking out.
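As a rough sketch of the PHP side of this suggestion: assuming the table rows come back as arrays with hypothetical `page` and `url` keys (the question does not show its schema), they can be grouped by page and emitted as the nested list that a tree plugin would then decorate. The function name `render_link_tree` is made up for illustration.

```php
<?php
// Group link rows by the page they belong to and render a nested <ul>.
// Assumption: each row has 'page' and 'url' keys; adjust to the real schema.
function render_link_tree(array $rows)
{
    $byPage = array();
    foreach ($rows as $row) {
        $byPage[$row['page']][] = $row['url'];
    }

    $html = "<ul>\n";
    foreach ($byPage as $page => $urls) {
        $html .= '  <li>' . htmlspecialchars($page) . "\n    <ul>\n";
        foreach ($urls as $url) {
            $html .= '      <li>' . htmlspecialchars($url) . "</li>\n";
        }
        $html .= "    </ul>\n  </li>\n";
    }
    return $html . "</ul>\n";
}

// Sample data taken from the example in the question.
$rows = array(
    array('page' => 'Home', 'url' => 'www.example.com/some-text'),
    array('page' => 'Home', 'url' => 'www.another-site.com/my-title'),
    array('page' => 'Products', 'url' => 'www.sales.com/432/title-page.html'),
);
echo render_link_tree($rows);
```

In a CakePHP app the grouping would probably live in the controller and the markup in a view; it is kept in one function here only to stay self-contained.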
If I am understanding you correctly, you want to parse the contents of an entire web site (HTML, JS, etc...), and create an array that contains all of your links, as well as the pages that they can be found on. If that is correct, this code will get the job done:
<?php
$path = "./path_to_your_files/";
$result = array();
if ($handle = opendir($path)) {
    while (false !== ($file = readdir($handle))) {
        // skip ".", ".." and subdirectories; only regular files are read
        if ($file != "." && $file != ".." && is_file($path . $file)) {
            $contents = file_get_contents($path . $file);
            // capture the href value of every <a> tag into $parts[1]
            preg_match_all("/a[\s]+[^>]*?href[\s]?=[\s\"\']+"."(.*?)[\"\']+.*?>"."([^<]+|.*?)?<\/a>/", $contents, $parts);
            foreach ($parts[1] as $link) {
                $result[$file][] = $link;
            }
        }
    }
    closedir($handle);
}
print_r($result);
?>
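Since the question asks specifically for external links, the collected `$link` values could be filtered by host afterwards. This is only a sketch: `is_external_link` and the host `my-site.test` are hypothetical, and links without a scheme (such as the `www.example.com/...` entries in the question) have no host component for `parse_url()` to find, so they are treated as internal here.

```php
<?php
// Return true when $url points at a host other than $siteHost.
// Relative links and URLs without a host component count as internal.
function is_external_link($url, $siteHost)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }
    // Compare case-insensitively and ignore a leading "www.".
    $normalize = function ($h) {
        return preg_replace('/^www\./', '', strtolower($h));
    };
    return $normalize($host) !== $normalize($siteHost);
}

$links = array(
    'http://www.example.com/articles/some-title.html',
    '/about.html',
    'http://my-site.test/contact',
);
$external = array_values(array_filter($links, function ($link) {
    return is_external_link($link, 'my-site.test'); // hypothetical own host
}));
print_r($external);
```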
Here's my recent result for recursively listing a user directory.
I use the results to build a file manager (original screenshots).
(source: ddlab.de)
Sorry, 654321.jpg was uploaded several times to different folders; that's why it looks a bit messy.
(source: ddlab.de)
Therefore I need two separate arrays, one for the directory tree, the other for the files.
I am only showing the PHP solution here, as I am still working on the JavaScript for usability. The array keys contain all the information currently needed. The key "tree" is used to get an ID for the folders as well as a CLASS for the files (using jQuery to show the files related to the active folder and hide those that are not), and so on.
The folder list is an UL/LI, the files section is a sortable table which includes a "show all files"-function, where files are listed completely, sortable as well, with path info.
The function
function build_tree($dir,$deep=0,$tree='/',&$arr_folder=array(),&$arr_files=array()) {
    $dir = rtrim($dir,'/').'/'; // not really necessary if 1st function call is clean
    $handle = opendir($dir);
    while (false !== ($file = readdir($handle))) // strict check, so a file named "0" does not end the loop
    {
        if ($file != "." && $file != "..")
        {
            if (is_dir($dir.$file))
            {
                $deep++;
                $tree_pre = $tree; // remember for reset
                $tree = $tree.$file.'/'; // builds something like "/","/sub1/","/sub1/sub2/"
                $arr_folder[$tree] = array('tree'=>$tree,'deep'=>$deep,'file'=>$file);
                build_tree($dir.$file,$deep,$tree,$arr_folder,$arr_files); // recursive function call
                $tree = $tree_pre; // reset to go to upper levels
                $deep--; // reset to go to upper levels
            }
            else
            {
                $arr_files[$file.'.'.$tree] = array('tree'=>$tree,'file'=>$file,'filesize'=>filesize($dir.$file),'filemtime'=>filemtime($dir.$file));
            }
        }
    }
    closedir($handle);
    return array($arr_folder,$arr_files); // a function returns one value, so both arrays are wrapped in one
}
Calling the function
$build_tree = build_tree($udir); // 1st function call, $udir is my user directory
Get the arrays separated
$arr_folder = $build_tree[0]; // separate the two arrays
$arr_files = $build_tree[1]; // separate the two arrays
See the results
print_r($arr_folder);
print_r($arr_files);
It works like a charm.
To whoever might need something like this: good luck with it.
I promise to post the entire code, when finished :-)
I am working on my 404 error doc, and I was thinking instead of just giving a sitemap, one could suggest to the user the website he might have looked for based on what actually exists on the server.
Example: if the person typed in "www.example.com/foldr/site.html", the 404 page could output:
Did you mean "www.example.com/folder/site.html"?
For this, I wrote the following code, which works very well for me. My question now is: is it "safe" to use this? Basically, someone could detect all files on the server by trying all kinds of combinations. Or a hacker could even use a script that loops through and lists all types of valid URLs.
Should I limit the directories this script can detect and propose? With an array of "OK" locations, or by file type?
Has anyone else already had an idea like this?
PHP:
// get incorrect URL that was entered
$script = explode("/",$_SERVER['SCRIPT_NAME']);
$query = $_SERVER['QUERY_STRING'];
// create vars
$match = array();
$matched = "../";
// loop through the given URL folder by folder to find the suggested location
foreach ($script as $dir) {
    if (!$dir) {
        continue;
    }
    if ($handle = opendir($matched)) {
        while (false !== ($entry = readdir($handle))) {
            if ($entry != "." && $entry != "..") {
                similar_text($dir, $entry, $perc);
                if ($perc > 80) {
                    $match[$entry] = $perc;
                }
            }
        }
        closedir($handle);
        if ($match) {
            arsort($match);
            reset($match);
            $matched .= key($match)."/";
        } else {
            $matched = false;
            break;
        }
        $match = array();
    }
}
// trim and echo the result that had the highest match
$matched = trim(ltrim(rtrim($matched,"/"),"."));
echo "proposed URL: ".$_SERVER["SERVER_NAME"].$matched;
Yup, you can look at it like this:
Imagine a house whose outer walls are all glass, but it's night. You're a thief (hacker) and you want to check the house for valuable loot (files with passwords, DB connections, etc.).
If you don't protect (certain) files, you are turning the lights on in every part of the house. The thief can look through the windows and see that you have loot; now the only thing he has to do is get in and take it.
If you do protect the files, the thief can't even know that there is any loot in the house, and is thus more likely to move on to the next house.
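One way to keep the lights off, following the whitelist idea raised in the question, is to propose a match only when it sits under an approved top-level directory and has an approved file type. This is a sketch under assumptions: the function name and both allowed lists are hypothetical, and `$path` is taken to be a site-relative path such as the one the 404 script builds.

```php
<?php
// Decide whether a matched path may be shown as a "Did you mean" suggestion.
// Assumption: $path is site-relative, e.g. "/folder/site.html".
function is_suggestable($path, array $allowedDirs, array $allowedExts)
{
    // Reject anything that tries to climb out of the web root.
    if (strpos($path, '..') !== false) {
        return false;
    }
    $segments = explode('/', trim($path, '/'));
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return in_array($segments[0], $allowedDirs, true)
        && in_array($ext, $allowedExts, true);
}

$allowedDirs = array('folder', 'articles'); // hypothetical public directories
$allowedExts = array('html', 'php');        // hypothetical public file types

var_dump(is_suggestable('/folder/site.html', $allowedDirs, $allowedExts)); // bool(true)
var_dump(is_suggestable('/config/db.inc', $allowedDirs, $allowedExts));    // bool(false)
```

An allow-list fails closed: a directory added later stays invisible to the 404 page until someone lists it explicitly.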
This is my code:
$ost = $_GET['id']; // get the ID from the URL
$path = "audio/soundtracks/$ost"; // use the ID to select a path
// Open the folder
$dir_handle = @opendir($path) or die("Unable to open $path");
// Loop through the files
while ($file = readdir($dir_handle)) {
    if ($file == "." || $file == ".." || $file == "index.php")
        continue;
    echo "<a href='$path/$file'>$file</a><br />"; // return the name of the track
}
// Close
closedir($dir_handle);
Its purpose is to automatically list every soundtrack contained in a directory, the name of which is given by the ID passed through the URL. Each track is named with the format "### - title.mp3", e.g. "101 - Overture.mp3".
It works fine, but the resulting list is sorted randomly for some reason. Is there any way to sort the tracks by title? Also, I'm pretty much a newbie with PHP: is there any security issue with using GET like this? Thanks in advance.
EDIT: The GET is only used to specify the path; it's not supposed to interact with the database. Is this enough to prevent attacks?
$ost = $_GET['id'];
$bad = array("../","=","<", ">", "/","\"","`","~","'","$","%","#");
$ost = str_replace($bad, "", $ost);
$path = "audio/soundtracks/$ost";
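Do some checks on the GET parameter before using it, like checking that it is numeric, has the right length, etc. Use mysql_real_escape_string if it is used against the DB (the mysql_* functions were removed in PHP 7, where mysqli or PDO prepared statements take their place).
When looping over the directory, save the files in a PHP array with the title as index, like this; you can then sort it as you please: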
$array = array();
while (false !== ($file = readdir($dir_handle))) {
    if ($file == "." || $file == ".." || $file == "index.php")
        continue;
    $array[$file] = "<a href='$path/$file'>$file</a><br />"; // link for this track, keyed by file name
}
ksort($array); // sort by file name (the array key)
... after this, loop over the array and print it separately.
In my eyes, it is better coding practice to first collect into arrays and then print separately. It is more flexible.
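A minimal sketch of both suggestions, assuming the soundtrack IDs consist only of letters, digits, underscores and hyphens (the question does not say what they look like). The function names, the sample ID `star-wars` and the second track title are hypothetical. Because the files are named "### - title.mp3", a natural sort on the file name orders them by track number.

```php
<?php
// Accept only IDs made of letters, digits, underscores and hyphens.
// Assumption: real soundtrack folder names fit this pattern.
function validate_ost_id($id)
{
    return is_string($id) && preg_match('/^[A-Za-z0-9_-]{1,64}\z/', $id) === 1;
}

// Build sorted, HTML-escaped links for the track files found in $path.
function track_links($path, array $files)
{
    natcasesort($files); // natural, case-insensitive order by file name
    $links = array();
    foreach ($files as $file) {
        $href = $path . '/' . rawurlencode($file);
        $links[] = '<a href="' . htmlspecialchars($href) . '">'
                 . htmlspecialchars($file) . '</a><br />';
    }
    return $links;
}

$id = 'star-wars'; // stands in for $_GET['id']
if (validate_ost_id($id)) {
    $files = array('102 - Main Title.mp3', '101 - Overture.mp3');
    echo implode("\n", track_links("audio/soundtracks/$id", $files)), "\n";
}
```

Rejecting anything outside a known-good pattern is usually sturdier than stripping a list of bad characters, since nothing has to be anticipated in advance.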
Apart from checking the length and escaping the $_GET value, you can also base64-encode the ID before putting it into the URL and decode it before using it. Note that base64 only obscures the value; the decoded ID still needs the same validation.
// before putting into the URL
$id = $rows["id"];
$id = base64_encode($id);
echo '<a href="yourUrl.php?id=' . urlencode($id) . '">';
// in yourUrl.php
$id = $_GET['id'];
$id = base64_decode($id);
I have a double question. Part one: I've pulled a nice list of PDF files from a directory and have prepended a file called download.php to the "href" link so the PDF files don't try to open as a web page (they offer save/save as instead). Trouble is, I need to order the PDF files/links by date created. I've tried lots of variations but nothing seems to work! Script below. I'd also like to get rid of the "." and ".." directory dots! Any ideas on how to achieve all of that? Individually, these problems have been solved before, but not with my download.php scenario :)
<?php
$dir = "../uploads2"; // Directory where files are stored
if ($dir_list = opendir($dir))
{
    while (($filename = readdir($dir_list)) !== false)
    {
?>
<p><a href="http://www.duncton.org/download.php?file=login/uploads2/<?php echo $filename; ?>"><?php echo $filename; ?></a></p>
<?php
    }
    closedir($dir_list);
}
?>
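While you can filter them out, the . and .. entries usually come first in scandir()'s sorted output, so you could just cut them away. Be aware that a name starting with a character that sorts before "." (such as "!" or "-") breaks that assumption. With the simpler scandir() method: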
foreach (array_slice(scandir($dir), 2) as $filename) {
One could also use glob("dir/*"), which skips dotfiles implicitly. As it returns the full path, sorting by ctime then becomes easier as well:
$files = glob("dir/*");
// make filename->ctime mapping
$files = array_combine($files, array_map("filectime", $files));
// sorts filename list
arsort($files);
$files = array_keys($files);
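For comparison, here is a sketch that filters the dot entries explicitly rather than relying on their position, and sorts newest first. It uses `filemtime()` because on most Unix filesystems `filectime()` reports the last inode change rather than a creation date; swap the call if `filectime()` suits your server better. The function name and the temporary demo directory are hypothetical.

```php
<?php
// List the regular files in $dir, newest modification time first.
function files_newest_first($dir)
{
    $times = array();
    foreach (array_diff(scandir($dir), array('.', '..')) as $name) {
        $full = rtrim($dir, '/') . '/' . $name;
        if (is_file($full)) {
            $times[$name] = filemtime($full);
        }
    }
    arsort($times); // highest timestamp (newest) first
    return array_keys($times);
}

// Demo in a throwaway directory so the sketch runs anywhere.
$dir = sys_get_temp_dir() . '/pdf_demo_' . uniqid();
mkdir($dir);
touch("$dir/old.pdf", time() - 3600); // one hour old
touch("$dir/new.pdf", time());
print_r(files_newest_first($dir));
```

Each returned name can then be dropped into the existing download.php link, ideally passed through `rawurlencode()` and `htmlspecialchars()` first.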
I currently have this script that automatically searches my directory and displays the results in iframes within a div:
<?php
$iterator = new RecursiveDirectoryIterator('work/');
foreach (new RecursiveIteratorIterator($iterator) as $filename => $cur) {
    $file_info = pathinfo($filename);
    if ($file_info['extension'] === 'php') {
        echo "<iframe width=420 height=150 frameborder=0 src='$filename'></iframe>";
    }
}
?>
This works a treat. However, if I want the user to use a search form to search the directory for PHP files by keyword and display them in the same manner, how would I do that?
Thanks in advance for your help.
You will need to index the files (content/filename/keywords entered) into a database. You can use that database to lookup the various filenames for the search terms and then rank them.
I assume you can create the search form yourself.
So the user posts a search parameter:
$search = strtolower($_POST['search']);
Then in your foreach do:
if ( strpos(strtolower($filename), $search) === false ) {
continue;
}
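The same filter can be pulled out into a small function, which makes it easier to test without a form. This is only a sketch: `filter_by_keyword` is a hypothetical name, the sample paths are made up, and matching against `basename()` rather than the full path is a design choice so that the directory name `work/` does not count as a hit.

```php
<?php
// Keep only the paths whose file name contains $search (case-insensitive).
function filter_by_keyword(array $paths, $search)
{
    $search = trim($search);
    if ($search === '') {
        return array(); // no keyword, no results
    }
    return array_values(array_filter($paths, function ($path) use ($search) {
        return stripos(basename($path), $search) !== false;
    }));
}

// Hypothetical paths, as the directory iterator might yield them.
$paths = array('work/gallery.php', 'work/contact.php', 'work/old/Gallery2.php');
print_r(filter_by_keyword($paths, 'gallery'));
```

Note that this matches file names only; searching the file contents by keyword would need the indexing approach mentioned above.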