Using php to get parent element of link with URL - php

I'm trying to implement a "find and replace" system for broken links. The problem is, for some links there are no replacements. So, I need to comment out certain li elements. You can see my code below to do this. (I'm starting with an HTML form).
<?php
$brokenlink = $_POST['brokenlink'];
$newlink = $_POST['newlink'];
$brokenlink = '"' . $brokenlink . '"';
$newlink = '"' . $newlink . '"';
$di = new RecursiveDirectoryIterator('hugedirectory');
foreach (new RecursiveIteratorIterator($di) as $filename => $file) {
// echo $filename . ' - ' . $file->getSize() . ' bytes <br/>';
$filetoedit = file_get_contents($file);
if (strpos($filetoedit, $brokenlink) !== false) { // strpos() returns 0 (falsy) for a match at offset 0
echo $brokenlink . " found in " . $filename . "<br/>";
$filetoedit = str_replace($brokenlink, $newlink, $filetoedit);
file_put_contents($filename, $filetoedit);
}
}
?>
What I want to accomplish is this: given a URL, I want to be able to find its li parent. For instance, if the user enters http://www.espn.com in an HTML form, I want PHP to find this element on my server:
<li><a href="http://www.espn.com">Sports</a></li>
And replace it with this:
<!-- <li><a href="http://www.espn.com">Sports</a></li> -->
Is this possible? Thanks.

I would try parsing the DOM with this:
http://simplehtmldom.sourceforge.net/
You can add a class to all the elements you want to comment out, then use this tool to find that class and comment them all out at once.

Why not use a regexp to find and replace the links? It would also save the perhaps expensive looping over links.
Here's a regex for matching URLs:
http://daringfireball.net/2010/07/improved_regex_for_matching_urls
Then use preg_replace() to swap the broken link for the new one, or for a commented-out version of the broken link.
Alternatively, you can run grep on the directory via shell_exec(); that way you don't have to open, read, and parse the files yourself.
Also take a look at this: match url pattern in php using regular expression

I suggest you construct DOMDocument with the file content and use XPath to search for the broken link node.
$dom = new DOMDocument();
@$dom->loadHTML($filetoedit);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//li/a[@href="' . $brokenlink . '"]');
for ($i = 0; $i < $nodes->length; $i++) {
$node = $nodes->item($i);
// Do whatever you want here
}
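Building on the DOMDocument suggestion above, here is a minimal sketch of how the matching li elements could be commented out. The function name commentOutListItems is hypothetical, and the sketch assumes the link is a direct child of the li and that the URL contains no double quote. Note that saveHTML() re-serializes the whole document, so the output formatting may differ from the input file.

```php
<?php
// Hypothetical helper: wrap every <li> whose direct <a> child points at $url
// in an HTML comment. Assumes $url contains no double quote.
function commentOutListItems(string $html, string $url): string
{
    $dom = new DOMDocument();
    // Collect parser warnings instead of printing them for imperfect markup.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $items = $xpath->query('//li[a[@href="' . $url . '"]]');

    // Copy the matches to an array before changing the tree.
    foreach (iterator_to_array($items) as $li) {
        // Serialize the <li> and put its markup inside a comment node.
        $markup = trim($dom->saveHTML($li));
        $comment = $dom->createComment(' ' . $markup . ' ');
        $li->parentNode->replaceChild($comment, $li);
    }

    return $dom->saveHTML();
}
```

In the directory loop from the question, the result could be written back with file_put_contents($filename, commentOutListItems($filetoedit, $url)), where $url is the raw URL without the surrounding quotes.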

Related

Search text files and display results with PHP

I have a folder (blogfiles/posts) with various text files, numbered (1.txt, 2.txt, 3.txt...) and they each hold a post for a blog (I haven't learned SQL yet). I'm trying to make a search engine for it that will take a query from a text box (done with this part), then search the files for each word in the query, and return the results (possibly in order of the number of times the word occurs).
Each text file looks like this:
Title on Line 1
Date Posted on Line 2 (in Month Date, Year form)
Post body to search on lines 3 and up
I currently have this code:
<?php
$q = $_GET["q"];
$qArray = explode(" ", $q);
//preparing files
$post_directory = "blogfiles/posts/";
$files = scandir($post_directory, 1);
$post_count = (count($files)) - 2;
array_pop($files); // there are 2 server files I want to ignore (#1); array_pop() changes $files in place
array_pop($files); // there are 2 server files I want to ignore (#2)
foreach ($files as $file) {
//getting title
$post_path = $post_directory . $file;
$post_filecontents = file($post_path);
$post_title = $post_filecontents[0];
echo "<tr><td>" . $post_title . "</td></tr>";
}
$postPlural = ($post_count != 1) ? "s" : ""; // always defined; singular only for exactly one post
echo "<tr><td>" . $post_count . " post" . $postPlural . ".";
?>
I'll apologize now for the formatting, I was trying to separate it to troubleshoot.
Any help to get this working would be greatly appreciated.
There are many ways to search files:
Use the preg_match_all() function to match a pattern against each file.
Use the system() function to run an external command such as grep (only available on *nix).
Use the strpos() function (limited, because it matches only literal substrings, not patterns).
If you expect heavy traffic, you'd be better off using a pre-built index to speed up the search. For example, split the posts into tokens (words) and store each word's positions; when a user searches for some words, split the query into words and look them up in the index. This method is simpler to describe than to implement, so you may want an existing full-text search engine such as Apache Lucene.
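For a small blog, the simple counting approach can be sketched as follows. The helper rankPosts is hypothetical; it assumes the post bodies have already been read into an array (for example with file_get_contents()), and it counts case-insensitive substring matches rather than whole words.

```php
<?php
// Hypothetical helper: rank posts by how often the query words occur.
// $posts maps a post name (e.g. "1.txt") to its body text.
function rankPosts(array $posts, string $query): array
{
    // Split the query on whitespace and ignore empty pieces.
    $words = preg_split('/\s+/', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);

    $scores = [];
    foreach ($posts as $name => $body) {
        $body = strtolower($body);
        $score = 0;
        foreach ($words as $word) {
            // Substring count, so "php" also matches inside "phpinfo".
            $score += substr_count($body, $word);
        }
        if ($score > 0) {
            $scores[$name] = $score;
        }
    }

    // Highest score first, keeping the post names as keys.
    arsort($scores);
    return $scores;
}
```

The returned array can be looped over directly to print one table row per matching post, in order of relevance.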

Using Simple Dom Parser create more than one files

I am using PHP Simple HTML DOM Parser to get the table elements of an HTML page and then create a file for each element.
This is my code:
<?php
include_once('simple_html_dom.php');
$html = file_get_html('test.html');
foreach($html->find('table[id=backgroundTable]') as $element);
$element = $html->save();
$html->save('result.html');
The problem I have at the moment is that it stores all the tables in this result.html file.
What I need is for the results to be exported as result1.html, result2.html, and so on. How can I achieve this?
Thank you very much in advance
You may try something like this, so in each loop the $i will be increased:
$html = file_get_html('test.html');
$i = 1;
foreach($html->find('table#backgroundTable') as $element) {
str_get_html($element)->save('result' . $i . '.html');
$i++;
}
So the results will be saved in result1.html, result2.html and so on.

A reliable way to output multiple JavaScript files in one PHP script? (Avoiding unexpected token ILLEGAL)

I'm using PHP to create a JavaScript document. It does two things:
Read a directory containing some HTML files that I use as templates and then output an object containing key: value pairs that represent the filename: content, which will end up similar to this:
var HTML = {
"blogpost.html": '<div>{post}</div>',
"comment.html" : '<div class="comment">{comment}</div>'
};
Which allows me to use HTML["template.html"] to append templated data that I receive from AJAX requests.
Read a directory containing JavaScript files and output the content of those straight into the document.
Locally it's working fine, but I've been getting this error when I try it once uploaded:
Uncaught SyntaxError: Unexpected token ILLEGAL
I've tried wrapping the output I get from each of the HTML and JS files in things like:
preg_replace('/\s{2,}/', '', $output);
addslashes($output);
mysql_real_escape_string($output);
And a combination of those, but still the same error.
How can I reliably output the HTML and JavaScript I'm trying to place in the output?
Here's the current entire PHP script I am using (which works locally but not online weirdly):
header("Content-type: application/x-javascript");
// Write HTML templates.
$dir = dir($_SERVER['DOCUMENT_ROOT'] . '/view/html/');
$files = array();
while($file = $dir->read())
{
if(strpos($file, ".html"))
{
$key = substr($file, 0, strpos($file, ".html"));
array_push($files, '"' . $key . '": \'' . compress(file_get_contents($dir->path . $file)) . "'");
}
}
echo 'var HTML = {' . implode(",", $files) . '};';
// Output other JavaScript files.
$js = array();
array_push($js, file_get_contents("plugin/jquery.js"));
array_push($js, file_get_contents("plugin/imagesloaded.js"));
array_push($js, file_get_contents("plugin/masonry.js"));
array_push($js, file_get_contents("base/master.js"));
array_push($js, file_get_contents("plugin/ga.js"));
echo implode("", $js);
// Compress a JavaScript file.
function compress($str)
{
return addslashes(preg_replace('/\s{2,}/', '', $str));
}
You can use json_encode() for any PHP -> JS conversion:
while ($file = $dir->read()) {
if(strpos($file, ".html")) {
$key = substr($file, 0, strpos($file, ".html"));
$files[$key] = file_get_contents($dir->path . $file); // json_encode() does the escaping, so the addslashes() inside compress() is not needed
}
}
echo 'var HTML = ' . json_encode($files) .';';
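To illustrate why json_encode() is enough on its own, here is a small self-contained sketch. The function name buildTemplateScript is hypothetical and the templates are passed in as an array instead of being read from disk. json_encode() escapes quotes, backslashes, and newlines, so the result is a valid JavaScript object literal without any addslashes() or whitespace stripping.

```php
<?php
// Hypothetical helper: turn a name => markup map into a JavaScript assignment.
function buildTemplateScript(array $templates): string
{
    // json_encode() produces a valid JS literal, including escaped
    // quotes and newlines inside the template strings.
    return 'var HTML = ' . json_encode($templates) . ';';
}

$templates = [
    'blogpost' => "<div>\n{post}\n</div>",
    'comment'  => '<div class="comment">it\'s {comment}</div>',
];

echo buildTemplateScript($templates), PHP_EOL;
```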
That's a parser error, so the problem happens before your code is even run.
I recommend checking the PHP versions of the two runtimes you're using. It would be ideal to develop and test with the same runtime that you plan to deploy to.
This happened to me before as well.
I'm assuming you copied part of the code you posted from a website like GitHub, or maybe your editor has mangled it.
Invisible characters have been known to lurk in such documents.
One fix is to retype the line with the error, the line above it, and the line below it in a fully plain-text editor like Notepad (Windows) or TextEdit (Mac). After typing it in, use Ctrl-A or Cmd-A (select all), then copy it and replace the code in your normal code editor.
That should fix the error.
I've worked out how to solve the problem in my current situation.
Background:
My local machine is running PHP Version 5.3.5
My host is running PHP Version 5.2.17
The problem was occurring at the end of each of the loaded HTML documents, where there wasn't a space or tab on the last line of the document, e.g.
<div>
content
</div> <-- at the beginning of this line
Solution:
I changed the preg_replace() statement that was working with the output of each file so that it would also match newlines, which seems to have fixed it.
return preg_replace('/\s{2,}/', '', $str); // Old
return preg_replace('/\s{2,}|\n/', '', $str); // New
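The difference between the two patterns can be seen in a short sketch with a made-up template string. A raw newline inside a quoted JavaScript string literal is a syntax error, and the old pattern only removes runs of two or more whitespace characters, so a newline that is not followed by indentation survives.

```php
<?php
// Made-up template: the newlines are not followed by a space or tab.
$template = "<div>\ncontent\n</div>";

// Old pattern: only runs of two or more whitespace characters are removed.
$old = preg_replace('/\s{2,}/', '', $template);

// New pattern: single newlines are removed as well.
$new = preg_replace('/\s{2,}|\n/', '', $template);

var_dump(strpos($old, "\n") !== false); // the lone newlines are still there
var_dump(strpos($new, "\n") !== false); // no newline left
```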

How to combine two PHP 'foreach' loops

I'm trying to parse a really simple HTML document with some XPath. There are a total of 20 images and 20 links. My only goal is to get each link applied to its corresponding image.
My current code below is returning each image a bunch of times. So for example, that first image, which is currently showing 20 times, has a different link applied to it with each instance. So instance #1 of image #1, has link #1 applied to it, instance #2 of image #1 has link #2 applied to it, and so on.
What I want to do is include each image once and apply the corresponding link to it, so I have 20 images, with their corresponding links applied to them. I'm pretty sure I need to combine my two foreach functions, but I'm not quite sure how to do that. Any help would be awesome, thanks guys.
foreach ( $images = $xpath->query("//div[@class='image']//a//img") as $image )
{
foreach ( $links = $xpath->query("//div[@class='image']//a") as $link )
echo "<a href='" . $link->getAttribute( 'href' ) . "'><img src='" . $image->getAttribute( 'src' ) . "'</a>", "\n";
}
Expanding on Ignacio's idea...
First, query for all anchor elements containing images
$anchors = $xpath->query('//div[@class="image"]//a[img]');
Then, use the anchor as the context for the image search
foreach ($anchors as $anchor) {
$images = $anchor->getElementsByTagName('img');
$img = $images->item(0);
printf('<a href="%s"><img src="%s"></a>%s',
$anchor->getAttribute('href'),
$img->getAttribute('src'),
PHP_EOL);
}
Update
To me, this seems a much more appropriate job for an XSL transformation
OK, so if I understand correctly, after running the XPath queries you end up with two node lists, each with the same number of elements, and they're all matched, meaning image x needs link x for any value of x.
Something like this may work:
$images = $xpath->query("//div[@class='image']//a//img");
$links = $xpath->query("//div[@class='image']//a");
foreach ( $images as $index => $image )
{
// item() is the documented way to fetch a node from a DOMNodeList by position
echo "<a href='" . $links->item($index)->getAttribute( 'href' ) . "'><img src='" . $image->getAttribute( 'src' ) . "'></a>", "\n";
}
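For reference, a single-loop version can be sketched as a self-contained function. The name pairLinksWithImages is hypothetical, and the sketch assumes each image is a direct child of its link. It uses the anchor as the context node for a relative XPath query, so each image is paired with its own link exactly once.

```php
<?php
// Hypothetical helper: return [href, src] pairs, one per linked image.
function pairLinksWithImages(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // keep warnings about sloppy markup quiet
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $pairs = [];

    // Only anchors that directly contain an <img> are selected.
    foreach ($xpath->query('//div[@class="image"]//a[img]') as $anchor) {
        // Relative query: the first <img> child of this particular anchor.
        $img = $xpath->query('img', $anchor)->item(0);
        $pairs[] = [$anchor->getAttribute('href'), $img->getAttribute('src')];
    }

    return $pairs;
}
```

Each pair can then be printed with printf('<a href="%s"><img src="%s"></a>', $pair[0], $pair[1]).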

Can't get the dom node value extracted

I have code that connects to another site, grabs that data, and returns the string to a variable. I'm wondering why this isn't working, though?
<?php
$file = $DOCUMENT_ROOT . "http://www.sc2brasd.net";
$doc = new DOMDocument();
@$doc->loadHTMLFile($file);
$elements = $doc->getElementsByTagName('h1');
for ($i=1; $i<=7; $i++)
{
echo trim($elements->item($i)->nodeValue);
}
?>
There are seven "h1" tags that I would like to grab, but they won't echo out. An example of the string would be "Here is the test string i am trying to pull out".
This will not work because the path doesn't exist. With $DOCUMENT_ROOT prepended, it points to a file on your server.
$file = $DOCUMENT_ROOT . "http://www.sc2brasd.net";
I'm not sure whether loadHTMLFile() can handle URLs at all. You may need to fetch the document with file_get_contents() and load it with DOMDocument::loadHTML().
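Once the HTML is available as a string, the headings can be read as in this minimal sketch. The function name extractHeadings is hypothetical. Fetching the page itself, for example with file_get_contents('http://www.sc2brasd.net'), is assumed to work only when allow_url_fopen is enabled. Note also that node lists are zero-indexed, so the question's loop from 1 to 7 skips the first heading and reads one position past the seventh; iterating with foreach avoids that.

```php
<?php
// Hypothetical helper: return the trimmed text of every <h1> in an HTML string.
function extractHeadings(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $headings = [];
    // foreach visits every match, starting at index 0.
    foreach ($doc->getElementsByTagName('h1') as $h1) {
        $headings[] = trim($h1->nodeValue);
    }

    return $headings;
}
```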
