I have code that connects to another site, grabs its data, and stores the resulting string in a variable. I'm wondering why it isn't working.
<?php
$file = $DOCUMENT_ROOT . "http://www.sc2brasd.net";
$doc = new DOMDocument();
@$doc->loadHTMLFile($file);
$elements = $doc->getElementsByTagName('h1');
for ($i=1; $i<=7; $i++)
{
echo trim($elements->item($i)->nodeValue);
}
?>
There are seven "h1" tags that I would like to grab, but nothing is echoed out. An example of the string would be "Here is the test string i am trying to pull out".
This will not work because the path does not exist. With $DOCUMENT_ROOT prepended, it points to a file on your server rather than to the remote site.
$file = $DOCUMENT_ROOT . "http://www.sc2brasd.net";
I'm not sure if loadHTMLFile() can handle URLs at all. You may need to fetch the document as a string with file_get_contents() and load it with DOMDocument::loadHTML().
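Here is a minimal sketch of the loadHTML() route. The inline $html string is only a stand-in so the snippet runs without network access; with the real site it would presumably come from file_get_contents('http://www.sc2brasd.net'), which needs allow_url_fopen to be enabled.

```php
<?php
// Stand-in markup (an assumption); in practice this string would come from
// file_get_contents('http://www.sc2brasd.net').
$html = '<html><body><h1> First heading </h1><p>text</p><h1>Second heading</h1></body></html>';

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);

// Iterate over the node list directly; item() indexes start at 0, not 1.
$headings = [];
foreach ($doc->getElementsByTagName('h1') as $h1) {
    $headings[] = trim($h1->nodeValue);
}

echo implode("\n", $headings), "\n";
```

Looping with foreach also avoids calling ->nodeValue on a null item when the page has fewer h1 tags than expected.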
I have code for image scraping, but there are a few things I am trying to change:
Replace $the_site = "url"; with the value of an input type="text", so that I enter the URL in a form field instead of hardcoding it.
Handle multiple folders and links: instead of repeating the same code 5 times on the page, I want to point each URL to its own directory.
My code downloads images from pages and saves them to a folder, so I want to put everything inside one pair of PHP tags.
Here's my code:
<?php
$the_site = "url";
$the_tag = "div"; #
$the_class = "slides";
$html = file_get_contents($the_site);
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//'.$the_tag.'[contains(@id,"'.$the_class.'")]/img') as $item) {
$img_src = $item->getAttribute('src');
//print $img_src."\n"; Ignore This
//copy($img_src,'C:\xampp\htdocs\grabIMG\download'); Ignore This
$img_name = end(explode("/",$img_src));
echo $img_name.' has downloaded<br />';
$img_content = file_get_contents($img_src);
$fp = fopen(" folder/".$img_name,"w");
fwrite($fp,$img_content);
fclose($fp);
}
?>
I have pasted this code about 5 times in the page, each time opening new PHP tags, but I get this error and execution does not complete:
Fatal error: Maximum execution time of 30 seconds exceeded in
C:\xampp\htdocs\grabIMG\index.php on line 106
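One possible way to avoid the five copies is to wrap the logic in functions and loop over URL => folder pairs. This is only a sketch: extractImageSources(), downloadImages() and the $jobs array are hypothetical names, the example URL is a placeholder, and set_time_limit(0) is one way (not the only one) to get past the 30 second limit.

```php
<?php
// Hypothetical helper: returns the src of every <img> directly inside
// a <$tag> element whose id contains $idPart.
function extractImageSources(string $html, string $tag, string $idPart): array
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $sources = [];
    foreach ($xpath->query('//' . $tag . '[contains(@id,"' . $idPart . '")]/img') as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

// Hypothetical helper: downloads every matching image from $url into $folder.
function downloadImages(string $url, string $folder, string $tag = 'div', string $idPart = 'slides'): void
{
    $html = file_get_contents($url);
    if ($html === false || trim($html) === '') {
        return; // page could not be fetched
    }
    if (!is_dir($folder)) {
        mkdir($folder, 0777, true);
    }
    foreach (extractImageSources($html, $tag, $idPart) as $src) {
        $content = file_get_contents($src);
        if ($content !== false) {
            file_put_contents($folder . '/' . basename($src), $content);
        }
    }
}

// Placeholder URL => folder pairs; a URL could equally come from a form
// field, e.g. $_POST['url'] (the field name is an assumption).
$jobs = [
    // 'http://example.com/gallery-one' => 'folder-one',
];

set_time_limit(0); // lift the 30 second limit for long downloads
foreach ($jobs as $url => $folder) {
    downloadImages($url, $folder);
}
```

Everything lives in one PHP block, and adding another site is a matter of adding one line to $jobs.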
I am pulling a list of blog pages from a .XML file and printing the 2 newest entries for a web page. However, I have no idea how to sort the .XML files by pubDate or file_edited.
The code successfully retrieves the files and prints the two newest entries.
Here is the PHP code block that retrieves the files and prints them.
<?php
date_default_timezone_set('Europe/Helsinki');
/* XML Source URL:s */
$pages=("blog/data/other/pages.xml");
/* XML Doc Conversions */
$xmlDoc = new DOMDocument();
echo "<div class='blog_article_wrapper'>";
function myFunction($x){
// Run 2 times, skip first file and stop loop.
for ($i=1; $i<=2; $i++) {
//Get "Title" from .XML document.
$item_title=$x->item($i)->getElementsByTagName('title')
->item(0)->childNodes->item(0)->nodeValue;
//Get "Date" from .XML document.
$item_date=$x->item($i)->getElementsByTagName('pubDate')
->item(0)->childNodes->item(0)->nodeValue;
//Get "URL" from .XML document.
$item_url=$x->item($i)->getElementsByTagName('url')
->item(0)->childNodes->item(0)->nodeValue;
//Get "Author" from .XML document.
$item_author=$x->item($i)->getElementsByTagName('author')
->item(0)->childNodes->item(0)->nodeValue;
//Format date and author
$item_date = date('d.m.Y', strtotime($item_date));
$item_author = ucfirst(strtolower($item_author));
//Get content data from specific .XML document being iterated in loop
$url=("blog/data/pages/" . $item_url . ".xml");
$xmlDoc = new DOMDocument();
$xmlDoc->load($url);
$y=$xmlDoc->getElementsByTagName('content')->item(0)->nodeValue;
//Limit content to 150 letters and first paragraph tag.
$start = strpos($y, '<p>="') + 9;
$length = strpos($y, '"</p>') - $start;
$src = substr($y, $start, $length);
$item_content = "\"" . (substr($src, 0, 150)) . "...\"";
// Page specific code for output comes here.
}
}
//Call loop and iterate data
$xmlDoc->load($pages);
$x=$xmlDoc->getElementsByTagName('item');
myFunction($x);
?>
Any advice, code or articles pointing in the right direction would be much appreciated.
Thank you!
I figured this out myself using another Stack Overflow question and php.net.
<?php
//Directory where files are stored.
$folder = "blog/data/pages/";
$array = array();
//scandir and populate array with filename as key and filemtime as value.
foreach (scandir($folder) as $node) {
$nodePath = $folder . DIRECTORY_SEPARATOR . $node;
if (is_dir($nodePath)) continue;
$array[$nodePath] = filemtime($nodePath);
}
//Sort entry and store two newest files into $newest
arsort($array);
$newest = array_slice($array, 0, 2);
// $newest is now populated with name of .XML document as key and filemtime as value
// Use built in functions array_keys() and array_values() to access data
?>
I can now modify the original code in the question to use only these two outputted files for retrieving the desired data.
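For comparison, sorting by pubDate itself (what the question originally asked) could look roughly like the sketch below. The inline XML is a stand-in for blog/data/other/pages.xml: the item, pubDate and url element names come from the question, while the channel root element and the sample values are assumptions.

```php
<?php
// Stand-in for blog/data/other/pages.xml (root element and values are assumed).
$xml = <<<XML
<channel>
  <item><title>Old</title><pubDate>Mon, 02 Jan 2012 10:00:00 +0200</pubDate><url>old</url></item>
  <item><title>Newest</title><pubDate>Fri, 15 Mar 2013 09:30:00 +0200</pubDate><url>newest</url></item>
  <item><title>Middle</title><pubDate>Wed, 05 Sep 2012 18:45:00 +0300</pubDate><url>middle</url></item>
</channel>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Copy the live node list into a plain array so it can be sorted.
$items = iterator_to_array($doc->getElementsByTagName('item'));

// Newest first, comparing the parsed pubDate timestamps.
usort($items, function (DOMElement $a, DOMElement $b) {
    $timeA = strtotime($a->getElementsByTagName('pubDate')->item(0)->nodeValue);
    $timeB = strtotime($b->getElementsByTagName('pubDate')->item(0)->nodeValue);
    return $timeB <=> $timeA;
});

// Keep the url of the two newest entries.
$newestUrls = [];
foreach (array_slice($items, 0, 2) as $item) {
    $newestUrls[] = $item->getElementsByTagName('url')->item(0)->nodeValue;
}
echo implode(', ', $newestUrls), "\n";
```

Unlike filemtime(), this ordering does not change when an old post is merely re-saved.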
I'm trying to implement a "find and replace" system for broken links. The problem is, for some links there are no replacements. So, I need to comment out certain li elements. You can see my code below to do this. (I'm starting with an HTML form).
<?php
$brokenlink = $_POST['brokenlink'];
$newlink = $_POST['newlink'];
$brokenlink = '"' . $brokenlink . '"';
$newlink = '"' . $newlink . '"';
$di = new RecursiveDirectoryIterator('hugedirectory');
foreach (new RecursiveIteratorIterator($di) as $filename => $file) {
// echo $filename . ' - ' . $file->getSize() . ' bytes <br/>';
$filetoedit = file_get_contents($file);
if (strpos($filetoedit, $brokenlink) !== false) {
echo $brokenlink . " found in " . $filename . "<br/>";
$filetoedit = str_replace($brokenlink, $newlink, $filetoedit);
file_put_contents($filename, $filetoedit);
}
}
?>
What I want to accomplish is this: given a URL, I want to find its li parent. For instance, if the user enters http://www.espn.com in the HTML form, I want PHP to find this element on my server:
<li>Sports</li>
And replace it with this:
<!-- <li>Sports</li> -->
Is this possible? Thanks.
I would try using this to parse the DOM.
http://simplehtmldom.sourceforge.net/
You can set a class on all the ones you want to comment out, then use this tool to find those classes and comment them all out at once.
Why not use a regexp to find and replace the links? It would also take care of the perhaps expensive looping over links.
Here's a regex for matching urls
http://daringfireball.net/2010/07/improved_regex_for_matching_urls
Then use preg_replace to swap the broken link for the new one, or for a commented-out version of the broken link.
Alternatively you can just run grep on the directory via shell_exec, that way you don't have to open / read and parse files yourself.
Also take a look at this question: match url pattern in php using regular expression
I suggest you construct DOMDocument with the file content and use XPath to search for the broken link node.
$dom = new DOMDocument();
@$dom->loadHTML($filetoedit);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//li/a[@href="' . $brokenlink . '"]');
for ($i = 0; $i < $nodes->length; $i++) {
$node = $nodes->item($i);
// Do whatever you want here
}
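To fill in the "do whatever you want" part, here is a rough sketch of commenting out the parent li with DOMDocument::createComment() and replaceChild(). The sample markup is an assumption about what the files contain, and $brokenlink is used here as the bare URL, without the extra quotes the question wraps around it for str_replace.

```php
<?php
// Stand-ins for one file's contents and for the submitted form value.
$filetoedit = '<ul><li><a href="http://www.espn.com">Sports</a></li>'
            . '<li><a href="http://www.php.net">PHP</a></li></ul>';
$brokenlink = 'http://www.espn.com'; // bare URL, no surrounding quotes

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($filetoedit);
$xpath = new DOMXPath($dom);

// Select the <li> elements that contain a link to the broken URL.
// Copy them to an array first, since the document is modified in the loop.
$items = iterator_to_array($xpath->query('//li[a[@href="' . $brokenlink . '"]]'));
foreach ($items as $li) {
    // Serialize the <li> and put that markup inside a comment node.
    $comment = $dom->createComment(' ' . trim($dom->saveHTML($li)) . ' ');
    $li->parentNode->replaceChild($comment, $li);
}

// Serialize just the list again (saveHTML() alone would add html/body tags).
$result = $dom->saveHTML($dom->getElementsByTagName('ul')->item(0));
echo $result, "\n";
```

One caveat: if the li content itself contains "--", the resulting comment would not be valid HTML, so that case would need separate handling.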
I am not sure if this is even possible but I am trying to extract all the anchor tag links in a few HTML files on my website. I have currently written a php script that scans a few directories and sub directories that builds an array of HTML file links. Here is that code:
$di = new RecursiveDirectoryIterator('Migration');
$migrate = array();
foreach (new RecursiveIteratorIterator($di) as $filename => $file) {
if (preg_match('/\.html?/i', $filename)) { // eregi() was removed in PHP 7
$migrate[] = $filename;
}
}
This method successfully produces the HTML File links that I need. Ex:
Migration/administration/billing/Billing.htm
Migration/administration/billing/_notes/Billing.htm.mno
Migration/administration/new business/_notes/New Business.htm.mno
Migration/administration/new business/New Business.htm
Migration/account/nycds/_notes/NYCDS Index.htm.mno
Migration/account/nycds/NYCDS Index.htm
There's more links but this gives you an idea. The next part is where I am stuck. I was thinking that I would need a for loop to loop through each array element, open the file, extract the links, then store those links somewhere. I am just not sure how I would go about this process. I tried to google this question but I never seemed to get results that matched what I was looking to do. Here is the simplified for loop that I have.
var obj = <?php echo json_encode($migrate); ?>;
for(var i=0;i< obj.length;i++){
// alert(obj[i]);
}
The above code is in javascript. From what I am reading, It seems that I shouldn't be using javascript but should maybe continue using PHP. I am confused on what my next steps should be. If someone can point me in the right direction I would really appreciate it. Thank you so much for your time.
Use DOMDocument::getElementsByTagName to retrieve all <a> tags
http://www.php.net/manual/en/domdocument.getelementsbytagname.php
Example,
$doc = new DOMDocument();
$doc->loadHTMLFile("filename.html");
$anchors = $doc->getElementsByTagName('a'); //retrieve all anchor tags
foreach ($anchors as $a) { //loop anchors
echo $a->nodeValue;
}
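Staying in PHP, the loop over the file array could be sketched like this. extractLinks() is a hypothetical helper name, and it reads the href attribute rather than nodeValue, since nodeValue is the link text and not the URL. The inline markup at the end only stands in for one file's contents.

```php
<?php
// Hypothetical helper: returns the href of every <a> tag in an HTML string.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) { // skip named anchors without a target
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}

// $migrate would be the array built by the directory scan in the question;
// it is left empty here so the sketch runs on its own.
$migrate = [];
$allLinks = [];
foreach ($migrate as $path) {
    $allLinks[$path] = extractLinks((string) file_get_contents($path));
}

// Inline markup standing in for one file's contents.
$sample = '<p><a href="billing.htm">Billing</a> <a name="top">Top</a> '
        . '<a href="http://www.php.net/">PHP</a></p>';
print_r(extractLinks($sample));
```

Keying $allLinks by file path keeps track of which file each link came from.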
I'm retrieving files like so (from the Internet Archive):
<files>
<file name="Checkmate-theHumanTouch.gif" source="derivative">
<format>Animated GIF</format>
<original>Checkmate-theHumanTouch.mp4</original>
<md5>72ec7fcf240969921e58eabfb3b9d9df</md5>
<mtime>1274063536</mtime>
<size>377534</size>
<crc32>b2df3fc1</crc32>
<sha1>211a61068db844c44e79a9f71aa9f9d13ff68f1f</sha1>
</file>
<file name="CheckmateTheHumanTouch1961.thumbs/Checkmate-theHumanTouch_000001.jpg" source="derivative">
<format>Thumbnail</format>
<original>Checkmate-theHumanTouch.mp4</original>
<md5>6f6b3f8a779ff09f24ee4cd15d4bacd6</md5>
<mtime>1274063133</mtime>
<size>1169</size>
<crc32>657dc153</crc32>
<sha1>2242516f2dd9fe15c24b86d67f734e5236b05901</sha1>
</file>
</files>
They can have any number of <file>s, and I'm solely looking for the ones that are thumbnails. When I find them, I want to increase a counter. When I've gone through the whole file, I want to find the middle Thumbnail and return the name attribute.
Here's what I've got so far:
//pop previously retrieved XML file into a variable
$elem = new SimpleXMLElement($xml_file);
//establish variable
$i = 0;
// Look through each parent element in the file
foreach ($elem as $file) {
if ($file->format == "Thumbnail"){$i++;}
}
//find the middle thumbnail.
$chosenThumb = ceil(($i/2)-1);
//Gloriously announce the name of the chosen thumbnail.
echo($elem->file[$chosenThumb]['name']);
The final echo doesn't work because it doesn't like having a variable choose the XML element. It works fine when I hardcode the index. Can you guess that I'm new to handling XML files?
Edit:
Francis Avila's answer from below sorted me right out!:
$sxe = simplexml_load_file($url);
$thumbs = $sxe->xpath('/files/file[format="Thumbnail"]');
$n_thumbs = count($thumbs);
$middlethumb = $thumbs[(int) ($n_thumbs/2)];
$happy_string = (string)$middlethumb['name'];
echo $happy_string;
Use XPath.
$sxe = simplexml_load_file($url);
$thumbs = $sxe->xpath('/files/file[format="Thumbnail"]');
$n_thumbs = count($thumbs);
$middlethumb = $thumbs[(int) ($n_thumbs/2)];
$middlethumbname = (string) $middlethumb['name'];
You can also accomplish this with a single XPath expression if you don't need the total count:
$thumbs = $sxe->xpath('/files/file[format="Thumbnail"][position() = floor(last() div 2) + 1]/@name');
$middlethumbname = (count($thumbs)) ? $thumbs[0]['name'] : '';
A limitation of SimpleXML's xpath method is that it can only return nodes and not simple types. This is why you need to use $thumbs[0]['name']. If you use DOMXPath::evaluate(), you can do this instead:
$doc = new DOMDocument();
$doc->load($url);
$xp = new DOMXPath($doc);
$middlethumbname = $xp->evaluate('string(/files/file[format="Thumbnail"][position() = floor(last() div 2) + 1]/@name)');
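A self-contained version of both approaches, as a sketch: the inline XML is a trimmed stand-in for the Internet Archive file list in the question (the file names are made up), and the single-expression variant picks the middle node with last(), which refers to the number of nodes left after the format filter.

```php
<?php
// Trimmed stand-in for the file list in the question (names are made up).
$xml = <<<XML
<files>
  <file name="movie.gif" source="derivative"><format>Animated GIF</format></file>
  <file name="thumb_1.jpg" source="derivative"><format>Thumbnail</format></file>
  <file name="thumb_2.jpg" source="derivative"><format>Thumbnail</format></file>
  <file name="thumb_3.jpg" source="derivative"><format>Thumbnail</format></file>
</files>
XML;

// Variant 1: SimpleXML, filter with XPath and index the resulting array.
$sxe = simplexml_load_string($xml);
$thumbs = $sxe->xpath('/files/file[format="Thumbnail"]');

// Fall back to an empty string when there are no thumbnails at all.
$middlethumbname = '';
if (count($thumbs) > 0) {
    $middlethumbname = (string) $thumbs[(int) (count($thumbs) / 2)]['name'];
}

// Variant 2: DOMXPath::evaluate() returns the attribute value as a string.
$doc = new DOMDocument();
$doc->loadXML($xml);
$xp = new DOMXPath($doc);
$viaXPath = $xp->evaluate(
    'string(/files/file[format="Thumbnail"][position() = floor(last() div 2) + 1]/@name)'
);

echo $middlethumbname, "\n", $viaXPath, "\n";
```

XPath positions are 1-based, which is why the expression adds 1 where the PHP array index does not.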
$elem->file[$chosenThumb] gives the $chosenThumb'th element of the full file[] list, not of the list filtered for Thumbnail, right?
foreach ($elem as $file) {
if ($file->format == "Thumbnail"){
$i++;
$filteredFiles[] = $file; // keep only the thumbnails in a separate array
}
}
$chosenThumb = ceil(($i/2)-1);
//echo($elem->file[$chosenThumb]['name']);
echo($filteredFiles[$chosenThumb]['name']);
Some problems:
The middle thumbnail is calculated incorrectly. You'll have to keep a separate array for the thumbnails and get the middle one using count.
file might need to be {'file'}; I'm not sure how PHP sees this.
You don't have a default thumbnail.
The code you should use is this:
$files = new SimpleXMLElement($xml_file);
$thumbs = array();
foreach($files as $file)
if($file->format == "Thumbnail")
$thumbs[] = $file;
$chosenThumb = ceil((count($thumbs)/2)-1);
echo (count($thumbs)===0) ? 'default-thumbnail.png' : $thumbs[$chosenThumb]['name'];
/edit: but I recommend the other answer's solution of using XPath. Way easier.