I am using PHP Simple HTML DOM Parser to get the table elements of an HTML page and then create a file for each element.
This is my code:
<?php
include_once('simple_html_dom.php');
$html = file_get_html('test.html');
foreach($html->find('table[id=backgroundTable]') as $element);
$element = $html->save();
$html->save('result.html');
The problem I have at the moment is that it stores all the tables in this one result.html file.
What I need is for the exported results to be result1.html, result2.html, and so on. How can I achieve this?
Thank you very much in advance
You may try something like this, so that $i is increased on each iteration:
$html = file_get_html('test.html');
$i = 1;
foreach ($html->find('table#backgroundTable') as $element) {
    // save each matched table to its own numbered file
    str_get_html($element)->save('result' . $i . '.html');
    $i++;
}
So the results will be saved in result1.html, result2.html and so on.
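If passing the node object straight to str_get_html() gives you trouble in your version of the library, a more explicit variant (just a sketch, writing the node's outertext yourself) would be:
$html = file_get_html('test.html');
$i = 1;
foreach ($html->find('table#backgroundTable') as $element) {
    // outertext holds the table's full HTML, including the <table> tags
    file_put_contents('result' . $i . '.html', $element->outertext);
    $i++;
}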
I'm using a class to load HTML from another page and get the HTML table content. The page is in windows-1250, so I'm using iconv to convert it to UTF-8.
All this is done in one class that I'm calling like this: $tableHtml = suplUpdater::getTableHtml(someParams,...);. When I echo that variable directly, everything looks fine. However, I want to parse the table rows with PHP DOMDocument to save them to a database. The code looks like this:
$tableData = suplUpdater::getTableHtml(1400450400);
//echo($tableData);
$document = new DOMDocument();
$document->loadHTML($tableData);
$rows = $document->getElementsByTagName('tr');
$rows->item(0)->parentNode->removeChild($rows->item(0));//first row is just a header
$output = array();
foreach ($rows as $row) {
    $currentOutput = array();
    foreach ($row->childNodes as $cell) {
        if ($cell->nodeType == 1) { // element nodes only (skip whitespace text nodes)
            $currentOutput[] = $cell->nodeValue;
        }
    }
    $output[] = $currentOutput;
}
When I do var_dump($output);, I get an array, but its encoding is messed up. Where could the problem be? If needed, I can provide the source table data.
EDIT:
When I copy the table HTML to a .txt file encoded in UTF-8 and run file_get_contents('tableHtml.txt'), I get the same result.
EDIT:
I have uploaded sample data here: http://anagmate.moxo.cz/data.txt
EDIT:
Screenshot of the echo and var_dump is here: http://anagmate.moxo.cz/supl.png
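One thing worth ruling out (this is an assumption based on the symptoms, not something confirmed in the post): DOMDocument::loadHTML() falls back to ISO-8859-1 when the markup carries no charset declaration, so already-converted UTF-8 text can come out mangled in nodeValue. A minimal sketch of hinting the encoding before parsing:
$document = new DOMDocument();
// The XML prolog tells libxml to treat the string as UTF-8 instead of the ISO-8859-1 default
$document->loadHTML('<?xml encoding="utf-8" ?>' . $tableData);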
I'm trying to implement a "find and replace" system for broken links. The problem is, for some links there are no replacements. So, I need to comment out certain li elements. You can see my code below to do this. (I'm starting with an HTML form).
<?php
$brokenlink = $_POST['brokenlink'];
$newlink = $_POST['newlink'];
$brokenlink = '"' . $brokenlink . '"';
$newlink = '"' . $newlink . '"';
$di = new RecursiveDirectoryIterator('hugedirectory');
foreach (new RecursiveIteratorIterator($di) as $filename => $file) {
    // echo $filename . ' - ' . $file->getSize() . ' bytes <br/>';
    $filetoedit = file_get_contents($file);
    if (strpos($filetoedit, $brokenlink) !== false) { // strict check, in case the link sits at offset 0
        echo $brokenlink . " found in " . $filename . "<br/>";
        $filetoedit = str_replace($brokenlink, $newlink, $filetoedit);
        file_put_contents($filename, $filetoedit);
    }
}
?>
What I want to accomplish is this: given a URL, I want to be able to find its li parent. For instance, if the user inputs http://www.espn.com in an HTML form, I want PHP to find this element on my server:
<li><a href="http://www.espn.com">Sports</a></li>
And replace it with this:
<!-- <li><a href="http://www.espn.com">Sports</a></li> -->
Is this possible? Thanks.
I would try using this to parse the DOM.
http://simplehtmldom.sourceforge.net/
You can add a class to all the ones you want to comment out, then use this tool to find those classes and comment them all out at once.
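A rough sketch of that idea, assuming the li elements to disable have been tagged with a class such as broken (the class name is hypothetical):
include_once('simple_html_dom.php');
$html = file_get_html('page.html');
// Wrap every flagged <li> in an HTML comment
foreach ($html->find('li.broken') as $li) {
    $li->outertext = '<!-- ' . $li->outertext . ' -->';
}
$html->save('page.html');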
Why not use a regexp to find and replace the links? It would also spare you the potentially expensive looping over links.
Here's a regex for matching URLs:
http://daringfireball.net/2010/07/improved_regex_for_matching_urls
Then preg_replace the broken link with the new one, or with a commented-out version of itself.
Alternatively, you can just run grep on the directory via shell_exec, so you don't have to open, read, and parse the files yourself.
Also take a look at this: match url pattern in php using regular expression
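A minimal sketch of the preg_replace route, assuming $brokenlink holds the bare URL (without the quotes added earlier) and the li contains only the anchor:
// Comment out the whole <li> that wraps the broken link
$pattern = '~<li>\s*<a[^>]*href="' . preg_quote($brokenlink, '~') . '"[^>]*>.*?</a>\s*</li>~is';
$filetoedit = preg_replace($pattern, '<!-- $0 -->', $filetoedit);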
I suggest you construct a DOMDocument from the file content and use XPath to search for the broken link node.
$dom = new DOMDocument();
@$dom->loadHTML($filetoedit); // @ suppresses warnings from sloppy markup
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//li/a[@href="' . $brokenlink . '"]');
for ($i = 0; $i < $nodes->length; $i++) {
    $node = $nodes->item($i);
    // Do whatever you want here
}
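Inside that loop, a sketch of what "do whatever you want" could look like for this question, commenting out the surrounding li (this part is my own addition, not from the answer):
$li = $node->parentNode;                              // the <li> wrapping the broken <a>
$comment = $dom->createComment($dom->saveHTML($li));  // its markup, wrapped in a comment node
$li->parentNode->replaceChild($comment, $li);
// ...and afterwards write the modified document back:
file_put_contents($filename, $dom->saveHTML());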
I am not sure if this is even possible, but I am trying to extract all the anchor tag links from a few HTML files on my website. I have currently written a PHP script that scans a few directories and subdirectories and builds an array of HTML file paths. Here is that code:
$di = new RecursiveDirectoryIterator('Migration');
$migrate = array();
foreach (new RecursiveIteratorIterator($di) as $filename => $file) {
    if (eregi("\.html", $file) || eregi("\.htm", $file)) {
        $migrate[] = $filename;
    }
}
This method successfully produces the HTML File links that I need. Ex:
Migration/administration/billing/Billing.htm
Migration/administration/billing/_notes/Billing.htm.mno
Migration/administration/new business/_notes/New Business.htm.mno
Migration/administration/new business/New Business.htm
Migration/account/nycds/_notes/NYCDS Index.htm.mno
Migration/account/nycds/NYCDS Index.htm
There are more links, but this gives you an idea. The next part is where I am stuck. I was thinking that I would need a loop to go through each array element, open the file, extract the links, and then store those links somewhere. I am just not sure how to go about this. I tried to google this question, but I never got results that matched what I was looking to do. Here is the simplified loop that I have:
var obj = <?php echo json_encode($migrate); ?>;
for (var i = 0; i < obj.length; i++) {
    // alert(obj[i]);
}
The above code is in JavaScript. From what I am reading, it seems that I shouldn't be using JavaScript but should continue with PHP instead. I am confused about what my next steps should be. If someone can point me in the right direction I would really appreciate it. Thank you so much for your time.
Use DOMDocument::getElementsByTagName to retrieve all <a> tags
http://www.php.net/manual/en/domdocument.getelementsbytagname.php
Example,
$doc = new DOMDocument();
$doc->loadHTMLFile("filename.html");
$anchors = $doc->getElementsByTagName('a'); // retrieve all anchor tags
foreach ($anchors as $a) { // loop over the anchors
    echo $a->nodeValue; // the link text; use $a->getAttribute('href') for the URL itself
}
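A sketch of tying that back to the $migrate array from the question, collecting the href attributes per file (assuming the pages parse; the @ just silences libxml warnings about sloppy markup):
$links = array();
foreach ($migrate as $htmlFile) {
    $doc = new DOMDocument();
    @$doc->loadHTMLFile($htmlFile);                     // load each HTML file found earlier
    foreach ($doc->getElementsByTagName('a') as $a) {
        $links[$htmlFile][] = $a->getAttribute('href'); // the URL, not the link text
    }
}
print_r($links); // or store them wherever you need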
I'm retrieving files like so (from the Internet Archive):
<files>
  <file name="Checkmate-theHumanTouch.gif" source="derivative">
    <format>Animated GIF</format>
    <original>Checkmate-theHumanTouch.mp4</original>
    <md5>72ec7fcf240969921e58eabfb3b9d9df</md5>
    <mtime>1274063536</mtime>
    <size>377534</size>
    <crc32>b2df3fc1</crc32>
    <sha1>211a61068db844c44e79a9f71aa9f9d13ff68f1f</sha1>
  </file>
  <file name="CheckmateTheHumanTouch1961.thumbs/Checkmate-theHumanTouch_000001.jpg" source="derivative">
    <format>Thumbnail</format>
    <original>Checkmate-theHumanTouch.mp4</original>
    <md5>6f6b3f8a779ff09f24ee4cd15d4bacd6</md5>
    <mtime>1274063133</mtime>
    <size>1169</size>
    <crc32>657dc153</crc32>
    <sha1>2242516f2dd9fe15c24b86d67f734e5236b05901</sha1>
  </file>
</files>
They can have any number of <file>s, and I'm solely looking for the ones that are thumbnails. When I find them, I want to increase a counter. When I've gone through the whole file, I want to find the middle Thumbnail and return the name attribute.
Here's what I've got so far:
// pop previously retrieved XML file into a variable
$elem = new SimpleXMLElement($xml_file);
// establish variable
$i = 0;
// Look through each parent element in the file
foreach ($elem as $file) {
    if ($file->format == "Thumbnail") { $i++; }
}
// find the middle thumbnail.
$chosenThumb = ceil(($i/2)-1);
// Gloriously announce the name of the chosen thumbnail.
echo($elem->file[$chosenThumb]['name']);
The final echo doesn't work; it doesn't seem to like having a variable choose the XML element. It works fine when I hardcode the index. Can you guess that I'm new to handling XML files?
Edit:
Francis Avila's answer below sorted me right out:
$sxe = simplexml_load_file($url);
$thumbs = $sxe->xpath('/files/file[format="Thumbnail"]');
$n_thumbs = count($thumbs);
$middlethumb = $thumbs[(int) ($n_thumbs/2)];
$happy_string = (string) $middlethumb['name'];
echo $happy_string;
Use XPath.
$sxe = simplexml_load_file($url);
$thumbs = $sxe->xpath('/files/file[format="Thumbnail"]');
$n_thumbs = count($thumbs);
$middlethumb = $thumbs[(int) ($n_thumbs/2)];
$middlethumbname = (string) $middlethumb['name'];
You can also accomplish this with a single XPath expression if you don't need the total count:
$thumbs = $sxe->xpath('/files/file[format="Thumbnail"][position() = floor(last() div 2) + 1]/@name');
$middlethumbname = (count($thumbs)) ? $thumbs[0]['name'] : '';
A limitation of SimpleXML's xpath method is that it can only return nodes and not simple types. This is why you need to use $thumbs[0]['name']. If you use DOMXPath::evaluate(), you can do this instead:
$doc = new DOMDocument();
$doc->load($url);
$xp = new DOMXPath($doc);
$middlethumbname = $xp->evaluate('string(/files/file[format="Thumbnail"][position() = floor(last() div 2) + 1]/@name)');
$elem->file[$chosenThumb] will give the $chosenThumb'th element from the full file list, not from the list filtered for Thumbnail entries, right?
foreach ($elem as $file) {
    if ($file->format == "Thumbnail") {
        $i++;
        // add this item to a new array ($filteredFiles)
    }
}
$chosenThumb = ceil(($i/2)-1);
// echo($elem->file[$chosenThumb]['name']);
echo($filteredFiles[$chosenThumb]['name']);
Some problems:
The middle thumbnail is incorrectly calculated. You'll have to keep a separate array for those thumbs and get the middle one using count.
file might need to be {'file'}; I'm not sure how PHP sees this.
You don't have a default thumbnail.
The code you should use is this one:
$files = new SimpleXMLElement($xml_file);
$thumbs = array();
foreach ($files as $file) {
    if ($file->format == "Thumbnail") {
        $thumbs[] = $file;
    }
}
$chosenThumb = ceil((count($thumbs)/2)-1);
echo (count($thumbs) === 0) ? 'default-thumbnail.png' : $thumbs[$chosenThumb]['name'];
Edit: but I recommend the other answer's XPath solution. Way easier.
I have code that connects to another site, grabs its data, and returns the string to a variable. I'm wondering why this isn't working, however.
<?php
$file = $DOCUMENT_ROOT . "http://www.sc2brasd.net";
$doc = new DOMDocument();
@$doc->loadHTMLFile($file);
$elements = $doc->getElementsByTagName('h1');
for ($i = 1; $i <= 7; $i++)
{
    echo trim($elements->item($i)->nodeValue);
}
?>
There are seven "h1" tags that I would like to grab, but they won't echo out. An example of the string would be "Here is the test string I am trying to pull out".
This will not work because the path doesn't exist. It points to a file on your server:
$file = $DOCUMENT_ROOT . "http://www.sc2brasd.net";
I'm not sure if loadHTMLFile() can handle URLs at all. You may need to get the document with file_get_contents() and load it with DOMDocument::loadHTML().
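A minimal sketch of that suggestion, using file_get_contents() (assumes allow_url_fopen is enabled on the server):
$doc = new DOMDocument();
$htmlString = file_get_contents("http://www.sc2brasd.net");
@$doc->loadHTML($htmlString);                 // @ hides warnings from invalid markup
$elements = $doc->getElementsByTagName('h1');
foreach ($elements as $h1) {
    echo trim($h1->nodeValue) . "\n";
}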