Goal
What I'm aiming to achieve is a small web app where the user can edit certain elements, such as an img src or an href, and then save the result to a file.
I've written the basic code that allows the user to edit, but I'm struggling to take the amended markup and save it to a .html file.
Progress
<?php
function getHTMLByID($id, $html) {
    $dom = new DOMDocument;
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    $node = $dom->getElementById($id);
    if ($node) {
        return $dom->saveXML($node);
    }
    return false;
}

$html = file_get_contents('http://www.mysql.com/');
$codeString = getHTMLByID('l1-nav-container', $html);
echo $codeString;

$myfile = fopen("newfile.html", "w") or die("Unable to open file!");
fwrite($myfile, $codeString);
fclose($myfile);
?>
I've created some basic code that saves a string received from another website to a file; however, I can't seem to work out how to get the edited code back from the page.
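For the editing part itself, a minimal sketch of what that could look like with DOMDocument, assuming a local template.html and an image with id "logo" (both names and the new src value are hypothetical):
<?php
// A minimal sketch of the edit-and-save step: load a page, change an attribute, save the result.
// "template.html", the "logo" id and the new src value are assumptions for illustration.
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML(file_get_contents('template.html'));

$img = $dom->getElementById('logo');
if ($img) {
    // this is where the user-supplied value would be applied
    $img->setAttribute('src', 'images/new-logo.png');
}

// saveHTML() serialises the whole (now edited) document back to a string
file_put_contents('newfile.html', $dom->saveHTML());
?>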
Related
I'm parsing an HTML file, getting the contents of a pre tag, and then saving it to a text file.
However, when I open the text file in Sublime Text or other text editors, the formatting is gone.
My question: how can I save the text in its original state inside the txt file?
The contents of the pre tag are below:
x4 x4
|---------------------|-|-------------------|--------------------|
|---------------------|-|-------------------|--------------------|
|----------2-0-0------|-|-------------------|--------------------|
|----------------1-0-0|-|-------------------|--------------------|
|3-0-1-3-0------------|0|1-3-1-3-1-3-1-0----|1-3-1-3-1-3-1-0---0-|
x4 x4
|------------------------|-------------|-------------------|
|------------------------|-------------|-------------------|
|------------------------|-------------|-------------------|
|------------------------|-------------|0--0033------------|
|1-3-1-3-1-3-1-0--0000--0|1-3-1-3-1-3-1|--------333~-335-0-|
x4 x4
|------------------------|---------------------|-|-------------|
|------------------------|---------------------|-|-------------|
|------------------------|----------2-0-0------|-|-------------|
|------------------------|----------------1-0-0|-|-------------|
|0--0000--0-1-3-1-3-1-3-1|3-0-1-3-0------------|0|1-3-1-3-1-3-1|
My code:
<?php
// example of how to use a basic selector to retrieve HTML contents
include('simple_html_dom.php');

// get DOM from URL or file
$html = file_get_html('http://metaltabs.com/tab/10464/index.html');

foreach ($html->find('title') as $e) {
    echo $e->innertext . '<br>';
}

$my_file = fopen("textfile.txt", "w") or die("Unable to open file!");
foreach ($html->find('pre') as $e) {
    echo nl2br($e->innertext) . '<br>';
    $txt = $e->innertext;
    fwrite($my_file, $txt);
}
fclose($my_file);
?>
The problems with your parsing results are:
Line breaks are not preserved;
HTML entities are preserved.
To resolve the line break issue you have to use ->load() instead of file_get_html():
$html = new simple_html_dom();
$data = file_get_contents('http://metaltabs.com/tab/10464/index.html');
// second optional parameter: lowercase; third optional parameter: strip \r\n
$html->load($data, true, false);
To resolve the entities issue you can use the PHP function html_entity_decode():
$txt = html_entity_decode( $e->innertext );
The result is something like this:
Tuning E A D G B E
|------------------------------------------------------------|
|------------------------------------------------------------|
|------------------------------------------------------------|
|------------------------------------------------------------|
|-------<7-8>----------<10-11>---------<7-8>---7--10--8--11--|x9
|-0000-----------0000------------0000----------0-------------|
I tried this code and, opening the result with Sublime Text, the text file preserves the same formatting as on the website:
$html = file_get_contents("http://metaltabs.com/tab/4086/index.html");

$dom = new DOMDocument('1.0', 'utf-8');
// preserve white space (must be set before loading the HTML)
$dom->preserveWhiteSpace = true;
// load the html into the object
$dom->loadHTML($html);

$pre = $dom->getElementsByTagName('pre');

$file = fopen('text.txt', 'w');
fwrite($file, $pre->item(0)->nodeValue);
fclose($file);
This assumes you are sure that there is only one pre tag in your page; otherwise you have to loop through the $pre node list, as in the sketch below.
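A minimal sketch of that loop, assuming each pre block should end up in its own numbered file (the text_N.txt naming is just an illustration):
// Hypothetical sketch: write every <pre> block to its own file.
$i = 0;
foreach ($pre as $node) {
    file_put_contents('text_' . $i++ . '.txt', $node->nodeValue);
}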
Hello, I've got a bunch of divs I'm trying to scrape the content values from, and I've managed to successfully pull out one of the values. However, I've hit a brick wall: I now want to pull out the one after it within the code I've already written. I would appreciate any help.
Here is the bit of code I'm currently using.
foreach ($arr as &$value) {
    $file = $DOCUMENT_ROOT . $value;
    $doc = new DOMDocument();
    $doc->loadHTMLFile($file);
    $xpath = new DOMXPath($doc);
    $elements = $xpath->query("//*[contains(@class, 'covGroupBoxContent')]//div[3]//div[2]");
    if (!is_null($elements)) {
        foreach ($elements as $element) {
            $nodes = $element->childNodes;
            foreach ($nodes as $node) {
                $maps = $node->nodeValue;
                echo $maps;
            }
        }
    }
}
I simply want them all to have separate outputs that I can echo out.
I recommend you use Simple HTML DOM. Beyond that I need to see a sample of the HTML you are scraping.
If you are scraping a website outside your domain I'd recommend saving the source HTML to a file for review and testing. Some websites combat scraping, thus what you see in the browser is not what your scraper would see.
Also, I'd recommend setting a random user agent via ini_set(). If you need a function for this I have one; a rough sketch follows the snippet below.
<?php
$html = file_get_html($url);
if ($html) {
    $myfile = fopen("testing.html", "w") or die("Unable to open file!");
    fwrite($myfile, $html);
    fclose($myfile);
}
?>
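As for the random user agent suggestion above, a rough sketch (the agent strings are just examples):
// Hypothetical sketch: pick a user agent at random so that
// file_get_contents()/file_get_html() requests do not advertise PHP's default agent.
$agents = array(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
);
ini_set('user_agent', $agents[array_rand($agents)]);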
The code below successfully saves the child div, but it also saves some numbers at the end of the file. I think it's the number of bytes of data present; how do I get rid of the numbers it saves?
$file = '../userfolders/'.$email.'/'.$ongrassdb.'/'.$pagenameselected.'.php';

$doc = new DOMDocument();
$doc->load($file);

$ele = $doc->createElement('div', $textcon);
$ele->setAttribute('id', $divname);
$ele->setAttribute('style', 'background: '.$divbgcolor.'; color: '.$divfontcolor.'; display: table-cell;');

$element = $doc->getElementsByTagName('div')->item(0);
$element->appendChild($ele);
$doc->appendChild($element);

$myfile = fopen($file, "a+") or die('Unable to open file!');
$html = $doc->save($file);
fwrite($myfile, $html);
fclose($myfile);
I don't want to use saveHTML or saveHTMLFile because they create multiple instances of the divs and add html tags to the file.
$doc->load($file);
...
$myfile = fopen($file, "a+") or die('Unable to open file!');
$html = $doc->save($file);
fwrite($myfile,$html);
fclose($myfile);
The $doc->save() method saves the DOM tree to the file and returns the number of bytes it wrote. This number is stored in $html and is then appended to the same file by fwrite().
Just remove the fopen(), fwrite() and fclose() calls.
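In other words, the saving part can be reduced to a single call, roughly:
// $doc->save() already writes the modified document to disk;
// no fopen()/fwrite()/fclose() is needed around it.
$doc->save($file);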
I removed the last two lines and it solved the issue
fwrite($myfile,$html);
fclose($myfile);
We have 700 static HTML files in the folder "music". We want to put analytics code at the end of the HTML files, just before the
</body> </html>
tags.
Could anybody let me know how this is possible with PHP code?
Too easy. Find all files, read content, modify content, save content.
<?php
// open directory
if ($handle = opendir(__DIR__)) {
    // search for
    $search = '</body>';
    // replace with (</body> gets appended later in the script)
    $replace = <<< EOF
<!-- your analytics code here -->
EOF;
    // loop through entries
    while (false !== ($entry = readdir($handle))) {
        if (is_dir($entry)) continue; // ignore entry if it's a directory
        $content = file_get_contents($entry); // read file
        $content = str_replace($search, $replace . '</body>', $content); // modify contents
        file_put_contents($entry, $content); // save file
    }
}
echo 'done';
?>
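Since the question mentions a dedicated "music" folder, an alternative sketch using glob() (the folder path comes from the question; the analytics snippet is a placeholder):
<?php
// Hypothetical sketch: patch every .html file in the "music" folder.
$analytics = '<!-- your analytics code here -->';
foreach (glob(__DIR__ . '/music/*.html') as $path) {
    $content = file_get_contents($path);
    $content = str_replace('</body>', $analytics . '</body>', $content);
    file_put_contents($path, $content);
}
echo 'done';
?>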
I am using server-side code to get photos from an external URL. I am using the Simple PHP DOM library for this, as per an SO suggestion, but the results are lacking: for some sites I am not able to get all the photos.
$url holds the example external site which is not giving me all the images.
$url = "http://www.target.com/c/baby-baby-bath-bath-safety/-/N-5xtji#?lnk=nav_t_spc_3_inc_1_1";

$html = file_get_contents($url);

$doc = new DOMDocument();
@$doc->loadHTML($html); // suppress warnings from malformed HTML

$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
    echo $imageUrl = $tag->getAttribute('src');
    echo "<br />";
}
Is it possible to have functionality/accuracy similar to Firefox's
Tools -> Page Info -> Media
option? I just want this to be more accurate, as the existing library is not fetching all the images. I also tried file_get_contents(), which is also not fetching all the images.
You need to use regular expressions to get the images' src attributes. DOMDocument builds the whole DOM structure in memory, which you don't need here. Once you have the URLs, use file_get_contents() and write the data to files. Also increase max_execution_time if you'll parse many pages.
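A minimal sketch of that approach (the pattern only handles plain src="..." attributes and is an illustration, not a robust HTML parser):
// Hypothetical sketch: collect img src values with a regular expression.
$html = file_get_contents($url);
preg_match_all('/<img[^>]+src=["\']([^"\']+)["\']/i', $html, $matches);
foreach ($matches[1] as $imageUrl) {
    echo $imageUrl . "<br />";
}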
Download images from remote server
function save_image($sourcePath, $targetPath)
{
    $in  = fopen($sourcePath, "rb");
    $out = fopen($targetPath, "wb");
    while ($chunk = fread($in, 8192)) {
        fwrite($out, $chunk, 8192);
    }
    fclose($in);
    fclose($out);
}

$src = "http://www.example.com/thumbs/thumbs-t2/1/ts_11083.jpg"; // image source
$target = dirname(__FILE__)."/images/pic.jpg"; // where to save the image with a new name
save_image($src, $target);
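Combining the two pieces, a rough sketch of the download loop (the images/ target folder is an assumption, and relative src values would first need to be resolved against $url):
// Hypothetical sketch: download every matched image into a local images/ folder.
foreach ($matches[1] as $i => $imageUrl) {
    $target = dirname(__FILE__) . "/images/pic_" . $i . ".jpg";
    save_image($imageUrl, $target);
}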