PHP DOMDocument manipulation issues

I have a question about DOMDocument.
My $html contains something like this:
texts …paragraph..
<table class='test'>
tr and td...
</table>
texts and more texts
I want to detect whether my $html variable contains a table element. If it does, I want to wrap the other text in <p> tags,
so that it becomes:
<p>texts …paragraph..</p>
<table class='test'>
tr and td...
</table>
<p>texts and more texts</p>
My code looks like this:
$doc = new DOMDocument();
$doc->loadHTML($html);
$tables = $doc->getElementsByTagName('table');
foreach ($tables as $table) {
    // I am not sure what to do next...
}
Can someone help me out with this? Thanks so much!

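I didn't test this, but:
// "~" is used as the delimiter so the "/" in </table> does not end the pattern;
// "s" lets "." match newlines and ".*?" stops at the first closing tag.
$html = preg_replace('~(<table.*?</table>)~is', '<p>$1</p>', $html);
Hope it helps...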
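Note that the regex above wraps the table itself in <p>, while the question asks for the text around the table to be wrapped. A minimal DOM-based sketch of that could look like the following. The sample string and the temporary wrapper <div> are assumptions made for illustration; they are not part of the original post.

```php
<?php
// Hypothetical sample input standing in for the asker's $html.
$html = "texts and paragraph\n<table class='test'><tr><td>cell</td></tr></table>\ntexts and more texts";

$doc = new DOMDocument();
libxml_use_internal_errors(true);
// Assumption: a temporary <div> wrapper makes the loose text nodes easy to reach.
$doc->loadHTML('<div>' . $html . '</div>');
libxml_clear_errors();

$wrapper = $doc->getElementsByTagName('div')->item(0);

// Only touch the text if the document actually contains a table.
if ($doc->getElementsByTagName('table')->length > 0) {
    // Copy the live node list first, because it changes while we replace nodes.
    foreach (iterator_to_array($wrapper->childNodes, false) as $node) {
        if ($node->nodeType === XML_TEXT_NODE && trim($node->nodeValue) !== '') {
            $p = $doc->createElement('p');
            $p->appendChild($doc->createTextNode(trim($node->nodeValue)));
            $wrapper->replaceChild($p, $node);
        }
    }
}

// Serialize only the wrapper's children, leaving out the temporary <div>.
$result = '';
foreach ($wrapper->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result;
```

Working on text nodes rather than on the raw string avoids the usual regex pitfalls with nested or malformed markup.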

Related

How to find a h3 tag with a certain value

I have an HTML file with the following structure:
<h3>Heading 1</h3>
<table>
<!-- contains a <thead> and <tbody> which also contain several columns/lines -->
</table>
<h3>Heading 2</h3>
<table>
<!-- contains a <thead> and <tbody> which also contain several columns/lines -->
</table>
I want to get just the first table with all its content, so I load the HTML file:
<?php
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML(file_get_contents('http://www.example.com'));
libxml_clear_errors();
?>
All tables have the same classes and no specific IDs. That's why the only way I could think of was to grab the h3 tag with the value "Heading 1". I already found this one, which works well for me. (Since other tables and captions could be added later, that solution is not ideal.)
How can I grab the h3 tag with the value "Heading 1", and how can I select the table that follows it?
EDIT#1: I don't have access to the HTML file, so I can't edit it.
EDIT#2: My Solution (thanks to Martin Henriksen) for now is:
<?php
$doc = new DOMDocument('1.0');
libxml_use_internal_errors(true);
$doc->loadHTML(file_get_contents('http://example.com'));
libxml_clear_errors();
foreach ($doc->getElementsByTagName('h3') as $element) {
    // Braces are needed here: without them only the first statement is conditional.
    if ($element->nodeValue == 'exampleString') {
        // The first nextSibling is the whitespace text node between </h3> and <table>.
        $table = $element->nextSibling->nextSibling;
        $innerHTML = '';
        foreach ($table->childNodes as $child) {
            $innerHTML .= $child->ownerDocument->saveXML($child);
        }
        echo $innerHTML;
        file_put_contents("test.xml", $innerHTML);
    }
}
?>
You can find any tag in HTML using the simple_html_dom.php class. You can download the file from https://sourceforge.net/projects/simplehtmldom/?source=typ_redirect
Then:
<?php
include_once('simple_html_dom.php');
$htm = "**YOUR HTML CODE**";
$html = str_get_html($htm);
// The selector is the bare tag name, without angle brackets.
$h3_tag = $html->find("h3", 0)->innertext;
echo "HTML code in h3 tag: ";
print_r($h3_tag);
?>
You can fetch all the DOMElements with the tag h3 and check what value each one holds through nodeValue. Once you have found the right h3 tag, you can select the next node in the DOM tree with nextSibling. Note that nextSibling returns a whitespace text node rather than the table when there is a line break between the two tags.
foreach ($dom->getElementsByTagName('h3') as $element) {
    if ($element->nodeValue == 'Heading 1') {
        $table = $element->nextSibling;
    }
}
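Because nextSibling can land on a whitespace text node, an XPath query is one way to jump straight to the first table that follows the heading. This is a sketch under assumptions: the inline sample markup and the variable names are made up for illustration, and in practice the HTML would come from the remote page.

```php
<?php
// Hypothetical sample mirroring the structure described in the question.
$html = '<h3>Heading 1</h3>
<table><tbody><tr><td>A</td></tr></tbody></table>
<h3>Heading 2</h3>
<table><tbody><tr><td>B</td></tr></tbody></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// following-sibling::table[1] picks the nearest table after the matching h3,
// skipping any text nodes in between.
$tables = $xpath->query('//h3[normalize-space(.) = "Heading 1"]/following-sibling::table[1]');

$table = $tables->item(0);
$tableHtml = $table !== null ? $dom->saveHTML($table) : '';
echo $tableHtml;
```

If more tables and headings are added to the page later, the query still selects the table belonging to "Heading 1" as long as that heading text stays the same.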

domdocument regex replace tag

Hello, I have PHP regex code like this:
preg_replace('~<div\s*.*?(?:\s*class\s*=\s*"(.*?)"|id\s*=\s*"(.*?)\s*)?>~i','<div align="center" class="$1" id="$2">', "html source code");
What I want to do is replace all div tags in the source HTML, keeping only the class and id attributes of each div and adding align="center" to it.
Examples:
<div style="border:none;" class="classbutton"> should become <div align="center" class="classbutton">
<div style="border:none;" class="classbutton" id="idstyle"> should become <div align="center" class="classbutton" id="idstyle">
I have already tried many variations with PHP regex, but nothing seems to work. Can someone help me, or give me DOMDocument code that solves this?
Thanks in advance.
Here is a snippet that should get you going:
$html = '<body><div style="border:none;" class="classbutton" id="idstyle">Some text</div></body>'; // Sample HTML string
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$divs = $xpath->query('//div[@class="classbutton"]'); // Get all DIV tags with class "classbutton"
foreach ($divs as $div) { // Loop through all DIVs found
    $div->setAttribute('align', 'center'); // Set align="center"
    $div->removeAttribute('style'); // Remove "style" attribute
}
echo $dom->saveHTML(); // Save HTML (use $html = $dom->saveHTML();)
See IDEONE demo
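The snippet above removes only the style attribute. Since the question asks to keep nothing but class and id, a variant that strips every other attribute might look like this. It is a sketch: the $keep whitelist and the sample string are assumptions for illustration.

```php
<?php
// Hypothetical sample input taken from the question's second example.
$html = '<div style="border:none;" class="classbutton" id="idstyle">Some text</div>';

$dom = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$keep = ['class', 'id']; // Attributes that survive the clean-up.
$out = '';

foreach ($dom->getElementsByTagName('div') as $div) {
    // Copy the attribute map first, because removing entries changes it.
    foreach (iterator_to_array($div->attributes, false) as $attr) {
        if (!in_array(strtolower($attr->nodeName), $keep, true)) {
            $div->removeAttribute($attr->nodeName);
        }
    }
    $div->setAttribute('align', 'center');
    // Serialize just this element, without the implied <html> and <body>.
    $out .= $dom->saveHTML($div);
}
echo $out;
```

A whitelist keeps the code short and also handles attributes that the regex in the question never anticipated.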

How to fetch data (text) from an external website with PHP if possible?

I'm trying to extract data (text) from an external site and put it on my site.
I want to get football scores from an external site and show them on mine.
I've researched this and found that I can do it using preg_match, but I can't figure out how to extract the data within HTML tags.
For example, this is the HTML structure of the external site:
<td valign="top" align="center" class="s1"><b>Text I Want To Fetch</b></td>
How would I fetch the text within the tags? It would help me out a lot! Thanks!
You can get the content of a webpage with the file_get_contents function.
E.g.:
$content = file_get_contents('http://www.source.com/page.html');
Try this:
<?php
$html = '<td valign="top" align="center" class="s1"><b>Text I Want To Fetch</b></td>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$td = $dom->getElementsByTagName('td')->item(0); // find the first td
$b = $td->getElementsByTagName('b')->item(0); // find the first b inside that td
$text = $b->textContent; // get text
var_dump($text); // dump it, echo, or print
Output
In this example there is no other text content, so if your HTML only has text within the bold tag, you can also use this:
<?php
$html = '<td valign="top" align="center" class="s1"><b>Text I Want To Fetch</b></td>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$text = $dom->textContent; // all text in the document
var_dump($text);
Output
If you're talking about using PHP to fetch data, then file_get_contents(url) may help; however, you can also fetch data with an AJAX request using jQuery. Here is the link to the AJAX documentation:
http://api.jquery.com/jquery.ajax/
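On a real page there will usually be many td elements, so selecting by class can be more reliable than taking the first td. The following sketch uses DOMXPath for that. The surrounding table and tr tags in the sample are assumptions added so the fragment is well-formed; the real markup would come from file_get_contents.

```php
<?php
// Hypothetical sample: the cell from the question, placed inside a table row.
$html = '<table><tr><td valign="top" align="center" class="s1"><b>Text I Want To Fetch</b></td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // Real-world pages often contain invalid markup.
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Collect the text of every <b> that sits directly inside a td with class "s1".
$texts = [];
foreach ($xpath->query('//td[@class="s1"]/b') as $node) {
    $texts[] = trim($node->textContent);
}
print_r($texts);
```

The expression @class="s1" matches only when the class attribute is exactly "s1"; cells with several classes would need a contains() test instead.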

I want php code to find href title and some other infos from html table

This is the code I have so far:
<?php
$url=" SOME HTML URL ";
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html); // "@" suppresses warnings about invalid markup
$tags = $doc->getElementsByTagName('a');
foreach ($tags as $tag) {
    echo $tag->getAttribute('href');
}
?>
I have HTML pages with tables, and I want the link, the title, and the date. Example of the HTML code:
<TR>
<TD align="center" vAlign="top" bgColor="#ffffff" class="smalltext">3</TD>
<TD class="plaintext" >THIS IS THE TITLE </TD>
<TD align="center" class="plaintext" >THIS IS DATE</TD>
</TR>
It works fine for the link, but I don't know how to get the others.
Thanks.
Where you are doing this:
$tags = $doc->getElementsByTagName('a');
You are getting back all the A tags. There only happens to be one.
If you want to get the text "THIS IS DATE", you aren't going to get it by looking in A tags, because the text is not inside an A tag; it is in a TD tag.
$tds = $doc->getElementsByTagName('td');
... would work to get all the TD elements, or you could assign an ID to the element you want to target and use getElementById instead.
Basically, though, this information is all in the documentation, which you absolutely should read before asking questions. Happy reading!
Once again, that's: http://php.net/manual/en/class.domdocument.php
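To make the TD approach concrete, here is a sketch that walks each row and reads the link, title, and date by cell position. The sample table, the /doc/3 link, and the assumption that the title is in the second cell and the date in the third are all hypothetical, based only on the example row in the question.

```php
<?php
// Hypothetical sample row; the <a href> is an assumption, since the question's
// excerpt does not show where the link sits.
$html = '<table>
<tr>
<td class="smalltext">3</td>
<td class="plaintext"><a href="/doc/3">THIS IS THE TITLE</a></td>
<td class="plaintext">THIS IS DATE</td>
</tr>
</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$rows = [];
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $tds = $tr->getElementsByTagName('td');
    if ($tds->length < 3) {
        continue; // Skip rows that do not have the expected three cells.
    }
    $link = $tds->item(1)->getElementsByTagName('a')->item(0);
    $rows[] = [
        'href'  => $link !== null ? $link->getAttribute('href') : null,
        'title' => trim($tds->item(1)->textContent),
        'date'  => trim($tds->item(2)->textContent),
    ];
}
print_r($rows);
```

Reading cells by position is fragile if the column order changes, so checking the cell count first at least avoids calling methods on null.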

php simplehtmldom issue

I have a question about simplehtmldom.
How can I get the text of an element that is the next sibling of another element containing a certain text?
For example, I have HTML like this:
<div>
<table>
<tr>
<td>prova</td>
<td>pippo</td>
</tr>
</table>
</div>
and I need to extract the text of the second "td". The value "prova" is fixed and known in advance. I thought I could use this code:
echo $html->find("td:contains('prova')",0)->next_sibling();
but "contains" doesn't exist in simplehtmldom.
How can I do that?
Thanks a lot.
Thanks for your answer, but I need to extract the text of the td next to the td that contains the text "prova".
For example, I need to extract the value "pippo" with code similar to this:
echo $html->find("td:contains('prova')",0)->next_sibling()->innertext;
because I know the value of the first column. Unfortunately the contains function doesn't exist in simplehtmldom.
The code
echo $html->find("td:innertext('prova')",0)->next_sibling();
isn't the right way either.
Do you have any other suggestions?
Thanks
Try this code:
<?php
include_once "simple_html_dom.php";
// the HTML code to load (in this case from a string)
$html = '<div>
<table>
<tr>
<td>prova</td>
<td>pippo</td>
</tr>
</table>
</div>';
$dom = str_get_html($html);
// the :contains selector is not implemented in simplehtmldom, so filter manually
$tds = $dom->find("td");
foreach ($tds as $td) {
    if (trim($td->innertext) == "prova") {
        echo $td->next_sibling()->innertext;
    }
}
?>
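If using PHP's built-in DOM extension instead of simplehtmldom is an option, the same lookup can be written as a single XPath expression. This is offered as an alternative sketch, not as something simplehtmldom itself supports; the variable names are illustrative.

```php
<?php
// Same sample markup as in the question.
$html = '<div>
<table>
<tr>
<td>prova</td>
<td>pippo</td>
</tr>
</table>
</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Find the td whose text is exactly "prova", then take the td right after it.
$cell = $xpath->query('//td[normalize-space(.) = "prova"]/following-sibling::td[1]')->item(0);

$value = $cell !== null ? trim($cell->textContent) : null;
echo $value;
```

For a substring match rather than an exact one, the predicate can be changed to contains(., "prova").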
