PHP simplehtmldom find text in blank table structure - php

I'm having difficulty finding the DYNAMIC-TEXT value in a sea of HTML tables.
I tried $html->find("th[plaintext*=Type") and wanted to access the sibling from there, but it returns nothing. Here is the table structure:
<table>
<tbody>
</tbody>
<colgroup>
<col width="25%">
<col>
</colgroup>
<tbody>
<tr class="odd">
<th colspan="2">Name</th>
</tr>
<tr class="even">
<th width="30%">Type</th>
<td>DYNAMIC-TEXT</td>
</tr>
</tbody>
</table>
I expect the output to be the text DYNAMIC-TEXT, but the actual output is nothing.
Thanks

In your code $html->find("th[plaintext*=Type") you want to use an attribute selector *= but there is no attribute plaintext.
But there is an attribute width with the value 30%. You might use a pattern ^[0-9]+%$ to check for 1+ digits followed by a percentage sign.
If you find a result, you could get the next_sibling and get the plaintext from it.
For example:
$html = str_get_html($str);
foreach ($html->find("th[width*=^[0-9]+%$]") as $value) {
echo $value->next_sibling()->plaintext;
}
Result:
DYNAMIC-TEXT
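If relying on the width attribute feels fragile, the lookup can also match the header text itself. Below is a minimal sketch, assuming PHP's built-in DOM extension (DOMDocument and DOMXPath) is used instead of simplehtmldom; the variable names are illustrative only.

```php
<?php
// Sample markup mirroring the table from the question.
$str = <<<HTML
<table>
<colgroup><col width="25%"><col></colgroup>
<tbody>
<tr class="odd"><th colspan="2">Name</th></tr>
<tr class="even"><th width="30%">Type</th><td>DYNAMIC-TEXT</td></tr>
</tbody>
</table>
HTML;

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($str);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select the first <td> that follows a <th> whose text is exactly "Type".
$cells = $xpath->query('//th[normalize-space(.) = "Type"]/following-sibling::td[1]');

// $result stays null when no matching header cell exists.
$result = $cells->length > 0 ? trim($cells->item(0)->textContent) : null;
echo $result, PHP_EOL;
```

Matching on the cell text avoids depending on a presentational attribute that may change or appear on other headers.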

Related

Why does this regex only match the last occurrence of the pattern

I'm trying to create a regex which will create html out of markup code.
When trying to replace a part of the [table] markup, it only replaces the last occurrence.
I have the following regex (PHP):
/(\[table].*)\[\|](.*\[\/table])/s
Replace pattern:
$1</td><td>$2
And the following test string:
[table]<thead>
<th>head1</th><th>head2</th></thead>
[*]test1[|]test2
[*]test1[|]test2
[/table]
It should produce the following:
[table]<thead>
<th>head1</th><th>head2</th></thead>
[*]test1</td><td>test2
[*]test1</td><td>test2
[/table]
but it actually produces this:
[table]<thead>
<th>head1</th><th>head2</th></thead>
[*]test1[|]test2
[*]test1</td><td>test2
[/table]
The problem is that [|] is also used in other markup codes, where it should not be replaced with </td><td>.
To clarify:
I have a table "bb-code"
[table]
[**]header1[||]header2[||]header3[||]...[/**]
[*]child1.1[|]child1.2[|]child1.3[|]...
[*]child2.1[|]child2.2[|]child2.3[|]...
[*]child3.1[|]child3.2[|]child3.3[|]...
[*]...[|]...[|]...[|]...
[/table]
I want this to become this:
<table class="ui compact stripet yellow table">
<thead>
<tr>
<th>header1</th>
<th>header2</th>
<th>header3</th>
<th>....</th>
</tr>
</thead>
<tbody>
<tr>
<td>child1.1</td>
<td>child1.2</td>
<td>child1.3</td>
<td>...</td>
</tr>
<tr>
<td>child2.1</td>
<td>child2.2</td>
<td>child2.3</td>
<td>...</td>
</tr>
<tr>
<td>child3.1</td>
<td>child3.2</td>
<td>child3.3</td>
<td>...</td>
</tr>
</tbody>
</table>
Okay, I had a few minutes to spare on my mobile phone before bedtime, so I ran with Wiktor's comment and whacked up a series of preg_ functions to try to convert your bbcode to html. I don't have any experience with bbcode, so I am purely addressing your sample input and not considering fringe cases. I think php has a bbcode parser library somewhere, but I don't know if your bbcode syntax is the standard.
A breakdown of the patterns implemented:
First, isolate each whole [table]...[/table] string in the document. (Regex101 Demo) ~\[table]\R*([^[]*(?:\[(?!/?table])[^[]*)*)\R*\[/table]~ will match the strings and pass the fullmatch as $m[0] and the substring between the table tags as $m[1] to BBTableToHTML().
Next, BBTableToHTML() will make 3 separate passes over the $m[1] string. Each of those patterns will send their respective matched strings to the associated custom function and return the modified string.
Before sending the updated $m[1] from BBTableToHTML() back to the echo, your desired <table...> and </table> tags will bookend $m[1].
Demos of the preg_replace_callback_array() patterns:
~\[\*\*]([^[]*(?:\[(?!/?\*\*])[^[]*)*)\[/\*\*]~ https://regex101.com/r/thINHQ/2
~(?:\[\*].*\R*)+~ https://regex101.com/r/thINHQ/3
~\[\*](.*)~ https://regex101.com/r/thINHQ/4
Code: (Demo)
$bbcode = <<<BBCODE
[b]Check out this demo[/b]
¯\_(ツ)_/¯
[table]
[**]header1[||]header2[||]header3[||]...[/**]
[*]child1.1[|]child1.2[|]child1.3[|]...
[*]child2.1[|]child2.2[|]child2.3[|]...
[*]child3.1[|]child3.2[|]child3.3[|]...
[*]...[|]...[|]...[|]...
[/table]
simple text
[table]
[**]a 1[||]and a 2[/**]
[*]A[|]B
[*]C[|]D
[/table]
[s]3, you're out[/s]
blah
BBCODE;
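// Convert one [table]...[/table] match into an HTML table.
// $m[1] holds the text between the table tags.
function BBTableToHTML($m) {
    return "<table class=\"ui compact stripet yellow table\">\n" .
        preg_replace_callback_array(
            [
                // Pass 1: the header row [**]...[/**]
                '~\[\*\*]([^[]*(?:\[(?!/?\*\*])[^[]*)*)\[/\*\*]~' => 'BBTHeadToHTML',
                // Pass 2: the block of consecutive [*] rows
                '~(?:\[\*].*\R*)+~' => 'BBTBodyToHTML',
                // Pass 3: each individual [*] row
                '~\[\*](.*)~' => 'BBTBodyRowToHTML'
            ],
            $m[1]
        ) .
        "</table>";
}

// Wrap the header cells in <thead><tr><th>...</th></tr></thead>.
function BBTHeadToHTML($m) {
    return "\t<thead>\n" .
        "\t\t<tr>\n\t\t\t<th>" . str_replace('[||]', "</th>\n\t\t\t<th>", $m[1]) . "</th>\n\t\t</tr>\n" .
        "\t</thead>";
}

// Wrap the whole run of body rows in <tbody>...</tbody>.
function BBTBodyToHTML($m) {
    return "\t<tbody>\n{$m[0]}\t</tbody>\n";
}

// Convert one [*] row into <tr><td>...</td></tr>.
function BBTBodyRowToHTML($m) {
    return "\t\t<tr>\n\t\t\t<td>" . str_replace('[|]', "</td>\n\t\t\t<td>", $m[1]) . "</td>\n\t\t</tr>";
}

// Isolate each table in the document and convert it; text outside tables is left untouched.
echo preg_replace_callback(
    '~\[table]\R*([^[]*(?:\[(?!/?table])[^[]*)*)\R*\[/table]~',
    'BBTableToHTML',
    $bbcode
);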
Output:
[b]Check out this demo[/b]
¯\_(ツ)_/¯
<table class="ui compact stripet yellow table">
<thead>
<tr>
<th>header1</th>
<th>header2</th>
<th>header3</th>
<th>...</th>
</tr>
</thead>
<tbody>
<tr>
<td>child1.1</td>
<td>child1.2</td>
<td>child1.3</td>
<td>...</td>
</tr>
<tr>
<td>child2.1</td>
<td>child2.2</td>
<td>child2.3</td>
<td>...</td>
</tr>
<tr>
<td>child3.1</td>
<td>child3.2</td>
<td>child3.3</td>
<td>...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
</tbody>
</table>
simple text
<table class="ui compact stripet yellow table">
<thead>
<tr>
<th>a 1</th>
<th>and a 2</th>
</tr>
</thead>
<tbody>
<tr>
<td>A</td>
<td>B</td>
</tr>
<tr>
<td>C</td>
<td>D</td>
</tr>
</tbody>
</table>
[s]3, you're out[/s]
blah
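As for why the original pattern replaced only the last [|]: the first .* is greedy, so it consumes as much as it can and backtracks only as far as the final [|]. The whole table is therefore a single match, and a single match gets a single replacement. The sketch below demonstrates this on a shortened, made-up input and contrasts it with isolating the table first; it is an illustration of the idea, not a replacement for the code above.

```php
<?php
// Shortened sample input (illustrative, not the asker's full markup).
$input = "[table]\n[*]a[|]b\n[*]c[|]d\n[/table]";

// Original pattern: the first greedy .* runs to the end of the input and
// backtracks only as far as the LAST [|], so one match spans the whole table.
$pattern = '/(\[table].*)\[\|](.*\[\/table])/s';
$once = preg_replace($pattern, '$1</td><td>$2', $input, -1, $count);

// Alternative: isolate each table first, then replace every [|] inside it.
// [|] markers outside a [table] block are never touched.
$fixed = preg_replace_callback(
    '~\[table].*?\[/table]~s',
    function ($m) {
        return str_replace('[|]', '</td><td>', $m[0]);
    },
    $input
);

echo $count, PHP_EOL; // number of replacements made by the original pattern
echo $fixed, PHP_EOL;
```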

XML node content change

I have XML content like this:
<?xml version="1.0"?>
<content
ID="immunSect"/>
<table
border="1"
width="100%">
<thead>
<tr>
<th>Vaccine</th>
<th>Date</th>
<th>Status</th>
</tr>
</thead>
<tbody>
<tr>
<td><content
ID="immun2"/>Influenza virus vaccine</td>
<td>May 2012</td>
<td>Completed</td>
</tr>
<tr>
<td><content
ID="immun4"/>Tetanus and diphtheria toxoids</td>
<td>April 2012</td>
<td>Completed</td>
</tr>
</tbody>
</table>
</text>
My problem is I would like to change this node
<tbody>
<tr>
<td><content
ID="immun2"/>Influenza virus vaccine</td>
into
<tbody>
<tr>
<td><content
ID="immun2">Influenza virus vaccine</content></td>
How can I fetch that particular section and change the node structure from
<td><content id="xxx"/>test
into
<td><content id="xxx">test</content>
Basically, when reading the XML data, loop until EOF: read each line (#string1), parse it, and write the result to a new variable (#string2), which you can then save to an XML file or use however you need. Copy everything from #string1 to #string2 until you encounter the self-closing <content .../> tag; instead of copying it as-is, write it as an opening tag and set a flag to 1. Continue copying until you reach the closing </td>, then insert </content> into #string2 before it and reset the flag.
Of course, it can be done in a more elegant way, but I would need to know what you are using to get the XML data.
Hope that helps.
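One of the more elegant ways is to let an XML parser do the restructuring. The following is a minimal sketch, assuming PHP's built-in DOM extension and a well-formed fragment (the sample in the question would first need its missing opening tag restored); it moves everything that follows an empty <content/> inside a <td> into that element.

```php
<?php
// A single well-formed cell taken from the question's sample.
$xml = '<td><content ID="immun2"/>Influenza virus vaccine</td>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
// Find <content> elements that sit directly in a <td> and have no children.
foreach ($xpath->query('//td/content[not(node())]') as $content) {
    // appendChild() moves an existing node, so each following sibling
    // is pulled inside <content> until none remain.
    while ($content->nextSibling !== null) {
        $content->appendChild($content->nextSibling);
    }
}

$result = $doc->saveXML($doc->documentElement);
echo $result, PHP_EOL;
```

Working on the parsed tree avoids the line-by-line flag handling and does not depend on how the tags are split across lines.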

Extract table data from HTML page in php

I have an HTML table with multiple rows, each with multiple columns. A sample of one row looks like this:
<table class ="classt">
<tbody>
<tr class="row">
<td height="20" valign="top" class="mosttext-new">data</td>
<td height="20" valign="top" class="mosttext-new"> data</td>
<td height="20" valign="top" class="mosttext-new">data</td>
</tr>
</tbody>
</table>
I am trying to extract all td elements like this in a php script.
foreach($html->find('table.classt') as $e){
foreach ($e->find('tr.row') as $tr){
foreach ($tr->find('td') as $td){
$text = $td->innertext;
}
}
}
But $tr does not contain the row details with td tags. It just holds the entire row as text within double quotes, like this:
"data data data"
so my third loop is not able to find td as $tr does not have td tags.
Any idea on this?
I think you have to append the class name to 'td', separated by a '.', like this:
foreach ($tr->find('td.mosttext-new') as $td)
Hope this solves your problem. All the best.
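If the selector change does not help, the cells can also be read with PHP's built-in DOM extension. This is only a sketch under that assumption (it does not use simplehtmldom), and the cell values data1 to data3 are placeholders chosen so the result is easy to check.

```php
<?php
// Sample row modelled on the question; cell values are placeholders.
$htmlSource = <<<HTML
<table class="classt">
<tbody>
<tr class="row">
<td height="20" valign="top" class="mosttext-new">data1</td>
<td height="20" valign="top" class="mosttext-new"> data2</td>
<td height="20" valign="top" class="mosttext-new">data3</td>
</tr>
</tbody>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($htmlSource);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows = [];
foreach ($xpath->query('//table[@class="classt"]//tr[@class="row"]') as $tr) {
    $cells = [];
    // Query relative to the current row so only its own cells are read.
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}
print_r($rows);
```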

Simple HTML DOM: Notice->Trying to get property of non-object

I am getting a PHP notice when using Simple HTML DOM to scrape a website. Two notices are displayed, and everything rendered underneath looks perfect when I use print_r to display it.
The website table structure is as follows:
<table class=data schedTbl>
<thead>
<tr>
<th>DATA</th>
<th>DATA</th>
<th>DATA</th>
etc....
</tr>
</thead>
<tbody>
<tr>
<td>
<div class="class1">DATA</div>
<div class="class2">SAME DATA AS PREVIOUS DIV</div>
</td>
<td>DATA</td>
<td>DATA</td>
etc....
</tr>
<tr>
<td>
<div class="class1">DATA</div>
<div class="class2">SAME DATA AS PREVIOUS DIV</div>
</td>
<td>DATA</td>
<td>DATA</td>
etc....
</tr>
<tr>
<td>
<div class="class1">DATA</div>
<div class="class2">SAME DATA AS PREVIOUS DIV</div>
</td>
<td>DATA</td>
<td>DATA</td>
etc....
</tr>
etc....
</tbody>
</table>
The code below is used to find all tr in table[class=data schedTbl]. I have a tbody selector in there, but it seems to pay no attention to this selector as it still selects the tr in the thead.
include('simple_html_dom.php');
$articles = array();
getArticles('www.somesite.com');
function getArticles($page) {
global $articles;
$html = new simple_html_dom();
$html->load_file($page);
$items = $html->find('table[class=data schedTbl] tbody tr');
foreach($items as $post) {
$articles[] = array($post->children(0)->first_child(0)->plaintext,//0 -- GAME DATE
$post->children(1)->plaintext,//1 -- AWAY TEAM
$post->children(2)->plaintext);//2 -- HOME TEAM
}
}
So, I believe the notices come from the tr in the thead, because I am calling the first child of the first td, which only has one record. The reason there are two notices is that there are actually two tables with the same data structure in the body.
Again, I believe there are 2 ways of solving this:
1) PROBABLY THE EASIEST (fix the find selector so the TBODY works and only selects the tds within the tbodies)
2) Figure out a way to not do the first_child filter when it is not needed?
Please let me know if you would like a snapshot of the print_r($articles) output I am receiving.
Thanks in advance for any help provided!
Sincerely,
Bill C.
Just comment out line #695 in the simple_html_dom.php
if ($m[1]==='tbody') continue;
Then it should read the tbody.
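Editing the library source is one option. Another is to select only the body rows and skip any row that lacks the expected cells, which covers both fixes proposed in the question. Here is a minimal sketch, assuming PHP's built-in DOM extension rather than simple_html_dom; the sample markup, date, and team names are made up for illustration.

```php
<?php
// Made-up sample with one header row and one body row.
$page = <<<HTML
<table class="data schedTbl">
<thead><tr><th>DATE</th><th>AWAY</th><th>HOME</th></tr></thead>
<tbody>
<tr>
<td><div class="class1">Oct 1</div><div class="class2">Oct 1</div></td>
<td>Lions</td>
<td>Bears</td>
</tr>
</tbody>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Only rows that are direct children of <tbody>, so <thead> rows are excluded.
$rowQuery = '//table[contains(concat(" ", normalize-space(@class), " "), " schedTbl ")]/tbody/tr';

$articles = [];
foreach ($xpath->query($rowQuery) as $tr) {
    $date = $xpath->query('./td[1]/div[1]', $tr)->item(0);
    $away = $xpath->query('./td[2]', $tr)->item(0);
    $home = $xpath->query('./td[3]', $tr)->item(0);
    // Guard: skip rows that do not have the expected structure.
    if ($date === null || $away === null || $home === null) {
        continue;
    }
    $articles[] = [
        trim($date->textContent),
        trim($away->textContent),
        trim($home->textContent),
    ];
}
print_r($articles);
```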

Displaying the text in the multiple lines when retrieving from database

Hi,
I have a table in which a row contains text that I retrieve from the database. The row is narrow and the retrieved data is long, so the text exceeds the width of the row. I want to break the retrieved data into multiple lines inside the table row. How can I do this?
My code is here:
$list = $mfidao1->fetchMfi($_GET['id']);
//print_r($list);
//die;
if(!empty($list))
{
foreach($list as $menu)
{
?>
<tr style="border:none; background-color:#FBFBFB;" >
<td class="topv">Social Mission</td>
<td class="topm" ><div class="txt"><?php echo $menu->mfi_1_a;?></div></td>
</tr>
<tr bgcolor="#E8E8E8">
<td class="topv">Address</td>
<td class="topm"><?php echo $menu->mfi_ii_c;?></td>
</tr>
<tr bgcolor="#FBFBFB">
<td class="topv">Phone</td>
<td class="topm"><?php echo $menu->mfi_ii_e;?></td>
</tr>
<tr bgcolor="#E8E8E8">
<td class="topv">Email</td>
<td class="topm"><?php echo $menu->mfi_ii_d;?></td>
</tr>
<tr bgcolor="#FBFBFB">
<td class="topv">Year Established</td>
<td class="topm"><?php echo $menu->mfi_i_c;?></td>
</tr>
<tr bgcolor="#E8E8E8">
<td class="topv">Current Legal Status</td>
<td class="topm"><?php echo $menu->mfi_i_d;?></td>
</tr>
<tr bgcolor="#FBFBFB">
<td class="topv">Respondent</td>
<td class="topm"><?php echo $menu->mfi_ii_a;?></td>
</tr>
<?php
}
}
?>
</table>
Set the width of the <td>. I think this is a better approach than wordwrap().
In your CSS for the table, use "table-layout: fixed", which fixes the td widths to the values you set.
Add "word-wrap: break-word;" so that the text breaks instead of running past the boundary of the box.
You need to wrap the text in your td tags.
You could use the function wordwrap(). It wraps a string to a given number of characters using a string break character.
You can either use the PHP function wordwrap() or style the td with CSS so that it uses the word-wrap property.
Not sure if this is what you want, but it sounds like you could use chunk_split().
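To make the difference between the two PHP functions concrete, here is a small sketch with made-up strings. wordwrap() breaks at word boundaries, while chunk_split() cuts every N characters regardless of words and also appends the separator after the last chunk.

```php
<?php
// wordwrap(): break at word boundaries so that no line exceeds 20 characters.
$text = "The quick brown fox sat over the lazy dog";
$wrapped = wordwrap($text, 20, "<br />\n");

// chunk_split(): insert the separator after every 4 characters,
// including after the final chunk.
$long = "abcdefgh";
$chunked = chunk_split($long, 4, "\n");

echo $wrapped, PHP_EOL;
echo $chunked;
```

For text containing very long unbroken strings, wordwrap() also accepts a fourth argument that forces breaks inside words; the CSS approach handles that case without changing the data.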
