Delete a table and its content from HTML code in PHP

I have HTML code like the example below. I want to remove the entire table and its content from it using PHP. I can remove the table tags with PHP's strip_tags(), but I am not sure how to delete the table content. Any help would be appreciated.
<div>
<p> This is test paragraph</p>
<table>
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Age</th>
</tr>
<tr>
<td>Jill</td>
<td>Smith</td>
<td>50</td>
</tr>
</table>
</div>
Desired output:
<div>
<p> This is test paragraph</p>
</div>
Thanks @medigeek and everyone for your answers. I've made a few changes to the code so that it also works with tables that have inline styles.
Solution:
<?php
$html = '<div>
<p> This is test paragraph</p>
<table style="width:100%">
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Age</th>
</tr>
<tr>
<td>Jill</td>
<td>Smith</td>
<td>50</td>
</tr>
</table>
</div>';
// [^>]* allows attributes (such as inline styles) in the opening <table> tag
$regex = '/<table[^>]*>.*?<\/table>/s';
$replace = '';
$result = preg_replace($regex, $replace, $html);
echo $result;

<?php
$teststring = '<div>
<p> This is test paragraph</p>
<table>
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Age</th>
</tr>
<tr>
<td>Jill</td>
<td>Smith</td>
<td>50</td>
</tr>
</table>
</div>';
$regexpattern = '/<table>.*?<\/table>/s'; // Matching regular expression pattern
$replacement = ''; // Substitute the matched pattern with an empty string
$res = preg_replace($regexpattern, $replacement, $teststring);
echo($res);
?>
Breakdown of the regular expression pattern:
/ = start of the regex pattern (the delimiter)
<table> = start matching when you see this text
.* = match any characters (or none) in between
? = but don't be greedy: stop at the first </table> instead of running on to the last one
<\/table> = stop matching when you see this text (the / is escaped because it is also the delimiter)
/ = end of the regex pattern
s = modifier that lets . match newline characters too, so the match can span several lines
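To see why the lazy ? matters, here is a small sketch with two tables separated by a paragraph; the sample string is made up for illustration. The greedy version runs from the first <table> to the last </table> and removes the paragraph as well, while the lazy version removes each table on its own.

```php
<?php
// Hypothetical input: two tables with a paragraph between them.
$html = "<table><tr><td>A</td></tr></table>\n<p>keep me</p>\n<table><tr><td>B</td></tr></table>";

// Greedy: .* extends to the LAST </table>, so the paragraph is removed too.
$greedy = preg_replace('/<table>.*<\/table>/s', '', $html);

// Lazy: .*? stops at the FIRST </table>, so each table is matched separately.
$lazy = preg_replace('/<table>.*?<\/table>/s', '', $html);

var_dump($greedy); // an empty string
var_dump($lazy);   // only the paragraph (with its surrounding newlines) is left
```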
Regular expressions are a powerful way to match seemingly complicated text in many programming languages. You can find more information here:
http://php.net/manual/en/function.preg-replace.php
http://php.net/manual/en/book.pcre.php
https://www.regular-expressions.info/quickstart.html

You can do this with preg_replace:
$your_html = '<table......';
$new_html = preg_replace("/(<table>).*?(<\/table>)/s", "", $your_html);
echo $new_html;
/*
OUTPUT:
<div>
<p> This is test paragraph</p>
</div>
*/

If you want to do this after the page has already loaded (the page loads and some event triggers the removal), it can't be done with PHP, because PHP runs on the server before the page is sent to the browser; that would have to be done with JavaScript. If you want the table removed while the page is being generated, you have to output it from PHP conditionally:
<div>
<p> This is test paragraph</p>
<?php
// $showTable is a placeholder: replace it with your own condition
$showTable = true;
if ($showTable) {
echo "
<table>
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Age</th>
</tr>
<tr>
<td>Jill</td>
<td>Smith</td>
<td>50</td>
</tr>
</table>
";
}
?>
</div>
Done this way, the table is only output when the condition checked in PHP is true.

Try this:
$content = <<<DATA
<div>
<p> This is test paragraph</p>
<table>
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Age</th>
</tr>
<tr>
<td>Jill</td>
<td>Smith</td>
<td>50</td>
</tr>
</table>
</div>
DATA;
$doc = new DOMDocument();
$doc->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$tables = $doc->getElementsByTagName('table');
while ($tables->length)
{
$tables[0]->parentNode->removeChild($tables[0]);
}
echo $doc->saveHTML();
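The while loop above works because getElementsByTagName() returns a live list that shrinks as tables are removed. An alternative sketch copies the list into a plain array with iterator_to_array() first and then removes each node; the two-table input is made up for illustration.

```php
<?php
$content = '<div>
<p> This is test paragraph</p>
<table><tr><td>A</td></tr></table>
<table><tr><td>B</td></tr></table>
</div>';

$doc = new DOMDocument();
$doc->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Copy the live node list into an ordinary array so that removing
// nodes does not shift the items we are iterating over.
$tables = iterator_to_array($doc->getElementsByTagName('table'));
foreach ($tables as $table) {
    $table->parentNode->removeChild($table);
}

$result = $doc->saveHTML();
echo $result;
```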
Output:
<div> <p> This is test paragraph</p> </div>

Related

PHP function that removes everything except a specified part of a string

I have a string that can contain any HTML elements.
For example, I have this string:
$htmlString = '<p>Test</p>
<h2>Test2</h2>
<table>
<thead>
<tr>
<td>Header 1</td>
<td>Header 2</td>
</tr>
</thead>
<tbody>
<tr>
<td>Col 1</td>
<td>Col 2</td>
</tr>
</tbody>
</table>
<span>Test span </span>
';
As you can see, the string contains <p>, <h2>, <table> and <span> tags, and it could also contain other HTML tags.
My question is: is there a way to remove all the other elements except the <table>? You can assume there are no tags other than thead, tr, td and tbody inside the table element.
This will probably be closed as a duplicate, but before that happens here’s some quick code to help you with your specific HTML. Instead of “removing” everything except your target text, we are “extracting” our target text. The code itself is pretty straightforward so I didn’t see a need to comment things as much as I usually do.
<?php
$htmlString = '<p>Test</p>
<h2>Test2</h2>
<table>
<thead>
<tr>
<td>Header 1</td>
<td>Header 2</td>
</tr>
</thead>
<tbody>
<tr>
<td>Col 1</td>
<td>Col 2</td>
</tr>
</tbody>
</table>
<span>Test span </span>
';
$dom = new DOMDocument();
// preserveWhiteSpace is applied while parsing, so set it before loadHTML()
$dom->preserveWhiteSpace = true;
$dom->loadHTML($htmlString, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$tables = $dom->getElementsByTagName('table');
foreach($tables as $table) {
var_dump($dom->saveHTML($table));
}
Demo here: https://3v4l.org/YjkdT
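This may be the solution you are looking for:
https://www.php.net/manual/en/function.strip-tags.php
<?php
// Keep the table's own tags; other tags are removed, but their text content stays
$stripped = strip_tags($htmlString, '<table><thead><tbody><tr><td>');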
?>
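Note that strip_tags() only removes tags, not the text inside them. A quick check on a shortened, single-line version of the sample string (shortened here only for brevity) shows that the text of the <p> and <span> survives even when the table tags are allowed:

```php
<?php
// Shortened version of the question's $htmlString.
$htmlString = '<p>Test</p><table><tbody><tr><td>Col 1</td></tr></tbody></table><span>Test span</span>';

// Allow the table-related tags; every other tag is removed, but its text is kept.
$kept = strip_tags($htmlString, '<table><thead><tbody><tr><td>');

echo $kept;
```

So if the other elements must disappear completely, including their text, the DOMDocument extraction above is the safer choice.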

Why does this regex only match the last occurrence of the pattern?

I'm trying to create a regex that generates HTML from markup code.
When I try to replace part of the [table] markup, it only replaces the last occurrence.
I have the following regex (PHP):
/(\[table].*)\[\|](.*\[\/table])/s
Replace pattern:
$1</td><td>$2
And the following test string:
[table]<thead>
<th>head1</th><th>head2</th></thead>
[*]test1[|]test2
[*]test1[|]test2
[/table]
It should produce the following:
[table]<thead>
<th>head1</th><th>head2</th></thead>
[*]test1</td><td>test2
[*]test1</td><td>test2
[/table]
but it actually produces this:
[table]<thead>
<th>head1</th><th>head2</th></thead>
[*]test1[|]test2
[*]test1</td><td>test2
[/table]
The problem is that [|] is also used in other markup codes, where it should not be replaced with </td><td>.
To clarify:
I have a table "bb-code"
[table]
[**]header1[||]header2[||]header3[||]...[/**]
[*]child1.1[|]child1.2[|]child1.3[|]...
[*]child2.1[|]child2.2[|]child2.3[|]...
[*]child3.1[|]child3.2[|]child3.3[|]...
[*]...[|]...[|]...[|]...
[/table]
I want this to become this:
<table class="ui compact stripet yellow table">
<thead>
<tr>
<th>header1</th>
<th>header2</th>
<th>header3</th>
<th>....</th>
</tr>
</thead>
<tbody>
<tr>
<td>child1.1</td>
<td>child1.2</td>
<td>child1.3</td>
<td>...</td>
</tr>
<tr>
<td>child2.1</td>
<td>child2.2</td>
<td>child2.3</td>
<td>...</td>
</tr>
<tr>
<td>child3.1</td>
<td>child3.2</td>
<td>child3.3</td>
<td>...</td>
</tr>
</tbody>
</table>
Okay, I had a few minutes to spare on my mobile phone before bedtime, so I ran with Wiktor's comment and whacked up a series of preg_ functions to try to convert your bbcode to html. I don't have any experience with bbcode, so I am purely addressing your sample input and not considering fringe cases. I think php has a bbcode parser library somewhere, but I don't know if your bbcode syntax is the standard.
Here is a breakdown of the patterns used.
First, isolate each whole [table]...[/table] string in the document. (Regex101 Demo) ~\[table]\R*([^[]*(?:\[(?!/?table])[^[]*)*)\R*\[/table]~ will match the strings and pass the fullmatch as $m[0] and the substring between the table tags as $m[1] to BBTableToHTML().
Next, BBTableToHTML() will make 3 separate passes over the $m[1] string. Each of those patterns will send their respective matched strings to the associated custom function and return the modified string.
Before sending the updated $m[1] from BBTableToHTML() back to the echo, your desired <table...> and </table> tags will bookend $m[1].
Demos of the preg_replace_callback_array() patterns:
~\[\*\*]([^[]*(?:\[(?!/?\*\*])[^[]*)*)\[/\*\*]~ https://regex101.com/r/thINHQ/2
~(?:\[\*].*\R*)+~ https://regex101.com/r/thINHQ/3
~\[\*](.*)~ https://regex101.com/r/thINHQ/4
Code: (Demo)
$bbcode = <<<BBCODE
[b]Check out this demo[/b]
¯\_(ツ)_/¯
[table]
[**]header1[||]header2[||]header3[||]...[/**]
[*]child1.1[|]child1.2[|]child1.3[|]...
[*]child2.1[|]child2.2[|]child2.3[|]...
[*]child3.1[|]child3.2[|]child3.3[|]...
[*]...[|]...[|]...[|]...
[/table]
simple text
[table]
[**]a 1[||]and a 2[/**]
[*]A[|]B
[*]C[|]D
[/table]
[s]3, you're out[/s]
blah
BBCODE;
function BBTableToHTML($m) {
return "<table class=\"ui compact stripet yellow table\">\n" .
preg_replace_callback_array(
[
'~\[\*\*]([^[]*(?:\[(?!/?\*\*])[^[]*)*)\[/\*\*]~' => 'BBTHeadToHTML',
'~(?:\[\*].*\R*)+~' => 'BBTBodyToHTML',
'~\[\*](.*)~' => 'BBTBodyRowToHTML'
],
$m[1]
) .
"</table>";
}
function BBTHeadToHTML($m) {
return "\t<thead>\n" .
"\t\t<tr>\n\t\t\t<th>" . str_replace('[||]', "</th>\n\t\t\t<th>", $m[1]) . "</th>\n\t\t</tr>\n" .
"\t</thead>";
}
function BBTBodyToHTML($m) {
return "\t<tbody>\n{$m[0]}\t</tbody>\n";
}
function BBTBodyRowToHTML($m) {
return "\t\t<tr>\n\t\t\t<td>" . str_replace('[|]', "</td>\n\t\t\t<td>", $m[1]) . "</td>\n\t\t</tr>";
}
echo preg_replace_callback(
'~\[table]\R*([^[]*(?:\[(?!/?table])[^[]*)*)\R*\[/table]~',
'BBTableToHTML',
$bbcode
);
Output:
[b]Check out this demo[/b]
¯\_(ツ)_/¯
<table class="ui compact stripet yellow table">
<thead>
<tr>
<th>header1</th>
<th>header2</th>
<th>header3</th>
<th>...</th>
</tr>
</thead>
<tbody>
<tr>
<td>child1.1</td>
<td>child1.2</td>
<td>child1.3</td>
<td>...</td>
</tr>
<tr>
<td>child2.1</td>
<td>child2.2</td>
<td>child2.3</td>
<td>...</td>
</tr>
<tr>
<td>child3.1</td>
<td>child3.2</td>
<td>child3.3</td>
<td>...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
</tbody>
</table>
simple text
<table class="ui compact stripet yellow table">
<thead>
<tr>
<th>a 1</th>
<th>and a 2</th>
</tr>
</thead>
<tbody>
<tr>
<td>A</td>
<td>B</td>
</tr>
<tr>
<td>C</td>
<td>D</td>
</tr>
</tbody>
</table>
[s]3, you're out[/s]
blah
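The question's pattern replaces only one [|] per table because a single match spans the whole [table]...[/table] block, and the greedy .* in the first group backtracks only as far as the last [|]; once that match is consumed, nothing is left to match again. The first step of the answer above (isolate each table block, then replace inside it) can be reduced to a minimal sketch. The sample input, including the outside[|]untouched line, is made up to show that [|] outside a table is left alone.

```php
<?php
// Sample input modelled on the question; the last line is outside any [table] block.
$input = "[table]<thead>\n<th>head1</th><th>head2</th></thead>\n"
       . "[*]test1[|]test2\n[*]test1[|]test2\n[/table]\noutside[|]untouched";

// Match each [table]...[/table] block lazily (s lets . cross newlines),
// then replace every [|] inside that block only.
$output = preg_replace_callback(
    '~\[table].*?\[/table]~s',
    function (array $m): string {
        return str_replace('[|]', '</td><td>', $m[0]);
    },
    $input
);

echo $output;
```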

Exclude a table from an HTML string if it contains a selected phrase

I will try to explain the issue with an example.
Let's say I have a big HTML string that includes tables like the following.
<table id="table-1">
<tbody>
<tr><td><p>{{Phrase 1}}</p></td></tr>
</tbody>
</table>
<table id="table-2">
<tbody>
<tr><td><p>Sample text 1 goes here..</p></td></tr>
</tbody>
</table>
<table id="table-3">
<tbody>
<tr><td><p>{{Phrase 2}}</p></td></tr>
</tbody>
</table>
<table id="table-4">
<tbody>
<tr><td><p>Sample text 2 goes here..</p></td></tr>
</tbody>
</table>
I need a PHP function that removes the complete table from the HTML string if it contains {{Phrase 1}} or {{Phrase 2}}.
In the example above, I need to remove table-1 and table-3, so the resulting string would look like this:
<table id="table-2">
<tbody>
<tr><td><p>Sample text 1 goes here..</p></td></tr>
</tbody>
</table>
<table id="table-4">
<tbody>
<tr><td><p>Sample text 2 goes here..</p></td></tr>
</tbody>
</table>
I tried the preg_replace function, but it didn't work, because it only replaces the selected text, not the whole table.
Can anyone help me solve this issue?
Here is the sample code I have so far; I am still trying to develop it.
$patterns = array();
$patterns[0] = '{{Phrase 1}}';
$patterns[1] = '{{Phrase 2}}';
$replacements = array();
$replacements[0] = '';
$replacements[1] = '';
$string = '<table id="table-1">
<tbody>
<tr><td><p>{{Phrase 1}}</p></td></tr>
</tbody>
</table>
<table id="table-2">
<tbody>
<tr><td><p>Sample text 1 goes here..</p></td></tr>
</tbody>
</table>
<table id="table-3">
<tbody>
<tr><td><p>{{Phrase 2}}</p></td></tr>
</tbody>
</table>
<table id="table-4">
<tbody>
<tr><td><p>Sample text 2 goes here..</p></td></tr>
</tbody>
</table>';
echo '<pre>';
echo htmlspecialchars(preg_replace($patterns, $replacements, $string));
echo '</pre>';
If the structure is always the same, you can do it with a simple regex:
// This regex matches the current structure, no matter what the number for the table id is
// and either Phrase 1 or 2.
$regex = '/(<table id="table-[0-9]+">[\s]+<tbody>[\s]+<tr><td><p>\{\{Phrase (1|2)\}\}<\/p><\/td><\/tr>[\s]+<\/tbody>[\s]+<\/table>)/';
$html = '<table id="table-1">
<tbody>
<tr><td><p>{{Phrase 1}}</p></td></tr>
</tbody>
</table>
<table id="table-2">
<tbody>
<tr><td><p>Sample text 1 goes here..</p></td></tr>
</tbody>
</table>
<table id="table-3">
<tbody>
<tr><td><p>{{Phrase 2}}</p></td></tr>
</tbody>
</table>
<table id="table-4">
<tbody>
<tr><td><p>Sample text 2 goes here..</p></td></tr>
</tbody>
</table>';
// Simply perform a replace with an empty string
$clean = preg_replace($regex, '', $html);
Demo: https://3v4l.org/4QHvm
If you want a more detailed explanation about the regex, you can read more here: https://regex101.com/r/B128DE/1
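If the structure is not always the same, a DOM-based approach does not depend on exact whitespace or nesting. The sketch below assumes the fragment can be wrapped in a <div> to give it a single root element; it removes every <table> whose text contains one of the phrases and then serializes what remains.

```php
<?php
// The question's four tables, condensed to one line each.
$html = '<table id="table-1"><tbody><tr><td><p>{{Phrase 1}}</p></td></tr></tbody></table>
<table id="table-2"><tbody><tr><td><p>Sample text 1 goes here..</p></td></tr></tbody></table>
<table id="table-3"><tbody><tr><td><p>{{Phrase 2}}</p></td></tr></tbody></table>
<table id="table-4"><tbody><tr><td><p>Sample text 2 goes here..</p></td></tr></tbody></table>';

$phrases = ['{{Phrase 1}}', '{{Phrase 2}}'];

$doc = new DOMDocument();
// Wrap the fragment in a <div> so the document has a single root element.
$doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Copy the live node list first so that removing tables does not disturb the loop.
foreach (iterator_to_array($doc->getElementsByTagName('table')) as $table) {
    foreach ($phrases as $phrase) {
        if (strpos($table->textContent, $phrase) !== false) {
            $table->parentNode->removeChild($table);
            break; // this table is gone; move on to the next one
        }
    }
}

// Serialize the children of the wrapper <div>, i.e. the remaining tables.
$result = '';
foreach ($doc->documentElement->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}

echo $result;
```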
One very simple way to do it without having to use DOM or (God forbid) regex is to strip the tags and explode on three newlines.
strip_tags() removes all the HTML tags, leaving only the text and the line breaks that were between the tags.
$html = '<table id="table-1">
<tbody>
<tr><td><p>{{Phrase 1}}</p></td></tr>
</tbody>
</table>
<table id="table-2">
<tbody>
<tr><td><p>Sample text 1 goes here..</p></td></tr>
</tbody>
</table>
<table id="table-3">
<tbody>
<tr><td><p>{{Phrase 2}}</p></td></tr>
</tbody>
</table>
<table id="table-4">
<tbody>
<tr><td><p>Sample text 2 goes here..</p></td></tr>
</tbody>
</table>';
$arr = explode(PHP_EOL.PHP_EOL.PHP_EOL , strip_tags($html));
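// Optional output. Each entry still carries leftover line breaks, so it is trimmed.
// This assumes phrase tables and text tables alternate, so only every second
// entry (starting at index 1) is printed.
for ($i = 1; $i < count($arr); $i += 2) {
    echo trim($arr[$i]) . "<br>\n";
}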
https://3v4l.org/gPQZn

PHP Table Reader

How can I read the ISP value from an HTML table?
<table style="padding-top:10px;">
<tbody>
<tr>
<th>ISP:</th>
<td>My Provider</td>
</tr>
<tr><th>Organization:</th><td nowrap=""></td>
</tr>
<tr><th>Connection:</th>
</tbody></table>
Given the limited information, a regular expression would be the easiest solution.
$matches = array();
// ~ is used as the delimiter; a pattern starting with < would treat <...> as bracket-style delimiters
preg_match('~<th>ISP:</th>\s*<td>(.*?)</td>~', '<th>ISP:</th><td>My Provider</td>...', $matches);
var_dump($matches);
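If the page is reasonably well-formed HTML, a DOM and XPath sketch avoids depending on the exact whitespace between the tags. It uses a trimmed copy of the question's table (the incomplete Connection: row is left out) and looks up the <td> that follows the <th> labelled ISP:.

```php
<?php
$html = '<table style="padding-top:10px;">
<tbody>
<tr>
<th>ISP:</th>
<td>My Provider</td>
</tr>
<tr><th>Organization:</th><td nowrap=""></td>
</tr>
</tbody></table>';

$doc = new DOMDocument();
// Collect libxml warnings instead of printing them (real pages are often messy).
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// The first <td> sibling after the <th> whose text is "ISP:".
$nodes = $xpath->query('//th[normalize-space(.)="ISP:"]/following-sibling::td[1]');
$isp = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;

echo $isp;
```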

Extract Value from HTML using PHP

I'm retrieving an HTML page using cURL. The HTML page has a table like this.
<table class="table2" style="width:85%; text-align:center">
<tr>
<th>Refference ID</th>
<th>Transaction No</th>
<th>Type</th>
<th>Operator</th>
<th>Amount</th>
<th>Slot</th>
</tr>
<tr>
<td>130717919020ffqClE0nRaspoB</td>
<td>8801458920369</td>
<td>Purchase</td>
<td>Visa</td>
<td>50</td>
<td>20130717091902413</td>
</tr>
</table>
This is the only table in that HTML page. I need to extract the Reference ID and Slot values using PHP, but I have no idea how to do that.
EDIT:
This one helped me a lot.
A regex-based solution like the accepted answer is not the right way to extract information from HTML documents.
Use a DOMDocument based solution like this instead:
$str = '<table class="table2" style="width:85%; text-align:center">
<tr>
<th>Refference ID</th>
...
<th>Slot</th>
</tr>
<tr>
<td>130717919020ffqClE0nRaspoB</td>
...
<td>20130717091902413</td>
</tr>
</table>';
// Create a document out of the string. Initialize XPath
$doc = new DOMDocument();
$doc->loadHTML($str);
$selector = new DOMXPath($doc);
// Query the values in a stable and easy to maintain way using XPath
$refResult = $selector->query('//table[@class="table2"]/tr[2]/td[1]');
$slotResult = $selector->query('//table[@class="table2"]/tr[2]/td[6]');
// Check if the data was found
if($refResult->length !== 1 || $slotResult->length !== 1) {
die("Data is corrupted");
}
// XPath->query always returns a node set, even if
// this contains only a single value.
$refId = $refResult->item(0)->nodeValue;
$slot = $slotResult->item(0)->nodeValue;
echo "RefId: $refId, Slot: $slot", PHP_EOL;
$str = '<table class="table2" style="width:85%; text-align:center">
<tr>
<th>Refference ID</th>
<th>Transaction No</th>
<th>Type</th>
<th>Operator</th>
<th>Amount</th>
<th>Slot</th>
</tr>
<tr>
<td>130717919020ffqClE0nRaspoB</td>
<td>8801458920369</td>
<td>Purchase</td>
<td>Visa</td>
<td>50</td>
<td>20130717091902413</td>
</tr>
</table>';
preg_match_all('/<td>([^<]*)<\/td>/', $str, $m);
$reference_id = $m[1][0];
$slot = $m[1][5];
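Both answers above pick cells by position (td[1] and td[6], or $m[1][0] and $m[1][5]). If the column order might change, one option is to pair each header with its cell by index. This sketch assumes the table has exactly one header row followed by one data row, as in the question.

```php
<?php
$str = '<table class="table2" style="width:85%; text-align:center">
<tr><th>Refference ID</th><th>Transaction No</th><th>Type</th><th>Operator</th><th>Amount</th><th>Slot</th></tr>
<tr><td>130717919020ffqClE0nRaspoB</td><td>8801458920369</td><td>Purchase</td><td>Visa</td><td>50</td><td>20130717091902413</td></tr>
</table>';

$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXPath($doc);

// Header labels from the first row.
$headers = [];
foreach ($xpath->query('//table[@class="table2"]/tr[1]/th') as $th) {
    $headers[] = trim($th->textContent);
}

// Cell values from the second row.
$cells = [];
foreach ($xpath->query('//table[@class="table2"]/tr[2]/td') as $td) {
    $cells[] = trim($td->textContent);
}

// Pair headers with cells so values can be read by name, e.g. $row['Slot'].
$row = array_combine($headers, $cells);
echo $row['Refference ID'], ' ', $row['Slot'], PHP_EOL;
```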
