Why does this regex only match the last occurrence of the pattern

I'm trying to create a regex which will create html out of markup code.
When trying to replace a part of the [table] markup, it only replaces the last occurrence.
I have the following regex (PHP):
/(\[table].*)\[\|](.*\[\/table])/s
Replace pattern:
$1</td><td>$2
And the following test string:
[table]<thead>
<th>head1</th><th>head2</th></thead>
[*]test1[|]test2
[*]test1[|]test2
[/table]
It should produce the following:
[table]<thead>
<th>head1</th><th>head2</th></thead>
[*]test1</td><td>test2
[*]test1</td><td>test2
[/table]
but it actually produces this:
[table]<thead>
<th>head1</th><th>head2</th></thead>
[*]test1[|]test2
[*]test1</td><td>test2
[/table]
The problem is that [|] is also used in other markup codes, where it should not be replaced with </td><td>.
To clarify:
I have a table "bb-code"
[table]
[**]header1[||]header2[||]header3[||]...[/**]
[*]child1.1[|]child1.2[|]child1.3[|]...
[*]child2.1[|]child2.2[|]child2.3[|]...
[*]child3.1[|]child3.2[|]child3.3[|]...
[*]...[|]...[|]...[|]...
[/table]
I want this to become this:
<table class="ui compact stripet yellow table">
<thead>
<tr>
<th>header1</th>
<th>header2</th>
<th>header3</th>
<th>....</th>
</tr>
</thead>
<tbody>
<tr>
<td>child1.1</td>
<td>child1.2</td>
<td>child1.3</td>
<td>...</td>
</tr>
<tr>
<td>child2.1</td>
<td>child2.2</td>
<td>child2.3</td>
<td>...</td>
</tr>
<tr>
<td>child3.1</td>
<td>child3.2</td>
<td>child3.3</td>
<td>...</td>
</tr>
</tbody>
</table>

Okay, I had a few minutes to spare on my mobile phone before bedtime, so I ran with Wiktor's comment and put together a series of preg_ functions to convert your bbcode to HTML. I don't have any experience with bbcode, so I am purely addressing your sample input and not considering fringe cases. I think PHP has a bbcode parser library somewhere, but I don't know if your bbcode syntax is the standard.
A breakdown of the patterns used:
First, isolate each whole [table]...[/table] string in the document. The pattern ~\[table]\R*([^[]*(?:\[(?!/?table])[^[]*)*)\R*\[/table]~ matches each such string and passes the full match as $m[0] and the substring between the table tags as $m[1] to BBTableToHTML().
Next, BBTableToHTML() will make 3 separate passes over the $m[1] string. Each of those patterns will send their respective matched strings to the associated custom function and return the modified string.
Before sending the updated $m[1] from BBTableToHTML() back to the echo, your desired <table...> and </table> tags will bookend $m[1].
Demos of the preg_replace_callback_array() patterns:
~\[\*\*]([^[]*(?:\[(?!/?\*\*])[^[]*)*)\[/\*\*]~ https://regex101.com/r/thINHQ/2
~(?:\[\*].*\R*)+~ https://regex101.com/r/thINHQ/3
~\[\*](.*)~ https://regex101.com/r/thINHQ/4
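To see what the first pattern captures on its own, here is a minimal sketch that runs it through preg_match_all() on a shortened sample. The names $sample, $pattern, $matches and $bodies are placeholders chosen for this illustration and are not part of the answer's code.

```php
<?php
// Shortened sample: two tables separated by ordinary text and another bbcode tag.
$sample = "[b]intro[/b]\n[table]\n[*]A[|]B\n[/table]\ntext\n[table]\n[*]C[|]D\n[/table]";

// Same table pattern as above: [^[]* consumes text without "[", and each "["
// is only accepted when it does not start [table] or [/table], so a match
// cannot run past the end of its own table.
$pattern = '~\[table]\R*([^[]*(?:\[(?!/?table])[^[]*)*)\R*\[/table]~';

preg_match_all($pattern, $sample, $matches);

// $matches[1] holds the text between the table tags; trim surrounding newlines.
$bodies = array_map('trim', $matches[1]);

print_r($bodies);
```

Each element of $bodies is the inner text of one table, which is what BBTableToHTML() receives as $m[1] (before trimming).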
Code: (Demo)
$bbcode = <<<BBCODE
[b]Check out this demo[/b]
¯\_(ツ)_/¯
[table]
[**]header1[||]header2[||]header3[||]...[/**]
[*]child1.1[|]child1.2[|]child1.3[|]...
[*]child2.1[|]child2.2[|]child2.3[|]...
[*]child3.1[|]child3.2[|]child3.3[|]...
[*]...[|]...[|]...[|]...
[/table]
simple text
[table]
[**]a 1[||]and a 2[/**]
[*]A[|]B
[*]C[|]D
[/table]
[s]3, you're out[/s]
blah
BBCODE;
function BBTableToHTML($m) {
return "<table class=\"ui compact stripet yellow table\">\n" .
preg_replace_callback_array(
[
'~\[\*\*]([^[]*(?:\[(?!/?\*\*])[^[]*)*)\[/\*\*]~' => 'BBTHeadToHTML',
'~(?:\[\*].*\R*)+~' => 'BBTBodyToHTML',
'~\[\*](.*)~' => 'BBTBodyRowToHTML'
],
$m[1]
) .
"</table>";
}
function BBTHeadToHTML($m) {
return "\t<thead>\n" .
"\t\t<tr>\n\t\t\t<th>" . str_replace('[||]', "</th>\n\t\t\t<th>", $m[1]) . "</th>\n\t\t</tr>\n" .
"\t</thead>";
}
function BBTBodyToHTML($m) {
return "\t<tbody>\n{$m[0]}\t</tbody>\n";
}
function BBTBodyRowToHTML($m) {
return "\t\t<tr>\n\t\t\t<td>" . str_replace('[|]', "</td>\n\t\t\t<td>", $m[1]) . "</td>\n\t\t</tr>";
}
echo preg_replace_callback(
'~\[table]\R*([^[]*(?:\[(?!/?table])[^[]*)*)\R*\[/table]~',
'BBTableToHTML',
$bbcode
);
Output:
[b]Check out this demo[/b]
¯\_(ツ)_/¯
<table class="ui compact stripet yellow table">
<thead>
<tr>
<th>header1</th>
<th>header2</th>
<th>header3</th>
<th>...</th>
</tr>
</thead>
<tbody>
<tr>
<td>child1.1</td>
<td>child1.2</td>
<td>child1.3</td>
<td>...</td>
</tr>
<tr>
<td>child2.1</td>
<td>child2.2</td>
<td>child2.3</td>
<td>...</td>
</tr>
<tr>
<td>child3.1</td>
<td>child3.2</td>
<td>child3.3</td>
<td>...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
</tbody>
</table>
simple text
<table class="ui compact stripet yellow table">
<thead>
<tr>
<th>a 1</th>
<th>and a 2</th>
</tr>
</thead>
<tbody>
<tr>
<td>A</td>
<td>B</td>
</tr>
<tr>
<td>C</td>
<td>D</td>
</tr>
</tbody>
</table>
[s]3, you're out[/s]
blah
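As for why the pattern in the question only replaced the last [|]: the greedy .* in the first group runs to the end of the string and then backtracks only as far as the last [|] that is still followed by [/table]. That single match covers the whole table, so preg_replace() has nothing left to scan for a second match. The sketch below shows this on a shortened input, next to one possible alternative that matches each table and replaces every [|] inside it. Variable names are illustrative.

```php
<?php
$input = "[table]\n[*]a[|]b\n[*]c[|]d\n[/table]";

// Pattern from the question: one match per table, anchored on the LAST [|].
$once = preg_replace(
    '/(\[table].*)\[\|](.*\[\/table])/s',
    '$1</td><td>$2',
    $input,
    -1,
    $count // number of replacements performed
);

// Alternative: match each table lazily, then replace every [|] within the match.
// [|] outside of [table]...[/table] is left untouched.
$all = preg_replace_callback(
    '/\[table].*?\[\/table]/s',
    function (array $m): string {
        return str_replace('[|]', '</td><td>', $m[0]);
    },
    $input
);

echo $count, "\n", $once, "\n\n", $all, "\n";
```

Here $count is 1 and only the second row is changed in $once, while $all has both rows converted.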

Related

Delete a table and its content from html code in php

I have HTML code like below. I want to remove the entire table and its content from it using PHP. I can remove the table tags using PHP strip_tags, but I am not sure about deleting the table content. Any help would be appreciated.
<div>
<p> This is test paragraph</p>
<table>
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Age</th>
</tr>
<tr>
<td>Jill</td>
<td>Smith</td>
<td>50</td>
</tr>
</table>
</div>
Desired Output is
<div>
<p> This is test paragraph</p>
</div>
Thanks #medigeek and all for your answers. I've made a few changes to the code so that it also works with inline styles.
Solution:
$html = '<div>
<p> This is test paragraph</p>
<table style="width:100%">
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Age</th>
</tr>
<tr>
<td>Jill</td>
<td>Smith</td>
<td>50</td>
</tr>
</table>
</div>';
$regex = '/<table[^>]*>.*?<\/table>/s'; // Regular expression pattern
// This regex pattern also works with table tags that contain attributes such as inline styles
$replace = '';
$result = preg_replace($regex, $replace, $html);
echo($result);

<?php
$teststring = '<div>
<p> This is test paragraph</p>
<table>
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Age</th>
</tr>
<tr>
<td>Jill</td>
<td>Smith</td>
<td>50</td>
</tr>
</table>
</div>';
$regexpattern = '/<table>.*?<\/table>/s'; // Matching regular expression pattern
$replacement = ''; // Substitute the matched pattern with an empty string
$res = preg_replace($regexpattern, $replacement, $teststring);
echo($res);
?>
Explanation of the regular expression pattern:
/ = start regex pattern
<table> = start matching when you see this text
.* = match anything (any characters or empty) in between
? = but don't be greedy (as in only match
characters between the limits set)
<\/table> = stop matching when you see this text
/ = end regex pattern
s = modifier, keep matching even if you stumble upon new line characters
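To make the effect of the ? concrete, here is a small sketch comparing the lazy and greedy forms on a string with two tables. The sample string and variable names are made up for illustration.

```php
<?php
// Two tables with a paragraph between them that should survive.
$html = '<table><tr><td>1</td></tr></table><p>keep me</p><table><tr><td>2</td></tr></table>';

// Lazy .*? stops at the first </table> after each opening tag.
$lazy = preg_replace('/<table[^>]*>.*?<\/table>/s', '', $html);

// Greedy .* runs to the LAST </table>, swallowing everything in between.
$greedy = preg_replace('/<table[^>]*>.*<\/table>/s', '', $html);

var_dump($lazy, $greedy);
```

With the lazy quantifier the paragraph between the tables is kept; with the greedy one it is removed along with both tables.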
Regular expressions are powerful in matching otherwise seemingly complicated text strings in different programming languages. You may find more information here:
http://php.net/manual/en/function.preg-replace.php
http://php.net/manual/en/book.pcre.php
https://www.regular-expressions.info/quickstart.html

You can do this with preg_replace:
$your_html = '<table......';
$new_html = preg_replace("/(<table>).*?(<\/table>)/s", "", $your_html);
echo $new_html;
/*
OUTPUT:
<div>
<p> This is test paragraph</p>
</div>
*/
Regards,

If you want to do this after the page has already loaded (the page loads and a certain event triggers the removal), it can't be done with PHP; that would have to be done using JavaScript. If you want the table to be left out while the page is being generated, you have to emit it from within PHP.
<div>
<p> This is test paragraph</p>
<?php
if ($showTable) { // $showTable is a placeholder: put your own condition for showing the table here
echo "
<table>
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Age</th>
</tr>
<tr>
<td>Jill</td>
<td>Smith</td>
<td>50</td>
</tr>
</table>
";
}
?>
</div>
Doing it like this will only display the table when the cause is given through PHP.

Try this:
$content = <<<DATA
<div>
<p> This is test paragraph</p>
<table>
<tr>
<th>Firstname</th>
<th>Lastname</th>
<th>Age</th>
</tr>
<tr>
<td>Jill</td>
<td>Smith</td>
<td>50</td>
</tr>
</table>
</div>
DATA;
$doc = new DOMDocument();
$doc->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$tables = $doc->getElementsByTagName('table');
while ($tables->length)
{
$tables[0]->parentNode->removeChild($tables[0]);
}
echo $doc->saveHTML();
Output:
<div> <p> This is test paragraph</p> </div>

how would i remove some portion of <tr><td> from a table in php?

this is my table -
<table>
<tr>
<td>ABC</td>
</tr>
<tr>
<td> </td>
</tr>
</table>
and I want to remove this one table row:
<tr>
<td> </td>
</tr>
my expected output is:
<table>
<tr>
<td>ABC</td>
</tr>
</table>
Is this possible? Please help me.

Since you have tagged your question with php, I would recommend using a regular expression.
The pattern \s*<tr>\s*<td> <\/td>\s*<\/tr> will find the tr with an empty ( ) td.
To test and look into the regex you can have a look here: https://regex101.com/r/ax6Xdg/1
Put together this will look something like this:
$table = "<table>
<tr>
<td>ABC</td>
</tr>
<tr>
<td> </td>
</tr>
</table>";
$pattern = "/\s*<tr>\s*<td> <\/td>\s*<\/tr>/";
var_dump( preg_replace( $pattern , "" , $table ) );
This will output something very similar to this:
string '<table>
<tr>
<td>ABC</td>
</tr>
</table>' (length=60)
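If the markup varies (attributes on the tags, several cells per row), a DOM-based variant may be less brittle than the regex. The following is only a sketch under assumptions: it requires PHP's DOM extension, treats a row as empty when all of its text is whitespace or non-breaking spaces, and uses variable names invented for this example.

```php
<?php
$table = "<table>\n<tr>\n<td>ABC</td>\n</tr>\n<tr>\n<td> </td>\n</tr>\n</table>";

$doc = new DOMDocument();
$doc->loadHTML($table, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Copy the live node list into an array first; removing nodes while
// iterating the live list would skip rows.
$rows = iterator_to_array($doc->getElementsByTagName('tr'));

foreach ($rows as $row) {
    // textContent is the concatenated text of all cells in the row.
    // Assumption: non-breaking spaces (U+00A0) also count as "empty".
    $text = str_replace("\u{00A0}", ' ', $row->textContent);
    if (trim($text) === '') {
        $row->parentNode->removeChild($row);
    }
}

$out = $doc->saveHTML();
echo $out;
```

The row containing ABC is kept and the whitespace-only row is removed; the exact whitespace of the serialized output may differ from the input.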

You can do this by using the jQuery function .remove().
Edit: If you want to locate that specific tag, you can do that by using .next(), .find(), .parent() or .children().

Just add id to your table :
<table id="tableid">
<tr>
<td>ABC</td>
</tr>
<tr>
<td> </td>
</tr>
</table>
This script looks for the matching cell and, if it is found, removes its row:
$('#tableid tr').each(function() {
if ($(this).find('td').html()==' ') $(this).remove();
});
If you want to find some text and then remove then replace html() with text()
$('#tableid tr').each(function() {
if ($(this).find('td').text()=='ABC') $(this).remove();
});

You should try this:
<table>
<tr id="abc">
<td>ABC</td>
</tr>
<tr id="remove">
<td> </td>
</tr>
</table>
<script>
$('#remove').remove();
</script>

When rendering the table, add a unique class to the rows you wish to delete. Let's say the class is _rowToDelete; then, using jQuery, remove all the rows that have this class.
In the below example, when you click on the button the rows are being removed, so you can see the changes. But you can do the same on page load if you wish so.
$(function() {
$("#removeBtn").click(function() {
$("._rowToDelete").remove() ;
});
}) ;
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<table>
<tr>
<td>ABC 1</td>
</tr>
<tr class="_rowToDelete">
<td> </td>
</tr>
<tr>
<td>ABC 2</td>
</tr>
<tr class="_rowToDelete">
<td> </td>
</tr>
<tr>
<td>ABC 3</td>
</tr>
<tr class="_rowToDelete">
<td> </td>
</tr>
</table>
<button id="removeBtn">Remove</button>

Exclude table unit from the html string if there is selected phrase found

I will try to explain the issue with an example.
Let's say I have big html string which includes following types of table units.
<table id="table-1">
<tbody>
<tr><td><p>{{Phrase 1}}</p></td></tr>
</tbody>
</table>
<table id="table-2">
<tbody>
<tr><td><p>Sample text 1 goes here..</p></td></tr>
</tbody>
</table>
<table id="table-3">
<tbody>
<tr><td><p>{{Phrase 2}}</p></td></tr>
</tbody>
</table>
<table id="table-4">
<tbody>
<tr><td><p>Sample text 2 goes here..</p></td></tr>
</tbody>
</table>
I need PHP function to exclude the complete table from the html string if it contains {{Phrase 1}} or {{Phrase 2}}.
In the above example I simply need to exclude table-1 and table-3, and the resulting string would look like this:
<table id="table-2">
<tbody>
<tr><td><p>Sample text 1 goes here..</p></td></tr>
</tbody>
</table>
<table id="table-4">
<tbody>
<tr><td><p>Sample text 2 goes here..</p></td></tr>
</tbody>
</table>
I tried the preg_replace function but it didn't work as I can just replace the selected text not whole unit.
Can anyone here help me to overcome this issue.
Sample code which I had so far and still trying to develop it.
$patterns = array();
$patterns[0] = '{{Phrase 1}}';
$patterns[1] = '{{Phrase 2}}';
$replacements = array();
$replacements[0] = '';
$replacements[1] = '';
$string = '<table id="table-1">
<tbody>
<tr><td><p>{{Phrase 1}}</p></td></tr>
</tbody>
</table>
<table id="table-2">
<tbody>
<tr><td><p>Sample text 1 goes here..</p></td></tr>
</tbody>
</table>
<table id="table-3">
<tbody>
<tr><td><p>{{Phrase 2}}</p></td></tr>
</tbody>
</table>
<table id="table-4">
<tbody>
<tr><td><p>Sample text 2 goes here..</p></td></tr>
</tbody>
</table>';
echo '<pre>';
echo htmlspecialchars(preg_replace($patterns, $replacements, $string));
echo '</pre>';

If the structure is always the same, then you can do it in a simple regex:
// This regex matches the current structure, no matter what the number for the table id is
// and either Phrase 1 or 2.
$regex = '/(<table id="table-[0-9]+">[\s]+<tbody>[\s]+<tr><td><p>\{\{Phrase (1|2)\}\}<\/p><\/td><\/tr>[\s]+<\/tbody>[\s]+<\/table>)/';
$html = '<table id="table-1">
<tbody>
<tr><td><p>{{Phrase 1}}</p></td></tr>
</tbody>
</table>
<table id="table-2">
<tbody>
<tr><td><p>Sample text 1 goes here..</p></td></tr>
</tbody>
</table>
<table id="table-3">
<tbody>
<tr><td><p>{{Phrase 2}}</p></td></tr>
</tbody>
</table>
<table id="table-4">
<tbody>
<tr><td><p>Sample text 2 goes here..</p></td></tr>
</tbody>
</table>';
// Simply perform a replace with an empty string
$clean = preg_replace($regex, '', $html);
Demo: https://3v4l.org/4QHvm
If you want a more detailed explanation about the regex, you can read more here: https://regex101.com/r/B128DE/1
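If the structure is not always identical, one looser variant is to match every table and decide in a callback whether to drop it. This is a sketch, not a replacement for the answer above: it assumes tables are not nested, and the names $phrases and $clean are chosen for this example.

```php
<?php
$html = '<table id="table-1">
<tbody>
<tr><td><p>{{Phrase 1}}</p></td></tr>
</tbody>
</table>
<table id="table-2">
<tbody>
<tr><td><p>Sample text 1 goes here..</p></td></tr>
</tbody>
</table>
<table id="table-3">
<tbody>
<tr><td><p>{{Phrase 2}}</p></td></tr>
</tbody>
</table>';

$phrases = ['{{Phrase 1}}', '{{Phrase 2}}'];

// Match each table lazily (plus trailing whitespace) and return an empty
// string for tables that contain any of the phrases.
$clean = preg_replace_callback(
    '~<table\b[^>]*>.*?</table>\s*~s',
    function (array $m) use ($phrases): string {
        foreach ($phrases as $phrase) {
            if (strpos($m[0], $phrase) !== false) {
                return ''; // drop the whole table
            }
        }
        return $m[0]; // keep the table unchanged
    },
    $html
);

echo $clean;
```

Because the phrase test is a plain strpos(), the braces in {{Phrase 1}} need no escaping here.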

One very simple way to do it without having to use DOM or (God forbid) regex is to strip the tags and explode on three newlines.
strip_tags will remove all HTML tags and leave the whitespace that was between them in place.
$html = '<table id="table-1">
<tbody>
<tr><td><p>{{Phrase 1}}</p></td></tr>
</tbody>
</table>
<table id="table-2">
<tbody>
<tr><td><p>Sample text 1 goes here..</p></td></tr>
</tbody>
</table>
<table id="table-3">
<tbody>
<tr><td><p>{{Phrase 2}}</p></td></tr>
</tbody>
</table>
<table id="table-4">
<tbody>
<tr><td><p>Sample text 2 goes here..</p></td></tr>
</tbody>
</table>';
$arr = explode(PHP_EOL.PHP_EOL.PHP_EOL , strip_tags($html));
// Optional output. But the trim is needed so some
// kind of loop is needed to remove the extra spaces
for ($i = 1; $i < count($arr); $i = $i + 2) {
echo trim($arr[$i]) . "<br>\n";
}
https://3v4l.org/gPQZn

How to get a content from table rows as key/value pairs in regex only

I have this table:
<?php
$a ="<table class='table table-condensed'>
<tr>
<td>Monthely rent</td>
<td><strong>Fr. 1'950. </strong></td>
</tr>
<tr>
<td>Rooms(s)</td>
<td><strong>3</strong></td>
</tr>
<tr>
<td>Surface</td>
<td><strong>93m2</strong></td>
</tr>
<tr>
<td>Date of Contract</td>
<td><strong>01.04.17</strong></td>
</tr>
</table>";
What I need is to get the value of each <td> inside <tr> as key value pairs as in:
monthly rent => Fr. 1'950.
Rooms(s) => 3
Surface => 93m2
Date of Contract => 01.04.17;
So, far only this code returns some result close to what I need but not like the format I was expecting
preg_match_all("/<td>.*/", $a, $matches);
I am trying to find any improvements on this.

You can use the following regexes to get the contents of the table rows as key/value pairs:
Regex to get the keys: (?<=<td>)(?!<strong>).*?(?=<\/td>)
Regex to get the values: (?<=<strong>).*?(?=<\/strong>)
PHP
<?php
$re = '/(?<=<strong>).*?(?=<\/strong>)/';
$str = '<table class=\'table table-condensed\'>
<tr>
<td>Monthly rent</td>
<td><strong>Fr. 1\'950. </strong></td>
</tr>
<tr>
<td>Rooms(s)</td>
<td><strong>3</strong></td>
</tr>
<tr>
<td>Surface</td>
<td><strong>93m2</strong></td>
</tr>
<tr>
<td>Date of Contract</td>
<td><strong>01.04.17</strong></td>
</tr>
</table>';
preg_match_all($re, $str, $matches);
print_r($matches);
?>
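The code above collects only the values. To end up with actual key/value pairs, one option is to capture the key and the value in a single pattern and combine the two capture groups. This is a sketch that assumes every row has exactly the shape shown in the question (a plain <td> followed by a <td> containing <strong>); the pattern and the name $pairs are not from the answer.

```php
<?php
$str = <<<'HTML'
<table class='table table-condensed'>
<tr>
<td>Monthly rent</td>
<td><strong>Fr. 1'950. </strong></td>
</tr>
<tr>
<td>Rooms(s)</td>
<td><strong>3</strong></td>
</tr>
<tr>
<td>Surface</td>
<td><strong>93m2</strong></td>
</tr>
<tr>
<td>Date of Contract</td>
<td><strong>01.04.17</strong></td>
</tr>
</table>
HTML;

// Group 1: text of the first cell. Group 2: text inside <strong> in the second
// cell. The \s* around the groups drops surrounding whitespace.
$re = '~<td>\s*([^<]+?)\s*</td>\s*<td>\s*<strong>\s*(.*?)\s*</strong>\s*</td>~s';

preg_match_all($re, $str, $m);

// Keys from group 1, values from group 2 (both arrays have the same length).
$pairs = array_combine($m[1], $m[2]);

print_r($pairs);
```

$pairs then maps 'Monthly rent' to "Fr. 1'950.", 'Rooms(s)' to '3', and so on.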

How can i get the entire HTML of an element using regex?

I'm learning regex but can't figure this out. I want to get the entire HTML from a DIV; how should I proceed?
already tried this;
/\< td class=\"desc1\"\>(.+)/i
it returns;
Array
(
[0] => < td class="desc1">
[1] =>
)
the code that i'm matching is this;
<table id="profile" cellpadding="1" cellspacing="1">
<thead>
<tr>
<th colspan="2">Jogador TheInFEcT </th>
</tr>
<tr>
<td>Detalhes</td>
<td>Descrição:</td>
</tr>
</thead><tbody>
<tr>
<td class="empty"></td><td class="empty"></td>
</tr>
<tr>
<td class="details">
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<th>Classificação</th>
<td>11056</td>
</tr>
<tr>
<th>Tribo:</th>
<td>Teutões</td>
</tr>
<tr>
<th>Aliança:</th>
<td>-</td>
</tr>
<tr>
<th>Aldeias:</th>
<td>1</td>
</tr>
<tr>
<th>População:</th>
<td>2</td>
</tr><tr>
<td colspan="2" class="empty"></td>
</tr>
<tr>
<td colspan="2"> » Alterar perfil</td>
</tr>
</tbody></table>
</td>
<td class="desc1">
<div>STATUS: OFNAaaaAA</div>
</td>
</tr>
</tbody>
</table>
i need to get the entire code inside the < td class="desc1">, like that;
<div >STATUS: OFNAaaaAA< /div>
</td>
</tr>
</tbody>
</table>
Could someone help me out?
Thanks in advance.

I usually use
$dom = new DOMDocument();
$dom->loadHTML($htmldata);
to convert the HTML code into a DOM. Then you can use
$node = $dom->getElementById($id);
/* or */
$nodes = $dom->getElementsByTagName($tag);
to get your HTML/XML node.
Now, use
$node->textContent
to get data inside node.
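Note that textContent returns only the text, without the tags. If the inner HTML of the cell is wanted, as in the question, a sketch along the following lines may help. It assumes the DOM extension is available, selects the cell by its class with DOMXPath, and uses a shortened stand-in for the question's markup; the variable names are illustrative.

```php
<?php
// Shortened stand-in for the markup in the question.
$htmldata = '<table><tr><td class="details">x</td>'
          . '<td class="desc1"><div>STATUS: OFNAaaaAA</div></td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($htmldata);

// Find the first <td> whose class attribute is exactly "desc1".
$xpath = new DOMXPath($dom);
$cell  = $xpath->query('//td[@class="desc1"]')->item(0);

$inner = '';
if ($cell !== null) {
    // Serialize each child node to get the markup inside the cell.
    foreach ($cell->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
}

echo $inner;
```

Passing a node to saveHTML() serializes just that node, so concatenating the children yields the content of the cell without the surrounding <td> tags.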

Try this; it does not cover all possible cases, but it should work:
/<td\s+class=['"]\s*desc1\s*['"]\s*>((.|\n)*)<\/td>/i
tested with: http://www.pagecolumn.com/tool/pregtest.htm
edit: improved solution suggested by Alan Moore
/<td\s+class=['"]\s*desc1\s*['"]\s*>(.*?)<\/td>/s
