Regex not giving the results expected - php

I have some HTML tables that I need to extract values from, and I wrote a regular expression to get the values I want.
The HTML tables can be in these two formats:
<td height="20" style="width:59px;height:20px;">1</td>
<td style="width:212px;">Mendes, Paulo [AA]</td>
<td style="width:99px;">39</td>
<td>8</td>
<td style="width:85px;">$10,000</td>
</tr><tr height="20"><td height="20" style="width:59px;height:20px;">2</td>
<td style="width:212px;">Campos, Miguel [AC]</td>
<td style="width:99px;">37</td>
<td>6</td>
<td style="width:85px;">$5,000</td>
And the other one:
<td>1</td>
<td>Mendes, Paulo [AA]</td>
<td>39</td>
<td>8</td>
<td>$10,000</td>
</tr><tr height="20"><td>2</td>
<td>Campos, Miguel [AC]</td>
<td>37</td>
<td>6</td>
<td>$5,000</td>
For the example without styles, I can get the values I want with this regex:
<td>(\d+)<\/td>\n+\t*<td>([\w+, ]+) \[(\w{2})\]<\/td>
It is to be used in PHP, and I have been using https://regex101.com/ to test the regex first.
For the table with styles, however, I am having no luck.
I tried an exact match with:
<td height\=\"20\" style\=\"width\:59px\;height\:20px\;\">(\d+)<\/td>\n+\t*<td style\=\"width\:212px\;\">([\w+, ]+) \[(\w{2})\]<\/td>
but it doesn't capture what I want. I even tried a negated match, but it still doesn't work. What am I doing wrong?

Why don't you use querySelectorAll()? It is a lot easier. You can use it to retrieve the inner text of the td elements and store it in an array using a loop. Once you have the td values, you can use jQuery Ajax to send them to a .php file to process however you want.
For example:
var tdArr = [];
var tdContent = document.querySelectorAll('table tr td');
for ( let i = 0; i < tdContent.length; i++){
tdArr.push(tdContent[i].textContent);
}
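If the extraction has to stay in PHP, a single pattern can cover both table formats. The following is a minimal sketch, not a definitive fix: it assumes the attributes never contain a ">" character and that the whitespace between cells may be any mix of line breaks, tabs and spaces (for example \r\n instead of \n, which would defeat \n+\t*). The variable names are illustrative.

```php
<?php
// Sample rows in both formats from the question (with and without attributes).
$html = <<<'HTML'
<td height="20" style="width:59px;height:20px;">1</td>
<td style="width:212px;">Mendes, Paulo [AA]</td>
<td style="width:99px;">39</td>
</tr><tr height="20"><td>2</td>
<td>Campos, Miguel [AC]</td>
<td>37</td>
HTML;

// [^>]* skips any attributes inside the opening tag;
// \s* tolerates \r\n, \n, tabs and spaces between the two cells.
$pattern = '~<td[^>]*>(\d+)</td>\s*<td[^>]*>([\w, ]+) \[(\w{2})\]</td>~';

preg_match_all($pattern, $html, $rows, PREG_SET_ORDER);

foreach ($rows as $row) {
    // $row[1] = position, $row[2] = name, $row[3] = two-letter code
    echo $row[1], ' | ', $row[2], ' | ', $row[3], "\n";
}
```

Using ~ as the delimiter avoids having to escape the slashes in the closing tags.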

Related

PHP regular expression to extract html values

I want to extract values from the code below.
<tbody>
<tr>
<td><div class="file_pdf">note1</div></td>
<td class="textright">110 KB</td>
<td class="textright">106</td>
</tr>
<tr>
<td><div class="file_pdf">note2.pdf</div></td>
<td class="textright">44 KB</td>
<td class="textright">104</td>
</tr>
</tbody>
I want to extract the 'note1' and 'note2' strings and the numbers 1628 and 1629.
I tried
preg_match_all('~(\'\)\">(.*?)<\/a>)~', $getinside, $matches);
but its result is not what I am looking for.
Is there any simple regex to extract them? Thanks!
This should work for you:
preg_match_all("~downloadFile\('(\d+)'\)\">([^<]*)</a>~", $getinside, $matches);
Remember: if your HTML is very large or complex and you also need to parse other things from it, regex is not the best option for this.
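To see what the pattern above captures, here is a small sketch. The snippet in the question does not show the links, so the markup below is hypothetical: it assumes each file name sits inside an <a> element whose onclick attribute calls downloadFile('<id>').

```php
<?php
// Hypothetical markup (an assumption): the question's snippet omits the links.
$getinside = <<<'HTML'
<td><div class="file_pdf"><a href="#" onclick="downloadFile('1628')">note1</a></div></td>
<td><div class="file_pdf"><a href="#" onclick="downloadFile('1629')">note2.pdf</a></div></td>
HTML;

preg_match_all("~downloadFile\('(\d+)'\)\">([^<]*)</a>~", $getinside, $matches);

// $matches[1] holds the numeric ids, $matches[2] the link texts.
foreach ($matches[1] as $i => $id) {
    echo $id, ' => ', $matches[2][$i], "\n";
}
```

If the real markup places other attributes between the onclick value and the closing ">", the pattern would need to allow for them.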

simple_html_dom get an input with a partial id value

I'm trying to get td fields from a table with the id sid-120390923 or sid-38737122, ignoring any other id value in the table. The problem is that I cannot get these td fields because they all have slightly different ids that share the same prefix ('sid-').
What I want is to grab all the td fields whose id starts with sid-. Is there a way to incorporate regex?
Here is an example.
HTML:
<table>
<tr>
<td id='9sd'></td>
<td id='10sd'></td>
<td id='sid-1239812983'></td>
<td id='sid-1454345345'></td>
<td id='sid-2342345234'></td>
<td id='sid-5656433455'></td>
<td id='sid-1231235664'></td>
<td id='sid-8986757575'></td>
<td id='sid-1232134551'></td>
</tr>
</table>
simple_html_dom
// Pass index 0 so find() returns the first table element instead of an array
$table = $con->find('table', 0);
foreach($table->find('td#sid-1232134551') as $field){
echo $field."<br>";
}
You can use attribute filters (see the fifth tab of the manual); the ^= operator matches ids that start with the given value:
$table->find('td[id^=sid-]');
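When simple_html_dom is not available, the same prefix match can be sketched with PHP's built-in DOM extension and the XPath starts-with() function. This swaps in a different technique from the answer above; the shortened table and the variable names are illustrative.

```php
<?php
// Shortened version of the table from the question.
$html = "<table><tr>"
      . "<td id='9sd'></td>"
      . "<td id='sid-1239812983'></td>"
      . "<td id='sid-1454345345'></td>"
      . "</tr></table>";

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$ids = [];
// starts-with() keeps only the cells whose id begins with "sid-".
foreach ($xpath->query("//td[starts-with(@id, 'sid-')]") as $td) {
    $ids[] = $td->getAttribute('id');
}

print_r($ids);
```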

Displaying the text in the multiple lines when retrieving from database

Hi,
I have a table whose rows contain text that I retrieve from the database. The rows are narrow and the retrieved data is long, so the text exceeds the row width. I want to break the data into multiple lines inside the table row. How can I do it?
My code is here:
$list = $mfidao1->fetchMfi($_GET['id']);
//print_r($list);
//die;
if(!empty($list))
{
foreach($list as $menu)
{
?>
<tr style="border:none; background-color:#FBFBFB;" >
<td class="topv">Social Mission</td>
<td class="topm" ><div class="txt"><?php echo $menu->mfi_1_a;?></div></td>
</tr>
<tr bgcolor="#E8E8E8">
<td class="topv">Address</td>
<td class="topm"><?php echo $menu->mfi_ii_c;?></td>
</tr>
<tr bgcolor="#FBFBFB">
<td class="topv">Phone</td>
<td class="topm"><?php echo $menu->mfi_ii_e;?></td>
</tr>
<tr bgcolor="#E8E8E8">
<td class="topv">Email</td>
<td class="topm"><?php echo $menu->mfi_ii_d;?></td>
</tr>
<tr bgcolor="#FBFBFB">
<td class="topv">Year Established</td>
<td class="topm"><?php echo $menu->mfi_i_c;?></td>
</tr>
<tr bgcolor="#E8E8E8">
<td class="topv">Current Legal Status</td>
<td class="topm"><?php echo $menu->mfi_i_d;?></td>
</tr>
<tr bgcolor="#FBFBFB">
<td class="topv">Respondent</td>
<td class="topm"><?php echo $menu->mfi_ii_a;?></td>
</tr>
<?php
}
}
?>
</table>
Set the width of the <td>. I think this is a better way to do it than wordwrap().
In your CSS for the table, use "table-layout: fixed"; this fixes the width of the td elements to the values you set.
"word-wrap: break-word;" breaks the text so that it doesn't go beyond the boundary of the box.
You need to wrap the text in your td tags. Here is a link to a similar question.
You could use the function wordwrap().
It wraps a string to a given number of characters using a string break character.
You can either use the PHP function wordwrap() or style the td with CSS so that it uses the word-wrap property.
Not sure if this is what you want, but it sounds like you could use chunk_split().
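As a rough illustration of the wordwrap() suggestion, the sketch below breaks a long string into lines of at most 30 characters. The sample text and the width of 30 are made up for the example; in the question's code the string would come from a field such as $menu->mfi_1_a.

```php
<?php
// Hypothetical sample text standing in for a value fetched from the database.
$text = 'To provide affordable financial services to low-income households in rural areas.';

// Break after at most 30 characters, inserting <br> at each break.
// The fourth argument (true) also splits single words longer than the width.
$wrapped = wordwrap($text, 30, "<br>\n", true);

echo $wrapped;
```

wordwrap() counts characters rather than pixels, so the CSS approach is usually the closer fit when the column width is set in the stylesheet.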

How Can I Get Data From HTML Source Code with PHP and RegEx?

I have some HTML source code and need to extract some text from it. I cannot use the DOM because the document isn't well-formed.
The source may change later, and I cannot anticipate how, so the solution must be robust in most situations.
I'm fetching the source with curl, and I will process it with the preg_match_all function and regular expressions.
Source :
...
<TR Class="Head1">
<TD width="15%"><font size="12">Name</font></TD>
<TD>: </TD>
<TD align="center"><font color="red">Alex</font></TD>
<TD width="25%"><b>Job</b></TD>
<TD>: </B></TD>
<TD align="center" width="25%"><font color="red">Doctor</font></TD>
</TR>
...
...
<TR Class="Head2">
<TD width="15%" align="left">Age</B></TD>
<TD>: </TD>
<TD align="center"><font color="red">32</font></TD>
<TD width="15%"><font size="10">data</TD></font>
<TD> </B></TD>
<TD width="40%"> </TD>
</TR>
...
As you can see, the source is not well-formed. In fact, it's terrible! But there is nothing I can do about it.
The real source is longer than this.
How can I get the data from the source? I could strip all the HTML tags, but then how would I know the order of the data? What can I do with preg_match_all and regex? What else can I do?
I'm waiting for your help.
If you can use the DOM, this is far better than regexes. Take a look at PHP Tidy; it's designed to handle badly formed HTML.
You can use DOMDocument to load badly formed HTML:
$doc = new DOMDocument();
@$doc->loadHTML('<TR Class="Head2">
<TD width="15%" align="left">Age</B></TD>
<TD>: </TD>
<TD align="center"><font color="red">32</font></TD>
<TD width="15%"><font size="10">data</TD></font>
<TD> </B></TD>
<TD width="40%"> </TD>
</TR>');
$tds = @$doc->getElementsByTagName('td');
foreach ($tds as $td) {
echo $td->textContent, "\n";
}
I'm suppressing warnings in the above code for brevity.
Output:
Age
:
32
data
<!-- space -->
<!-- space -->
Using regex to parse HTML can be a futile effort as HTML is not a regular language.
Don't use RegEx. The link is funny but not informative, so the long and short of it is that HTML markup is not a regular language, hence cannot be parsed simply using regular expressions.
You could use RegEx to parse individual 'tokens' ( a single open tag; a single attribute name or value...) as part of a recursive parsing algorithm, but you cannot use a magic RegEx to parse HTML all on its own.
Or you could use a parser.
Since the markup isn't valid, maybe you could use TagSoup or PHP:Tidy.
// Nowdoc keeps the backslashes literal; the ~ delimiters are required by preg_match_all
$regex = <<<'EOF'
~<TR Class="Head2">\s+<TD width="15%" align="left">Age</B></TD>\s+<TD>: </TD>\s+<TD align="center"><font color="red">(\d+)</font></TD>\s+<TD width="15%"><font size="10">(\w+)</TD></font>\s+<TD> </B></TD>\s+<TD width="40%"> </TD>\s+</TR>~
EOF;
preg_match_all($regex, $text, $result);
var_dump($result);
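A pattern that spells out the whole row stops matching as soon as any attribute changes. A looser sketch is shown below; it rests on an assumption the question does not confirm, namely that the values of interest are always wrapped in <font color="red">. Only that element is matched, so changes elsewhere in the row do not matter.

```php
<?php
// First row from the question's source.
$text = <<<'HTML'
<TR Class="Head1">
<TD width="15%"><font size="12">Name</font></TD>
<TD>: </TD>
<TD align="center"><font color="red">Alex</font></TD>
<TD width="25%"><b>Job</b></TD>
<TD>: </B></TD>
<TD align="center" width="25%"><font color="red">Doctor</font></TD>
</TR>
HTML;

// Capture only the text inside each red <font> element, in document order.
// The i modifier makes the tag and attribute names case-insensitive.
preg_match_all('~<font\s+color="red">\s*([^<]*?)\s*</font>~i', $text, $result);

print_r($result[1]);
```

Because the matches come back in document order, the sequence of values follows the sequence of labels in the source.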

Zebra Striping with PHPTAL?

I'm trying out PHPTAL and I want to render a table with zebra stripes. I'm looping through a simple php assoc array ($_SERVER).
Note that I don't want to use jQuery or anything like that, I'm trying to learn PHPTAL usage!
Currently I have it working like this (too verbose for my liking):
<tr tal:repeat="item server">
<td tal:condition="repeat/item/odd" tal:content="repeat/item/key" class="odd">item key</td>
<td tal:condition="repeat/item/even" tal:content="repeat/item/key" class="even">item key</td>
<td tal:condition="repeat/item/odd" tal:content="item" class="odd">item value</td>
<td tal:condition="repeat/item/even" tal:content="item" class="even">item value</td>
</tr>
Basically I want some kind of conditional assignment on the fly, but I'm unsure of the syntax.
You could create an expression modifier by writing a phptal_tales_evenodd() function (see phptal_tales() in the manual):
<td tal:attributes="class evenodd:repeat/item/odd">
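The answer does not show the function itself. A minimal sketch of what it might look like follows, assuming the convention described in the PHPTAL manual: a custom modifier is a global function named phptal_tales_<name>() that receives the expression source and a $nothrow flag and returns PHP source code as a string. The body is an illustration, not the answer author's code.

```php
<?php
// Custom expression modifier for PHPTAL (sketch). PHPTAL is assumed to call
// this while compiling the template and to expect PHP source code back.
function phptal_tales_evenodd($src, $nothrow)
{
    // Compile the inner path (e.g. repeat/item/odd) with PHPTAL's own
    // phptal_tales(), then wrap it in a ternary that yields the class name.
    return '((' . phptal_tales(trim($src), $nothrow) . ") ? 'odd' : 'even')";
}
```

The function has to be defined before the template is compiled, and PHPTAL must be loaded so that phptal_tales() exists when the modifier is invoked.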
Well, it seems like I have my own answer, though I still think this is rather ugly:
<tr tal:repeat="item server">
<td tal:content="repeat/item/key" tal:attributes="class php: repeat.item.odd ? 'odd' : 'even'">item key</td>
<td tal:content="item" tal:attributes="class php: repeat.item.odd ? 'odd' : 'even'">item value</td>
</tr>
Anyone got anything more graceful looking for PHPTAL?
