replacing a string and keeping numbers intact - php

I have this table generated with PHP.
A function generates a string with all the HTML code:
<table><tr><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td>7</td><td>8</td><td>9</td><td>10</td></tr><tr><td>2</td><td>4</td><td>6</td><td>8</td><td>10</td><td>12</td><td>14</td><td>16</td><td>18</td><td>20</td></tr><tr><td>3</td><td>6</td><td>9</td><td>12</td><td>15</td><td>18</td><td>21</td><td>24</td><td>27</td><td>30</td></tr><tr><td>4</td><td>8</td><td>12</td><td>16</td> .... </table>
Now I want to make the numbers 1 to 10 bold. I'm trying to replace '<td>(10|[0-9])</td>' with <td style="font-weight: bold">THE-ORIGINAL-NUMBER</td>.
Thanks in advance!
P.S. I know there are a lot of similar answers out there, but I just couldn't figure it out. Is there an actually noob-friendly tutorial/glossary of regex out there? I couldn't really find an up-to-date site.

If you are matching this regular expression:
<td>(10|[0-9])</td>
the part in parentheses, (10|[0-9]), is capture group #1. Whatever it matched can be referenced in your replacement with either of the following backreferences:
\1
$1
Full PHP code:
$html = '<td>1</td>';
$html = preg_replace(
'~<td>(10|[0-9])</td>~',
'<td style="font-weight: bold">\1</td>',
$html
);
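A minimal sketch of the same replacement on a slightly longer sample, assuming the table looks like the one in the question (the sample string and variable names here are made up for illustration). It shows that the pattern bolds every cell holding 0 to 10, wherever it sits, and that the optional fourth argument of preg_replace (the limit) can restrict the change to the first few matches:

```php
<?php
// Sample data: two cells from the first row, two from a later row.
$html = '<tr><td>9</td><td>10</td></tr><tr><td>12</td><td>4</td></tr>';

$pattern     = '~<td>(10|[0-9])</td>~';
$replacement = '<td style="font-weight: bold">$1</td>';

// Bolds 9, 10 and 4; the cell containing 12 does not match and is left alone.
$bolded = preg_replace($pattern, $replacement, $html);

// Same replacement, but stop after the first 2 matches (9 and 10 here).
$firstOnly = preg_replace($pattern, $replacement, $html, 2);

echo $bolded, "\n", $firstOnly, "\n";
```

With the full table from the question, a limit of 10 would probably restrict the change to the first row, since its ten cells are the first ten matches.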

Use this regex:
(?<=<td>)(10|[0-9])(?=<\/td>)
Replace the match (group #1) with:
<span class="BoldText">$1</span>
Style:
.BoldText {
font-weight: bold;
}

Using <b> may also be useful.
Replace
'~<td>(10|[0-9])</td>~'
with
'<td><b>\1</b></td>'


How to echo only the values found inside a specific part (the href attribute) of a string?

[PHP] I have a variable that stores a very large page source as a string. I want to echo only the interesting values (dozens of them, which I need to extract for a project). They are inside the quotation marks of the href attribute of <a> tags,
but I only want to capture the values that start with the letter n (news):
<a href="/news7044449/exclusive_news_sunday_"
That is, I think I will have to match on: <a href="/n
How can I make the echo drop all the other text in the variable and show only those values?
Note that there are other href attributes whose values start with other letters, such as 'p': href="/profiles... (these do not interest me).
$string = '</div><span class="news-hd-mark">HD</span></div><p>exclusive_news_sunday_</p><p class="metadata"><span class="bg">Czech AV<span class="mobile-hide"> - 5.4M Views</span>
- <span class="duration">7 min</span></span></p></div><script>xv.thumbs.preparenews(7044449);</script>
<div id="news_31720715" class="thumb-block "><div class="thumb-inside"><div class="thumb"><a href="/news31720715/my_sister_running_every_single_morning"><img src="https://static-hw.xnewss.com/img/lightbox/lightbox-blank.gif"';
I imagine something like this:
$removes_everything_except_values_from_the_href_tag_starting_with_the_letter_n = ('/something regex expresion I think /' or preg_match, substring?);
echo $string = str_replace($removes_everything_except_values_from_the_href_tag_starting_with_the_letter_n,'',$string);
expected output: /news7044449/exclusive_news_sunday_
NOTE: The input does not have to be a variable; the values could also be extracted from a .txt file.
Thanks.
I believe this will help you.
<?php
$source = file_get_contents("code.html");
preg_match_all("/<a href=\"(\/n(?:.+?))\"[^>]*>/", $source, $results);
var_export( end($results) );
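To get just the links out of the $results array from Valdeir's answer, loop over the capture group (index 1), because $results itself is an array of arrays:
foreach ($results[1] as $r) {
// plain output, one link per line
echo $r . "\n";
// alt: to display them with an HTML break tag after each one
// echo $r . "<br>\n";
}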
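A minimal, self-contained sketch of the extraction, assuming the page source is already in a string (the sample markup below is invented for illustration, and [^"]+ is swapped in for .+? so a match cannot run past the closing quote):

```php
<?php
// Invented sample source: one /profiles link and two /news links.
$source = '<a href="/profiles/someone">x</a>'
        . '<a href="/news7044449/exclusive_news_sunday_"><img src="a.gif"></a>'
        . '<a href="/news31720715/my_sister_running_every_single_morning">y</a>';

// Capture href values that start with /n; ~ is the delimiter so / needs no escaping.
preg_match_all('~<a href="(/n[^"]+)"~', $source, $results);

// $results[0] holds the full matches, $results[1] the captured href values.
foreach ($results[1] as $link) {
    echo $link, "\n";
}
```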

preg_replace a hashtag that doesn't end with ;

I am currently using preg_replace to replace hashtags with HTML links, as shown below. The issue is that the text being checked may also contain HTML. So some CSS such as color: #000000; makes it try to convert that hex code into a link.
I basically need my regex to skip the replacement if the hashtag is immediately followed by ;. Here's what I currently have:
$str = preg_replace('/#([a-zA-Z0-9!_%]+)/', '#$1', $str);
Example input: 'I like #action movies!'
Expected output: 'I like #action movies!' (with #action turned into a link)
I cannot use the end of the string to check this, because chunks of text are checked at any given time, so the string supplied could be #computer text text text #computer, for instance.
I'd appreciate any assistance.
In your regex you can require that the hashtag is followed either by a character that is neither ; nor a word character, or by the end of the string:
/#([a-zA-Z0-9!_%]+)([^;\w]{1}|$)/
Then use $1 and $2 accordingly
'#$1$2'
Your code will look like
$str = preg_replace('/#([a-zA-Z0-9!_%]+)([^;\w]{1}|$)/', '#$1$2',$str);
Here you can see some tests: https://regex101.com/r/yN4tJ6/65
Until a regex guru comes to your rescue (if ever...), and because you are in PHP, here is a solution in a few lines of code.
$str="hi #def; #abc #ghi"; // just a test case (the first one needs to be skipped)
if (preg_match_all('/#([a-zA-Z0-9!_%]+.?)/', $str,$m)){
foreach($m[1] as $k) if(substr($k,-1)!=';') {
$k=trim($k);
$str=str_replace("#$k","<a href='http://wxample.com/tags/$k'>#$k</a>",$str);
}
}
print "$str\n";
You can add a condition to check whether the last character of the string is ; and act accordingly.
Example:
if (substr($str, -1)==';'){
//do nothing
}
else {
$str = preg_replace('/#([a-zA-Z0-9!_%]+)/', '#$1', $str);
}
Hope this helps.
This regex should work:
#([\w!%]+(?=[\s,!?.\n]|$))
Demo: https://regex101.com/r/KrRiD3/2
Your PHP code:
$str = 'I like #strategy games #f1f1f1; #e2e2e2; #action games!';
$str = preg_replace('/#([\w!%]+(?=[\s,!?.\n]|$))/', '#$1', $str);
echo $str;
output:
I like #strategy games #f1f1f1; #e2e2e2; #action games!
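A minimal sketch of the lookahead approach with an actual link in the replacement. The tag URL https://example.com/tags/ is a placeholder assumption, not something from the question, and the input string is invented:

```php
<?php
$str = 'color: #f1f1f1; I like #action movies and #strategy games';

// The lookahead requires whitespace, common punctuation or end of string
// right after the tag, so a hex colour followed by ; never matches.
$linked = preg_replace(
    '/#([\w!%]+)(?=[\s,!?.]|$)/',
    '<a href="https://example.com/tags/$1">#$1</a>',   // placeholder URL
    $str
);

echo $linked, "\n";
```

Because the lookahead does not consume the character after the tag, the surrounding text is left untouched and no second capture group is needed.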
Well, you can use the code below. I am new to regex, so it is not that professional, but it works:
$data = "<p style='color:#00000;'>Heloo</p> #computer text text text #computer #say #goo1d #sd! #say_hello";
echo preg_replace("/(?<!\:)(\s+)\#([\w]+)(?!\;)/",'#$2',$data);
This is the expression I used:
/(?<!\:)(\s+)\#([\w]+)(?!\;)/
Output is
<p style='color:#00000;'>Heloo</p> #computer text text text #computer #say #goo1d #sd! #say_hello
I hope it helps someone.

php remove whitespace between 2 tags [duplicate]

This question already has answers here:
PHP's preg_replace regex that matches multiple lines
(2 answers)
Closed 7 years ago.
I am trying to remove the signature of an email before inserting the message into a database. The signature is enclosed in a special tag, <signature>, to make it easy to strip out.
The following only works if the signature is condensed onto a single line, not when it is spread over several lines.
$msgeBody = preg_replace('#(<signature>).*?(</signature>)#', '$1$2', $msgeBody);
I have tried approaches found online to remove the whitespace between these tags first, before applying the line above, but without success. How can I do this? Here is the sample text spread over several lines:
<signature><p><span style="font-weight: bold;">Gerald Sugan</span><br>
Travel Consultant<br>
<span style="font-size: 18px; font-family: 'Courier New'; font-weight: bold;">Sugan Enterprises Inc</span></p>
</signature>
This is not a duplicate of "PHP's preg_replace regex that matches multiple lines": I could not see how to apply those solutions here, and I think the solution found below is different.
You can use DOMDocument:
$mail= <<<'EOD'
<body>
blah blah blah
<signature><p><span style="font-weight: bold;">Gerald Sugan</span><br>
Travel Consultant<br>
<span style="font-size: 18px; font-family: 'Courier New'; font-weight: bold;">Sugan Enterprises Inc</span></p>
</signature>
blah blah blah
</body>
EOD;
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($mail, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($dom->getElementsByTagName('signature') as $node) {
$node->parentNode->removeChild($node);
}
echo $dom->saveHTML();
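Here is a simple regex that matches your signature: <signature>[\S\s]*<\/signature>
\S : Matches any character that is not whitespace (space, tab, newline, ...).
\s : Matches any whitespace character (space, tab, newline, ...).
* : Matches zero or more repetitions of the preceding element, here [\S\s], i.e. any character including newlines. It is greedy, so with several signatures it runs to the last </signature>; use *? to stop at the first one.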
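A minimal sketch of the regex route, assuming the whole <signature> block (tags included) should be dropped; the sample message is invented. The s modifier makes . match newlines too, which is what the single-line pattern from the question was missing:

```php
<?php
$msgeBody = "Hello,\n<signature><p>Gerald Sugan<br>\nTravel Consultant</p>\n</signature>\nBye";

// s modifier: . also matches newlines. The lazy .*? stops at the first
// closing tag, so several signatures are removed one by one.
$clean = preg_replace('#<signature>.*?</signature>#s', '', $msgeBody);

echo $clean, "\n";
```

To keep the empty tags as in the question, wrap each tag in its own group and use '$1$2' as the replacement.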
Try trim(), a function that removes whitespace (or other characters you specify) from the beginning and end of a string:
http://www.w3schools.com/php/func_string_trim.asp
Explode would separate the signature from the email body and is quite a short piece of code, but you would need to get rid of the left-over tag fragment.
As for the original query: chop($yourString, ' ') is an alias of rtrim() and only removes spaces from the end of $yourString, not whitespace inside it. Reference: http://php.net/manual/en/function.chop.php
Your email is held in a variable called $msgeBody, so split it at "signature" and trim off the remaining tag.
$msgeBody = explode("signature", $msgeBody);
$msgeBody = rtrim($msgeBody[0], "<");
Clean up $msgeBody before putting it in your database.
Using $msgeBody = explode("signature", $msgeBody); leaves the first < from "signature" on the end of the first part - the body of the email - which would be in array position $msgeBody[0].
str_replace('<','', $msgeBody[0]); would also remove the tag but if you have other tags in $msgeBody it would remove those too.
rtrim($msgeBody[0], "<"); should remove it better.
substr() also has possibilities (http://php.net/manual/en/function.substr.php): combined with strpos() it could cut the string at the first occurrence of '<signature>'.
rtrim($msgeBody,'<signature>'); will not reliably chop it off: the second argument is a list of characters to strip, not a string, so it removes any trailing run of those characters. Mariano's caveat about multiple signatures also applies. Not tested.
strip_tags($msgeBody, ''); will get rid of all the tags, in case that could be used. (You put any tags you want to keep in the '', as in '<br />' for example.)
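A minimal sketch of the non-regex route with strpos() and substr(), assuming the signature is the last thing in the message and everything from the first <signature> onwards can be discarded (the sample message is invented):

```php
<?php
$msgeBody = "Body text\n<signature><p>Gerald Sugan</p>\n</signature>";

// Position of the opening tag, or false when there is no signature.
$pos = strpos($msgeBody, '<signature>');

// Keep everything before the tag and trim trailing whitespace.
$body = ($pos === false) ? $msgeBody : rtrim(substr($msgeBody, 0, $pos));

echo $body, "\n";
```

Unlike splitting on the bare word "signature", this does not trip over the word appearing in the body text and leaves no stray < behind.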

PHP Regular Expression to convert html entities to their respective characters

I want to change
<lang class='brush:xhtml'>test</lang>
to
<pre class='brush:xhtml'>test</pre>
My code looks like this:
<?php
$content="<lang class='brush:xhtml'>test</lang>";
$pattern=array();
$replace=array();
$pattern[0]="/<lang class=([A-Za-z='\":])* </";
$replace[0]="<pre $1>";
$pattern[1]="/<lang>/";
$replace[1]="</pre>";
echo preg_replace($pattern, $replace,$content);
?>
But it's not working. How should I change my code, or what is wrong with it?
There are quite a few problems:
Pattern 0 has the * outside the group, so the group only captures one character
Pattern 0 doesn't include the class= in the group, and the replacement doesn't have it either, so there won't be a class= in the replaced string
Pattern 0 has a space after the class, but there isn't one in the content string
Pattern 1 looks for <lang> instead of </lang>
This will work:
$pattern[0]="/<lang (class=[A-Za-z='\":]*) ?>/";
$replace[0]="<pre $1>";
$pattern[1]="/<\/lang>/";
$replace[1]="</pre>";
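Put together as a complete script, a sketch assuming the same $content as in the question:

```php
<?php
$content = "<lang class='brush:xhtml'>test</lang>";

$pattern = array();
$replace = array();

// Opening tag: capture the whole class attribute so it can be reused.
$pattern[0] = "/<lang (class=[A-Za-z='\":]*) ?>/";
$replace[0] = "<pre $1>";

// Closing tag: the slash is escaped because / is the delimiter.
$pattern[1] = "/<\/lang>/";
$replace[1] = "</pre>";

$result = preg_replace($pattern, $replace, $content);
echo $result, "\n";
```

The two patterns are applied in order, each one to the result of the previous replacement.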
How about without regex? :)
<?php
$content="<lang class='brush:xhtml'>test</lang>";
$content = html_entity_decode($content);
$content = str_replace('lang','pre',$content);
echo $content;
?>
In my test, preg_replace was a lot faster than the str_replace method above.
$str = preg_replace("/<lang class=([A-Za-z'\":]+)>(.*?)<\/lang>/", "<pre class=$1>$2</pre>", $str);
Execution time: 0.039815s
[preg_replace]
Time: 0.009518s (23.9%)
[str_replace]
Time: 0.030297s (76.1%)
Test Comparison:
[preg_replace]
compared with.........str_replace 218.31% faster
So in this test preg_replace was 218.31% faster than the str_replace method mentioned above (which also calls html_entity_decode). Each was run 1000 times.

php preg_match_all html dates with slashes error

I'm trying to preg_match_all a date with slashes in it, sitting between two HTML tags; however, it's returning null.
Here is the HTML:
<td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>
Here is my preg_match_all() code
preg_match_all('/<td width=\'40%\' align=\'right\' class=\'SmallDimmedText\'>Last([a-zA-Z0-9\s\.\-\',]*)<\/td>/', $h, $table_content, PREG_PATTERN_ORDER);
where $h is the HTML above.
What am I doing wrong?
Thanks in advance.
It (from a quick glance) is because you are trying to match:
Last Login: 11/14/2009
With this regex:
Last([a-zA-Z0-9\s\.\-\',]*)
The regex's character class doesn't include : and /, which both appear in the text. Changing that part of the regex to:
Last([a-zA-Z0-9\s\.\-\',:\/]*)
gives a match (the / is escaped because / is also the pattern delimiter).
Would it be better to simply use a DOM parser, and then perform the regex on the result of the DOM lookup? It makes for a nicer regex...
EDIT
The other issue is that your HTML is:
...40%' align='right'class='SmallDimmedText'>...
Where there is no space between align='right' and class='SmallDimmedText'
However your regex for that section is:
...40%\' align=\'right\' class=\'SmallDimmedText\'>...
Where it is indicated there is a space.
Use a DOM parser. It will save you more headaches caused by subtle bugs than you can count.
Just to give you an idea of how simple it is to parse using Simple HTML DOM:
$html = str_get_html(...);
// find() returns an array of matching elements
$elems = $html->find('.SmallDimmedText');
if ( count($elems) != 1 ){
throw new Exception('Too many/few elements found');
}
$text = $elems[0]->plaintext;
//parsing here is only an example, but you have removed all
//the html so that any regex used is really simple.
$date = substr($text, strlen('Last Login: '));
$unixTime = strtotime($date);
I see at least two problems :
in your HTML string, there is no space between 'right' and class=, and there is one space there in your regex
you must add at least these characters to the list of matched characters, between the [] :
':' (there is one between "Login" and the date),
and '/' (between the date parts)
(the spaces between "Last" and "Login", and between ":" and the date, are already covered by \s; adding ' ' as well is harmless)
With this code, it seems to work better :
$h = "<td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>";
if (preg_match_all("#<td width='40%' align='right'class='SmallDimmedText'>Last([a-zA-Z0-9\s\.\-',: /]*)<\/td>#",
$h, $table_content, PREG_PATTERN_ORDER)) {
var_dump($table_content);
}
I get this output :
array
0 =>
array
0 => string '<td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>' (length=80)
1 =>
array
0 => string ' Login: 11/14/2009' (length=18)
Note I have also used :
# as a regex delimiter, to avoid having to escape slashes
" as a string delimiter, to avoid having to escape single quotes
My first suggestion would be to minimize the amount of text you have in the preg_match_all pattern: why not just match between a ">" and a "<"? Second, I'd write the regex like this (using # as the delimiter so the slashes need no escaping); not sure if it helps:
#>.*[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}<#
That will look for the end of one tag, then any character, then a date, then the beginning of another tag.
I agree with Yacoby.
At the very least, remove all references to the HTML specifics and simply make the regex
preg_match_all('#Last Login: ([\d+/?]+)#', ...
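A minimal sketch of that idea, assuming the date is always in month/day/year form with a four-digit year as in the sample. It uses \d quantifiers instead of a character class, captures the three parts separately, and builds an ISO-style string from them (the variable names are made up for illustration):

```php
<?php
$h = "<td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>";

$iso = null;
// # as delimiter, so the slashes in the date need no escaping.
if (preg_match('#Last Login:\s*(\d{1,2})/(\d{1,2})/(\d{4})#', $h, $m)) {
    // $m[1] = month, $m[2] = day, $m[3] = year
    $iso = sprintf('%04d-%02d-%02d', $m[3], $m[1], $m[2]);
}

echo $iso, "\n";
```

Since the pattern ignores the surrounding markup, the missing space between align='right' and class= no longer matters.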
