How to use htmlspecialchars in preg_match - php

This is my code; it echoes "Not working":
$f = file_get_contents("http://www.google.com");
$text = htmlspecialchars( $f );
$matches = array();
preg_match('#<a.*?</a>#s', $text, $matches);
if ($matches) {
$text2 = $matches[0];
echo $text2;
}
else {
echo "Not working";
}
If I made a variable:
$text = 'Google is your best friend!';
This works, but it won't when I take the text from:
$text = htmlspecialchars( $f );
Does anyone know why?

This is because htmlspecialchars translates the special characters <, &, >, " and ' into HTML entities (e.g., & becomes &amp;). Your text therefore no longer contains a literal <a, and the match fails.

htmlspecialchars will convert from
<
to
&lt;
etc.
See the manual.
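A minimal sketch of one way around this, assuming the goal is to print the first link as visible text: run preg_match on the raw HTML and escape only the matched fragment afterwards. The helper name firstAnchor and the sample markup are made up for illustration, and a literal string stands in for the downloaded page.

```php
<?php
// Hypothetical helper: find the first <a>...</a> element in raw HTML.
// Returns the matched markup, or null when nothing matches.
function firstAnchor(string $html): ?string
{
    // Match against the unescaped markup; the literal "<" is still present here.
    if (preg_match('#<a\b.*?</a>#s', $html, $matches)) {
        return $matches[0];
    }
    return null;
}

// Sample input standing in for the downloaded page (an assumption).
$html = '<p>Try <a href="http://www.google.com">Google</a> first.</p>';

$link = firstAnchor($html);

// Escape only when printing, so the browser shows the tag as text.
echo $link === null ? "Not working" : htmlspecialchars($link);
```

Calling htmlspecialchars before preg_match, as in the question, makes firstAnchor return null, because the escaped text contains &lt;a instead of <a.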

Related

how to cut tag in string with PHP

I have a problem with this value that I receive as output.
$simple="<TRAN_ID>17564_36428.1354_4159</TRAN_ID>
<TRAN_DATE>20160201</TRAN_DATE>
<TRAN_TIME>10:07:08</TRAN_TIME>
<ERROR_CODE>1</ERROR_CODE>
<ERROR_DESC>Not Input Policy</ERROR_DESC>
<POLICY_NBR></POLICY_NBR>";
I want to extract the values with PHP:
TRAN_ID = ?
TRAN_DATE = ?
ERROR_CODE = ?
ERROR_DESC = ?
How can I do it? Sorry, my English is bad.
Thanks.
You can use PHP's SimpleXML library, like so:
<?php
$str ="<TRANS><TRAN_ID>17564_36428.1354_4159</TRAN_ID><TRAN_DATE>20160201</TRAN_DATE><TRAN_TIME>10:07:08</TRAN_TIME><ERROR_CODE>1</ERROR_CODE><ERROR_DESC>Not Input Policy</ERROR_DESC><POLICY_NBR></POLICY_NBR></TRANS>";
$transaction = simplexml_load_string($str);
echo $transaction->TRAN_ID.PHP_EOL;
echo $transaction->TRAN_DATE.PHP_EOL;
echo $transaction->TRAN_TIME.PHP_EOL;
echo $transaction->ERROR_CODE.PHP_EOL;
echo $transaction->ERROR_DESC.PHP_EOL;
echo $transaction->POLICY_NBR.PHP_EOL;
Note that I added <TRANS> start and end tags to your string, because XML requires a single root element.
If the data always looks like the sample, this should work okay.
<?php
function getTextBetweenTags($string, $tagname) {
$pattern = "/<$tagname ?.*>(.*)<\/$tagname>/";
// Return null instead of raising a warning when the tag is not found.
if (!preg_match($pattern, $string, $matches)) {
return null;
}
return $matches[1];
}
$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
$txt = getTextBetweenTags($str, "font");
echo $txt;
?>
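If you want all the fields at once without adding a wrapper element, here is a rough sketch that collects every tag/value pair into an associative array. It assumes flat, non-nested tags without attributes, as in the sample; the helper name tagsToArray is hypothetical.

```php
<?php
// Hypothetical helper: collect every <TAG>value</TAG> pair into an array.
// Assumes flat, non-nested tags without attributes, as in the sample data.
function tagsToArray(string $input): array
{
    // \1 forces the closing tag to repeat the captured opening tag name.
    preg_match_all('#<(\w+)>(.*?)</\1>#s', $input, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        $result[$m[1]] = $m[2]; // tag name => text between the tags
    }
    return $result;
}

$simple = "<TRAN_ID>17564_36428.1354_4159</TRAN_ID>
<TRAN_DATE>20160201</TRAN_DATE>
<ERROR_CODE>1</ERROR_CODE>
<ERROR_DESC>Not Input Policy</ERROR_DESC>
<POLICY_NBR></POLICY_NBR>";

$fields = tagsToArray($simple);
echo $fields['TRAN_ID'], PHP_EOL;
echo $fields['ERROR_DESC'], PHP_EOL;
```

For anything nested or with attributes, the SimpleXML approach is the safer choice.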

How can I split html value and normal string into different array in php?

Say I have string such as below:
"b<a=2<sup>2</sup>"
It's actually a formula. I need to display it on a web page, but everything after "b" is hidden because the browser treats "<a" as a broken anchor tag. I tried htmlspecialchars, but it turns the whole string into plain text, so the <sup> tag no longer renders. I also tried some regexes, but I can only get the text between certain tags.
UPDATE:
This seems to work with this formula:
"(c<a) = (b<a) = 2<sup>2</sup>"
And even with this formula:
"b<a=2<sup>2</sup>"
HERE'S THE MAGIC:
<?php
$_string = "b<a=2<sup>2</sup>";
$string = "(c<a) = (b<a) = 2<sup>2</sup>";
$open_sup = strpos($string,"<sup>");
$close_sup = strpos($string,"</sup>");
$chars_array = str_split($string);
foreach($chars_array as $index => $char)
{
if($index != $open_sup && $index != $close_sup)
{
if($char == "<")
{
echo "&lt;";
}
else{
echo $char;
}
}
else{
echo $char;
}
}
OLD SOLUTION (DOESN'T WORK)
Maybe this can help:
I tried escaping the characters with backslashes, but that doesn't work as expected.
Then I tried this:
<?php
$string = "b&lt;a=2<sup>2</sup>";
echo $string;
?>
Using the &lt; HTML entity seems to work, if I understood your problem.
Let me know.
You could probably just add spaces, such as:
b < a = 2<sup>2</sup>
Nothing gets swallowed as a tag, and the formula is easier to read.
You could try this regex approach, which should skip elements.
$regex = '/<(.*?)\h*.*>.+<\/\1>(*SKIP)(*FAIL)|(<|>)/';
$string = 'b<a=2<sup>2</sup>';
$string = preg_replace_callback($regex, function($match) {
return htmlentities($match[2]);
}, $string);
echo $string;
Output:
b&lt;a=2<sup>2</sup>
PHP Demo: https://eval.in/507605
Regex101: https://regex101.com/r/kD0iM0/1
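A different technique from the regex above, as a rough sketch: escape the whole string with htmlspecialchars, then turn a small allowlist of tags back into real markup. The helper name escapeExceptTags and the choice of sup/sub as allowed tags are assumptions, and it only handles tags without attributes.

```php
<?php
// Hypothetical helper: escape the whole string, then restore a small
// allowlist of attribute-free tags (only <sup> and <sub> here, as an assumption).
function escapeExceptTags(string $text, array $allowed = ['sup', 'sub']): string
{
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    foreach ($allowed as $tag) {
        // After escaping, "<sup>" reads "&lt;sup&gt;"; turn it back into a tag.
        $escaped = str_replace(
            ["&lt;{$tag}&gt;", "&lt;/{$tag}&gt;"],
            ["<{$tag}>", "</{$tag}>"],
            $escaped
        );
    }
    return $escaped;
}

echo escapeExceptTags('b<a=2<sup>2</sup>'), PHP_EOL;
echo escapeExceptTags('(c<a) = (b<a) = 2<sup>2</sup>'), PHP_EOL;
```

Because everything is escaped first, only the tags you list explicitly can ever reach the browser as markup.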

PHP Regex expression excluding <pre> tag

I am using a WordPress plugin named Acronyms (https://wordpress.org/plugins/acronyms/). This plugin replaces acronyms with their description. It uses PHP's preg_replace function.
The issue is that it also replaces acronyms inside <pre> tags, which I use to present source code.
Could you modify this expression so that it won't replace acronyms inside <pre> tags (not only as direct children, but at any depth)? Is that possible?
The PHP code is:
$text = preg_replace(
"|(?!<[^<>]*?)(?<![?.&])\b$acronym\b(?!:)(?![^<>]*?>)|msU"
, "<acronym title=\"$fulltext\">$acronym</acronym>"
, $text
);
You can use a PCRE SKIP/FAIL regex trick (also works in PHP) to tell the regex engine to only match something if it is not inside some delimiters:
(?s)<pre[^<]*>.*?<\/pre>(*SKIP)(*F)|\b$acronym\b
This means: skip all substrings starting with <pre> and ending with </pre>, and only then match $acronym as a whole word.
See demo on regex101.com
Here is a sample PHP demo:
<?php
$acronym = "ASCII";
$fulltext = "American Standard Code for Information Interchange";
$re = "/(?s)<pre[^<]*>.*?<\\/pre>(*SKIP)(*F)|\\b$acronym\\b/";
$str = "<pre>ASCII\nSometext\nMoretext</pre>More text \nASCII\nMore text<pre>More\nlines\nASCII\nlines</pre>";
$subst = "<acronym title=\"$fulltext\">$acronym</acronym>";
$result = preg_replace($re, $subst, $str);
echo $result;
Output (abridged):
<pre>ASCII</pre><acronym title="American Standard Code for Information Interchange">ASCII</acronym><pre>ASCII</pre>
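If the acronym or its description can contain special characters, a slightly more defensive variant of the SKIP/FAIL pattern shown above might look like this. It is only a sketch: the wrapper name markAcronym is hypothetical, and it is not the plugin's own code.

```php
<?php
// Hypothetical wrapper around the SKIP/FAIL pattern shown above.
function markAcronym(string $text, string $acronym, string $fulltext): string
{
    // preg_quote() escapes any regex metacharacters in the acronym;
    // the second argument also escapes the "/" delimiter.
    $word = preg_quote($acronym, '/');
    $re = '/<pre[^<]*>.*?<\/pre>(*SKIP)(*F)|\b' . $word . '\b/s';

    // htmlspecialchars() keeps quotes in the description from breaking the attribute.
    $title = htmlspecialchars($fulltext, ENT_QUOTES, 'UTF-8');

    // A callback avoids "$1"-style references being interpreted in the replacement.
    return preg_replace_callback($re, function (array $m) use ($title) {
        return '<acronym title="' . $title . '">' . $m[0] . '</acronym>';
    }, $text);
}

echo markAcronym(
    '<pre>ASCII</pre> ASCII',
    'ASCII',
    'American Standard Code for Information Interchange'
);
```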
It is also possible to use preg_split and keep the code block as a group, only replace the non-code block part then combine it back as a complete string:
function replace($s) {
return str_replace('"', '&quot;', $s); // do something with `$s`
}
$text = 'Your text goes here...';
$parts = preg_split('#(<\/?[-:\w]+(?:\s[^<>]+?)?>)#', $text, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE); // -1 means no limit; passing null is deprecated since PHP 8.1
$text = "";
$x = 0;
foreach ($parts as $v) {
if (trim($v) === "") {
$text .= $v;
continue;
}
if ($v[0] === '<' && substr($v, -1) === '>') {
if (preg_match('#^<(\/)?(?:code|pre)(?:\s[^<>]+?)?>$#', $v, $m)) {
$x = isset($m[1]) && $m[1] === '/' ? 0 : 1;
}
$text .= $v; // this is a HTML tag…
} else {
$text .= !$x ? replace($v) : $v; // process or skip…
}
}
return $text;
Taken from here.

Create and simplify links from body of text in PHP

I need to extract all links in a body of text in php and make them clickable. The problem is I can't seem to simplify the text of the link in any way.
I tried using preg_replace_callback but I can't seem to get the trimming function working properly:
function trimUrl($url){
$maxLength = 3;
if(strlen($url)>$maxLength){
$urlShort = substr($str,0,$maxLength).'...';
}
else{
$urlShort = $url;
}
return $urlShort;
}
function enableLinks($text){
return preg_replace_callback("!(((f|ht)tp(s)?://)[-a-zA-Zа-яА-Я()0-9#:%_+.~#?&;//=]+)!i", "<a href='$1' target='_blank'>".trimUrl("$1")."</a>", $text);
}
enableLinks("Visit more work at http://www.google.com");
How can I run a second function within the preg_replace_callback that trims the output text?
What if you used a function inside that function, so that if the first function evaluates to true, the next one runs? Also, try assigning the result of preg_replace_callback to a variable so it's easier to work with.
First, you are calling substr() on $str. Where have you defined the variable $str? And, if you do this:
$var = preg_replace_callback("!(((f|ht)tp(s)?://)[-a-zA-Zа-яА-Я()0-9#:%_+.~#?&;//=]+)!i", "<a href='$1' target='_blank'>".trimUrl("$1")."</a>", $text);
Then you can use a new function:
return function($var);
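For what it's worth, preg_replace_callback expects a callable as its second argument, not a replacement string. A minimal sketch of how the trimming could run inside that callback follows; the URL pattern is simplified to ASCII for illustration, and the 30-character default is an arbitrary choice.

```php
<?php
// Shorten a URL for display; the default $maxLength of 30 is an arbitrary choice.
function trimUrl(string $url, int $maxLength = 30): string
{
    return strlen($url) > $maxLength
        ? substr($url, 0, $maxLength) . '...'
        : $url;
}

function enableLinks(string $text): string
{
    // Simplified URL pattern for illustration (ASCII only).
    $pattern = '!(?:f|ht)tps?://[-a-zA-Z0-9()#:%_+.~?&;/=]+!i';

    // The callback receives the match array; $m[0] is the whole URL.
    return preg_replace_callback($pattern, function (array $m): string {
        $href = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
        $label = htmlspecialchars(trimUrl($m[0]), ENT_QUOTES, 'UTF-8');
        return "<a href='{$href}' target='_blank'>{$label}</a>";
    }, $text);
}

echo enableLinks("Visit more work at http://www.google.com");
```

The callback runs once per match, so trimUrl sees the actual URL rather than the literal string "$1".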
Ended up using a more expanded function to achieve this, works on multiple urls with or without "http://":
function trimUrlOutput($url){
$maxLength = 30;
if(strlen($url)>$maxLength){
$urlShort = substr($url,0,$maxLength).'...';
}
else{
$urlShort = $url;
}
return $urlShort;
}
function enableLinks($text){
// ereg_replace() was removed in PHP 7; these are literal replacements, so str_replace() is enough.
$text = str_replace( "www.", "http://www.", $text );
$text = str_replace( "http://http://www.", "http://www.", $text );
$text = str_replace( "https://http://www.", "https://www.", $text );
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
if(preg_match_all($reg_exUrl, $text, $url)) {
$matches = array_unique($url[0]);
foreach($matches as $match) {
$linkText = trimUrlOutput($match);
$replacement = "<a href='".$match."' target='_blank'>{$linkText}</a>";
$text = str_replace($match,$replacement,$text);
}
return $text;
}
else{
return $text;
}
}
enableLinks("Visit more work at http://www.google.com");
Hope this helps someone.

Extract url from string via preg match in php

$str = 'window.location.href="http://my-site.com";';
I want to extract the url from $str. I am not that good in preg_match(). However with the following code:
preg_match('/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/', $str, $link);
if (empty($link[0])) {
echo "Nothing found!";
} else {
echo $link[0];
}
I am able to get the result http://my-site.com";. I want to customize preg_match() to exclude "; from the result. Please help!
<?php
$str = 'window.location.href="http://my-site.com";';
preg_match('/window\.location\.href="(.*?)";/', $str, $result);
echo $result[1];
//http://my-site.com
?>
http://ideone.com/YTk70i
If you don't feel comfortable with preg_*, then try keeping it simple. Loading the regex engine seems like unnecessary overhead for something this simple anyway.
Try this instead:
$str = 'window.location.href="http://my-site.com";';
$p1 = strpos($str, 'href="') + strlen('href="');
$p2 = strpos($str, '";', $p1);
$url = substr($str,$p1,$p2-$p1);
echo $p1 .PHP_EOL;
echo $p2 .PHP_EOL;
echo $url;
This yields the following:
22
40
http://my-site.com
i.e. everything between href=" and ";
Try this:
preg_match('/^window\.location\.href="([^"]+)";$/', $str, $link); // the URL is in $link[1]
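If the URL is not always wrapped in window.location.href, a more general sketch is to match the scheme and then stop at the first quote, whitespace, or angle bracket. The helper name firstUrl is hypothetical, and the set of stop characters is an assumption about what can follow a URL in your input.

```php
<?php
// Hypothetical helper: return the first URL found in $str, or null.
// The character class stops at quotes, whitespace, and angle brackets,
// so a trailing `";` is never part of the match.
function firstUrl(string $str): ?string
{
    if (preg_match('~(?:https?|ftps?)://[^\s"\'<>]+~i', $str, $m)) {
        return $m[0];
    }
    return null;
}

$str = 'window.location.href="http://my-site.com";';
echo firstUrl($str) ?? 'Nothing found!';
```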
