PHP highlight query and escape html special characters


I'm trying to program a search function that highlights the search query in the results. At the moment I'm using this code for highlighting, which works fine:
$hightlight = preg_replace('/'.strtolower($query).'/', '<span class=hightlight>'.strtolower($query).'</span>', strtolower($text));
The text I'm searching in is a string from a database. The problem is that the text may contain HTML special characters. If the text is, for example, <test> and the user searches for <te, I get the following result:
<span class=hightlight><te</span>st>
which the browser interprets as st>. This makes sense, but it's not what I want. I want <test> as the result, with <te highlighted.
So I need to escape the special characters. I know about the function htmlspecialchars, but how can I use it in this case? Or is there another function? I can't escape the text before searching, because then I'd also be searching inside the HTML entities. I also can't escape it after searching, because then the <span> tags are already in the text and would be converted to HTML entities as well.
I hope you understand my problem. Does anyone have a solution for this?

Using a combination of htmlspecialchars() and a regex negative lookahead, I think we're able to solve this.
<?php
$text = "this is just my really basic <test> of words";
$query = "<te";
$text = htmlspecialchars($text);
$query = htmlspecialchars($query);
$highlight = preg_replace('/'.strtolower($query).'(?![^\&]*\;)/', '<span class=highlight>'.strtolower($query).'</span>', strtolower($text));
echo $highlight;
?>
(small note, I took the liberty of changing hightlight to highlight)
The part of this that solves the issue mentioned in your comment is the negative lookahead: (?![^\&]*\;)
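That basically means: reject a match if it is followed by a ; with no & in between, i.e. if the match sits between the & and the ; of an HTML entity such as &lt;.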
Now, this could obviously run into issues in some edge cases where & and ; are both part of the actual text. If you're not doing any sort of text and query limitation/sanitation, I'm not sure that there's anything that will work for all possible cases.
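One way to sidestep those edge cases is to never match against escaped text at all: split the raw text on the raw query first, then escape each piece separately and wrap only the matched pieces. The following is a minimal sketch of that idea, not the approach from the answer above. The function name highlight_escaped() is made up for illustration, and it assumes case-insensitive matching is wanted.

```php
<?php
// Hypothetical helper (not from the answer above): search the raw text,
// then escape every piece individually.
function highlight_escaped(string $text, string $query): string
{
    if ($query === '') {
        return htmlspecialchars($text);
    }

    // preg_quote() keeps regex metacharacters in the query literal.
    $pattern = '/(' . preg_quote($query, '/') . ')/i';

    // With PREG_SPLIT_DELIM_CAPTURE the matched pieces land at the odd indexes.
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        $escaped = htmlspecialchars($part);
        $out .= ($i % 2 === 1)
            ? '<span class="highlight">' . $escaped . '</span>'
            : $escaped;
    }
    return $out;
}

echo highlight_escaped('this is just my really basic <test> of words', '<te');
```

Because the text is never lowercased, the result keeps its original capitalisation, and literal & or ; characters in the text need no special handling.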

Related

Needed Help About Regex

I have long struggled with languages such as PHP, JavaScript, HTML, etc., but my most troubling weakness is still regex.
Previously I felt comfortable without understanding it, but now I've reached the point where I have to use a regex function.
I want to replace an HTML tag that is created by a rich text editor (RTE): when I type [code] in the box and hit Enter, the RTE turns it into <div>[code]</div>
What I need is to change <div>[code]</div> into an opening HTML tag: <div class="code">
I have tried using the PHP str_replace() function as below:
$content = str_replace(
'<div>[code]</div>',
'<div class="code">',
$_POST['content']
);
but it doesn't work. I think I may need to use the preg_replace() function, but I can't get it right.
Can someone help me with sample code to do that?

In the preg_replace() function, you need to escape the [ and ] symbols so that they match literal [ and ] characters.
Regex:
<(div)>\[([^\]]*)\]<\/\1>
Replacement string:
<\1 class="\2">
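Plugged into preg_replace(), the pattern and replacement above might look like the following sketch. The ~ delimiter and the sample value of $content are choices made for this example; in the question the input would come from $_POST['content'].

```php
<?php
// Sample input for illustration; in the question this is $_POST['content'].
$content = '<p>intro</p><div>[code]</div>echo "hi";';

// \[ and \] match literal brackets. \1 re-uses the captured tag name,
// \2 is the text found between the brackets.
$content = preg_replace('~<(div)>\[([^\]]*)\]<\/\1>~', '<\1 class="\2">', $content);

echo $content;
```

Since the bracket content is captured rather than hard-coded, the same call would also turn <div>[quote]</div> into <div class="quote">.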

PHP regex replace between wordpress shortcode tag

I have a shortcode which I want to be able to strip away depending on the context of the post. Eg.
[tooltip slug="test"]Test Text[/tooltip]
I would like the output to be:
<span class="dummy">Test Text</span>
I have experimented (a lot!) with preg_replace and I can't seem to get it to recognize that the replacement string is between the ']' and then delimited by '[/tooltip]' without doing multiple passes.
Ideas?
Update: As so often happens, about 10 seconds after I wrote this one of my attempts seemed to work. I don't think it's as good as the solution below but FWIW...
$my_var .= preg_replace('/(?:\[tooltip slug=\"([^\"]*)"[^\>]*\]([^\<]*)\[\/tooltip\])/', '<span class="dummy">\\2</span>', $my_post->post_content);

Here is the simple regex you are looking for.
$result = preg_replace('%\[tooltip slug="[^"]*"]([^[]*)\[/tooltip]%',
'<span class="dummy">\1</span>', $subject);
What we do here is capture the text between the tooltip tags, and insert it in the replacement.
Let me know if you need any details.
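As a quick check, the call above can be run against the shortcode from the question. The surrounding words in $subject are sample data added for this sketch.

```php
<?php
// Sample post content containing the shortcode from the question.
$subject = 'Before [tooltip slug="test"]Test Text[/tooltip] after';

// The capture group ([^[]*) grabs everything up to the next "[",
// which is the start of the closing [/tooltip] tag.
$result = preg_replace(
    '%\[tooltip slug="[^"]*"]([^[]*)\[/tooltip]%',
    '<span class="dummy">\1</span>',
    $subject
);

echo $result;
```

Note that the capture stops at the first [ it meets, so text containing another bracketed shortcode inside the tooltip would not match with this pattern.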

$test = preg_match('/\[([^\]]+)\]([^\[]+)\[/', '[tooltip slug="test"]Test Text[/tooltip]', $matches);
echo $matches[2];

Is there a typo in this str_replace code? / Am I reading it correctly?

Here is the line of code from a PHP file; specifically, it is from zstore.php, a file included as part of the "Zazzle Store Builder" toolset from Zazzle.com.
The set of files allows someone like me, who has products for sale on Zazzle, to massage that data into a nicer "storefront" which I can set up my way instead of being confined by the CMS structure of Zazzle.com, where they understandably want to keep the monkeys (uhmmm... users like myself) from causing too much mayhem.
So... here is the code:
$keywords = str_replace(" ",",",str_replace(",","",$keywords));
Two questions:
Am I understanding what it does and
Is there an extra single or double quote in the string that does not need to be there?
Here is what I think the line of code is saying:
Take the string of characters that the user inputs (dance diva) and assign it to the variable called
$keywords
then run the following function on that character string
= str_replace
(" ","," <<< look for spaces. If you find a space, replace it with a comma
,str_replace(",","" <<< this is the bit I don't understand or which may have a typo
I THINK it is saying "if you find commas, leave them alone", but I'm not certain.
,$keywords)); <<< then put the edited string of characters back into the variable called $keywords.
What led me to look at this was that I was inputting the following:
dance,diva which is what I THOUGHT the script was wanting from me based on the commented text in the README.txt file:
// Search terms. Comma separated keywords you can use to select products for your store
So..
Am I understanding what this line of code is supposed to do?
which, assuming I am correct, and I'm pretty sure that the first half is supposed to work as I've described, now brings me to my second question:
Why isn't the second bit working? Is there a typo?
To review:
dance diva produces results
dance,diva does not
Both, SHOULD work.
Thanks in advance for your help. I have a lot of HTML experience and computer experience but PHP is new to me.

$keywords = str_replace(" ",",",str_replace(",","",$keywords));
You can split it into two steps:
$temp = str_replace(",","",$keywords);
$keywords = str_replace(" ",",",$temp);
First it replaces all commas with an empty string, i.e. it removes all commas. Then it replaces all spaces with commas.
For "dance diva" there are no commas, so the first step does nothing; the second step then replaces the space and the result is "dance,diva".
For "dance,diva" the first step removes the comma, giving "dancediva"; there is no space left for the second step to replace, so "dancediva" is your result.
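The two cases can be reproduced with a short script. The wrapper function normalize_keywords() is a name invented for this sketch; the two str_replace() calls are the ones from the split version above.

```php
<?php
// Same two steps as above, wrapped in a function (the name is illustrative).
function normalize_keywords(string $keywords): string
{
    $temp = str_replace(",", "", $keywords); // step 1: drop every comma
    return str_replace(" ", ",", $temp);     // step 2: spaces become commas
}

echo normalize_keywords("dance diva"), "\n"; // dance,diva
echo normalize_keywords("dance,diva"), "\n"; // dancediva
```

So the line contains no typo; it simply expects space-separated input and turns it into the comma-separated form itself.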

apostrophe in preg_match_all() is giving me problems

So I've got this piece of code that won't play nice.
preg_match_all("/(\{\[)([\w-\d\s\.\|']*)(\]\})/i",$replace_text, $match);
What it is supposed to do is allow an apostrophe in my replacement text. So in my text, where I have "{[SPIN--they are|they’re]}" it should return "they are" or "they're".
But instead, it simply does nothing and spits out the entire spintax code just as I typed above.
The only time this does not work, is when a replacement text has an apostrophe. It works perfectly everywhere else. Been trying to fix this for two days and I'm about to throw my keyboard through my monitor.
There are many things that my project does and it is imperative to have the {[SPIN-- before specifying the replacement text, and the ]} closing brackets.
Can someone help, please?

In your example string it's not a single-quote character, but something that looks similar:
’ (the actual character) vs ' (that's what you think it is)
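The difference shows up when the bytes are inspected: the character in the example string is U+2019 (right single quotation mark), while the pattern only allows U+0027 (apostrophe). The sketch below assumes UTF-8 text and uses a simplified stand-in for the question's pattern, not the original one; replacing the curly quote before matching is one possible fix.

```php
<?php
$typed = "{[SPIN--they are|they\u{2019}re]}";   // U+2019, as in the question
$fixed = str_replace("\u{2019}", "'", $typed);  // swap in U+0027

// Simplified stand-in for the question's pattern (an assumption for this sketch).
$pattern = '/\{\[([\w\s.|\'-]*)\]\}/u';

$before = preg_match_all($pattern, $typed, $m1); // curly quote: no match
$after  = preg_match_all($pattern, $fixed, $m2); // straight quote: one match

echo bin2hex("\u{2019}"), ' vs ', bin2hex("'"), "\n"; // e28099 vs 27
echo $before, ' ', $after, "\n";                      // 0 1
```

An alternative would be to add the curly quote to the character class itself instead of normalizing the input.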

How do I strip out in PHP everything but printing characters?

I am working with this daily data feed. To my surprise, one of the fields didn't look right after it was in MySQL. (I have no control over who provides the feed.)
So I did a mysqldump and discovered the zip code and the city for this record contained a non-printing char. It displayed it in 'vi' as this:
<200e>
I'm working in PHP and I parse this data and put it into the MySQL database. I have used the trim function on this, but that doesn't get rid of it. The problem is, if you do a query on a zipcode in the MySQL database, it doesn't find the record with the non-printing character.
I'd like to clean this up before it's put into the MySQL database.
What can I do in PHP? At first I thought regular expression to only allow a-z,A-Z, and 0-9, but that's not good for addresses. Addresses use periods, commas, hyphens and perhaps other things I'm not thinking of at the moment.
What's the best approach? I don't know what it's called to define it exactly other than printing characters should only be allowed. Is there another PHP function like trim that does this job? Or regular expression? If so, I'd like an example. Thanks!
I have looked into using the PHP function filter_var, and saw this posted at PHP.NET:
<?php
$a = "\tcafé\n";
//This will remove the tab and the line break
echo filter_var($a, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_LOW);
//This will remove the é.
echo filter_var($a, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH);
?>
While using FILTER_FLAG_STRIP_HIGH does indeed strip out the <200e> I mentioned seen in 'vi', I'm concerned that it would strip out the letter's accent in a name such as André.
Maybe a regular expression is the solution?

You can use PHP filters: http://www.php.net/manual/en/function.filter-var.php
I would recommend using the FILTER_SANITIZE_STRING filter, or any other filter that fits what you need.

I think you could use this little regex replace:
preg_replace( '/[^[:print:]]+/', '', $your_value);
It basically strips out all non-printing characters from $your_value

I tried this:
<?php
$string = "\tabcde éç ÉäÄéöÖüÜß.,!-\n";
$string = preg_replace('/[^a-z0-9\!\.\, \-éâëïüÿçêîôûéäöüß]/iu', '', $string);
print "[$string]";
It gave:
[abcde éç ÉäÄéöÖüÜß.,!-]
Add all the special characters you need to the regexp.

If you work in English and do not need to support Unicode characters, then allow just printable ASCII, [\x20-\x7E]
...and remove all others:
$s = preg_replace('/[^\x20-\x7E]+/', '', $s);
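If accented letters such as the é in André have to survive, one option (an alternative to the answers above, offered as a sketch) is to remove only characters in the Unicode "Other" category, \p{C}. That category covers control and format characters, including U+200E, the left-to-right mark, which is presumably what vi shows as <200e>. This assumes the feed is valid UTF-8; with the u modifier preg_replace() returns null otherwise. The function name strip_invisible() is hypothetical. Note also that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1.

```php
<?php
// Hypothetical helper: drop control/format characters, keep everything else.
function strip_invisible(string $value): ?string
{
    // \p{C} = Unicode "Other" (control, format, unassigned, ...).
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_replace('/\p{C}+/u', '', $value);
}

echo strip_invisible("90210\u{200E}"), "\n";     // 90210
echo strip_invisible("\tAndr\u{00E9}\n"), "\n";  // André
```

Tabs and newlines are control characters too, so they are removed as well; periods, commas, hyphens and spaces in addresses are left alone.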
