Finding and replacing special characters in PHP

Encoding makes this a tough thing to explain. I'm getting a string from an XML file using PHP. When I echo it, I see a small black circle: the bullet character •, written as &bull; or &#8226; in HTML (Unicode U+2022, so not actually an ASCII character).
echo $str;
gets me:
[CIRCLE] wordswords [CIRCLE] more words [CIRCLE] still more words
How can I find this character using PHP? I want to explode on it. I can't search for a circle, and searching for 8226 or circ doesn't work. Do I have to use urlencode?
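$str = urlencode($str);                               // PHP's function is urlencode(), not url_encode()
$str = str_replace('%E2%80%A2', '-CIRCLE-', $str);    // UTF-8 bytes of • after encoding; the search string must be quoted
$str = urldecode($str);
$str = explode('-CIRCLE-', $str);                     // explode() needs the string as its second argument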
Or is there a more efficient way?

Check out this thread: Bullet "•" in XML. I think it will help you find an answer.
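The urlencode() round trip should not be necessary: str_replace() and explode() work on raw bytes, so you can search for the bullet's UTF-8 byte sequence directly. A minimal sketch, assuming the string really is UTF-8 (the usual case for text read from XML with SimpleXML or DOM) and that the sample text below stands in for your data; if the raw string still contains the literal entity text &bull;, run html_entity_decode() on it first.

```php
<?php
// Assumes $str is UTF-8. U+2022 BULLET is the byte sequence E2 80 A2 in UTF-8.
$bullet = "\u{2022}"; // PHP 7.0+ escape; same bytes as "\xE2\x80\xA2"

// Hypothetical sample standing in for the string read from the XML file.
$str = "\u{2022} wordswords \u{2022} more words \u{2022} still more words";

// Split on the bullet, trim the spaces around each piece,
// drop the empty piece before the first bullet, and reindex.
$parts = array_values(array_filter(array_map('trim', explode($bullet, $str)), 'strlen'));

print_r($parts);
```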

Related

PHP: How would I remove parts of a string between 2 chunks of characters without removing too much?

This problem is driving me nuts. Let's say I have a string:
This is a &start;pretty bad&end; string that I want to &start;somehow&end; display differently
I want to be able to remove the &start; and &end; parts as well as everything in between so it says:
This is a string that I want to display differently
I tried using preg_replace with a regular expression, but it took off too much, i.e.:
This is a display differently
The question is: how do I remove the stuff just between sets of &start; and &end; pairs and make sure that it doesn't remove anything between any &end; and &start; segments?
Keep in mind, I'm working with hundreds of strings that are very different to each other so I'm looking for a flexible solution that'll work with all of them.
Thanks in advance for any help with this.
Edit: Replaced dollar signs with ampersands. Oops!
Try this regex: /&start;(.+?)&end;/g
It looks like it works as desired: https://regex101.com/r/MW5nom/2
The lazy +? makes each match stop at the first &end; after a &start;, so the text between an &end; and the next &start; is left alone. I quickly tried it in the Chrome console using JavaScript; it still needs converting to PHP:
"This is a &start;pretty bad&end; string that I want to &start;somehow&end; display differently".replace(/&start;(.+?)&end;/g, "")
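In PHP the equivalent is preg_replace(), which replaces every match by default, so there is no /g flag. A minimal sketch: the optional \s? that also removes one following space is an assumption made so the result has single spaces, as in the expected output; .*? (rather than .+?) keeps an empty &start;&end; pair from swallowing text up to a later &end;, and the s modifier lets a pair span line breaks.

```php
<?php
$input = 'This is a &start;pretty bad&end; string that I want to &start;somehow&end; display differently';

// Lazy .*? stops at the first &end; after each &start;, so text between
// an &end; and the next &start; is kept. \s? eats one trailing space so
// no doubled spaces remain; the s modifier lets . match newlines.
$output = preg_replace('/&start;.*?&end;\s?/s', '', $input);

echo $output, "\n";
```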

Add a FNC1 character to code created with tcpdf datamatrix

I am using TCPDF to generate Data Matrix barcodes. It works really nicely. Now I was asked whether we could add FNC1 characters to our codes.
But I have no idea what the correct representation of the FNC1 character is for the TCPDF generator.
I came across this thread: http://sourceforge.net/p/tcpdf/discussion/435311/thread/161b1b1a
But I would like to understand where the suggestion of using chr(241) actually comes from. To me it seems like it fell from the sky: the documentation doesn't say anything about it, and I have not found anything else saying that chr(241) represents the FNC1 character.
Apart from that, it doesn't work for me: scanning the barcode just yields ñ characters in the middle of the code.
Does anyone have an idea how I could get the FNC1 character into my TCPDF Data Matrix? What am I missing? Thanks in advance for any help!
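require_once('tcpdf_barcodes_2d.php'); // adjust the path to your TCPDF install
// Attempt from the linked thread: chr(241) as a leading FNC1 and in place of each ';'
$string = chr(241) . str_replace(';', chr(241), $string);
// The constructor takes the code and the type (it calls setBarcode() itself)
$barcodeobj = new TCPDF2DBarcode($string, 'DATAMATRIX');
echo $barcodeobj->getBarcodeSVGcode(6, 6, 'black');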
Looking at the code for version 1.0.008 (from 2014-05-06) in /tcpdf/include/barcodes/datamatrix.php, I cannot see any comprehensive treatment of the special function or macro characters in Data Matrix, so you are probably out of luck.
That said, the forum reply you link to was written by the author of TCPDF (Nicola Asuni), so it might be worth reaching out to him to see what he was thinking at the time. My guess would be that an example input used by some other library misled him into believing that FNC1 can be represented as an ordinary code point. However, this is wrong, since FNC1 is a non-data character that requires special treatment.
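The ñ symptom follows from how Data Matrix ASCII encodation (ISO/IEC 16022) works: FNC1 is its own codeword, 232, while a data byte from 128 to 255 is sent as the Upper Shift codeword 235 followed by the byte value minus 127. So chr(241) is encoded as ordinary extended data, and a scanner decodes byte 241 as ñ under the default ISO-8859-1 interpretation. The sketch below is a hypothetical helper, not part of TCPDF, and deliberately simplified: no digit-pair compaction (a real encoder would pack "01" into one codeword), no other encodation modes, no padding or error correction.

```php
<?php
// Illustration of Data Matrix ASCII encodation only; not a complete encoder.
const DM_FNC1 = 232;         // FNC1 is a dedicated codeword, not a data byte
const DM_UPPER_SHIFT = 235;  // prefix for extended bytes 128..255

/**
 * @param array $tokens list of non-empty data strings and/or the int DM_FNC1
 * @return int[] ASCII-encodation codewords
 */
function dmAsciiCodewords(array $tokens): array
{
    $codewords = [];
    foreach ($tokens as $token) {
        if ($token === DM_FNC1) {
            $codewords[] = DM_FNC1;            // function character: one codeword
            continue;
        }
        foreach (str_split($token) as $ch) {
            $byte = ord($ch);
            if ($byte < 128) {
                $codewords[] = $byte + 1;      // ASCII data: value + 1
            } else {
                $codewords[] = DM_UPPER_SHIFT; // extended data: Upper Shift,
                $codewords[] = $byte - 127;    // then value - 127
            }
        }
    }
    return $codewords;
}

// A real FNC1 in front of the data "01" ...
print_r(dmAsciiCodewords([DM_FNC1, '01']));
// ... versus the byte chr(241), which is just an extended data character.
print_r(dmAsciiCodewords([chr(241) . '01']));
```

The difference is visible in the first codeword: a generator has to emit 232 for FNC1, which no input byte can produce through the normal data path.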

PHP function to convert special characters to unicode(UTF-16)

Is there a PHP function that can take a string and convert any special characters to Unicode escape sequences, similar to htmlspecialchars() or utf8_encode()?
For example in the string: "I think Bob's going too".
I need the Unicode escape for the right single quotation mark in place of the apostrophe in "Bob's", so that after conversion the string reads: "I think Bob\u2019s going too".
I need this for a PHP script that prints into a JavaScript function.
Using \ to escape, or ', does not work; it stops the script from running. I am trying to use Flowplayer's Playlist plugin, and the only way I can seem to pass a string with special characters is if they are Unicode escapes.
Here is a JSFiddle to play around with to see what I mean when I say it doesn't work. Just replace \u2019 with ' or something similar and click to play the song: the media player just goes black and plays nothing, whereas if you leave \u2019 in, it plays fine.
Any help is appreciated.
I think json_encode() is the function you are looking for here.
The following code:
$string = "I think Bob’s going too";
print_r(json_encode($string));
will output:
"I think Bob\u2019s going too"
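json_encode() escapes non-ASCII characters as \uXXXX by default, but it leaves a plain ASCII apostrophe alone. A minimal sketch of the JSON_HEX_* flags, which also escape ', ", <, > and &; that is useful when the result is printed inside a <script> block. "\u{2019}" (PHP 7+) is used so the example does not depend on the source file's encoding, and the JavaScript variable name is hypothetical.

```php
<?php
$curly = "I think Bob\u{2019}s going too"; // U+2019 RIGHT SINGLE QUOTATION MARK
$plain = "I think Bob's going too";        // ASCII apostrophe

// Default behaviour: non-ASCII characters become \uXXXX escapes.
echo json_encode($curly), "\n";

// JSON_HEX_APOS turns ' into \u0027; the other flags do the same
// for ", <, > and &.
$flags = JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_TAG | JSON_HEX_AMP;
echo json_encode($plain, $flags), "\n";

// json_encode() adds the surrounding double quotes itself, so its output
// can be printed directly as a JavaScript string literal.
echo 'var title = ' . json_encode($curly, $flags) . ";\n";
```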

Complex PHP/Perl regular expression for emoticons

I've checked google for help on this subject but all the answers keep overlooking a fatal flaw in the replacement method.
Essentially I have a set of emoticons such as :) LocK :eek and so on, and I need to replace them with image tags. The problem I'm having is making sure that a particular emoticon is not part of a word but stands on its own. For example, our site allows 'quick links', which are excluded from the smiley replacement and take the format go:forum, user:Username and so on. Pretty much every answer I've read doesn't allow for this possibility, and as such they break these links (i.e. go<img src="image.gif" />orum). I've experimented with different ways around this, checking for the start of the line, spaces/newline characters and so on, but I've not had much luck.
Any help with this problem would be greatly appreciated. Oh, also: I'm using PHP 5 and the preg_* functions.
Thanks,
Rupert S.
Edit 18/04/2011:
Thanks for your help, peeps :) I've created the final regex, which I thought I'd share with everyone. I had a couple of problems with special whitespace characters, including newlines, but it's now working like a dream. The final regex is:
(?<=\s|\A|\n|\r|\t|\v|\<br \/\>|\<br\>)(:S)(?=\s|\Z|$|\n|\r|\t|\v|\<br \/\>|\<br\>)
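To turn the comment into an answer: the simplest workaround is to assert that the emoticons are always surrounded by whitespace.
(?<=\s|^)[<:\-}]+(?=\s|$)
(The hyphen in the character class is escaped; an unescaped :-} would be a range from : to } that also includes all lowercase letters.)
The \s covers normal spaces and line breaks. Just to be safe, ^ and $ cover occurrences at the very start or end of the subject text. The assertions themselves do not consume any characters, so they can be ignored in the replacement string/callback.
If you want to do all the replacements in one single preg_replace call, try this (it relies on the /e modifier, which was deprecated in PHP 5.5 and removed in PHP 7.0):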
preg_replace('/(?<=^|\s)(:\)|:eek)(?=$|\s)/e'
,"'$1'==':)'?'<img src=\"smile.gif\"/>':('$1'==':eek'?'<img src=\"eek.gif\"/>':'$1')"
,$input);
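On PHP 7 and later, where /e no longer exists, the same single-pass replacement can be done with preg_replace_callback(). A minimal sketch, assuming PHP 7.4+ (for arrow functions); the emoticon map and the replaceSmileys() helper are hypothetical. It uses the same whitespace lookarounds as above, quotes each emoticon with preg_quote(), and tries longer emoticons first.

```php
<?php
// Hypothetical emoticon => image map; extend as needed.
$smileys = [
    ':)'   => '<img src="smile.gif"/>',
    ':eek' => '<img src="eek.gif"/>',
];

function replaceSmileys(string $input, array $smileys): string
{
    // Longest emoticons first, so a longer one wins over a shorter prefix.
    $keys = array_keys($smileys);
    usort($keys, fn($a, $b) => strlen($b) - strlen($a));
    $alternation = implode('|', array_map(fn($k) => preg_quote($k, '/'), $keys));

    // Match only when preceded by start/whitespace and followed by
    // whitespace/end, so quick links like user:eek are left alone.
    $pattern = '/(?<=^|\s)(?:' . $alternation . ')(?=\s|$)/m';

    // The lookarounds consume nothing, so $m[0] is exactly the emoticon.
    return preg_replace_callback($pattern, fn($m) => $smileys[$m[0]], $input);
}

echo replaceSmileys('hello :) see user:eek :eek', $smileys), "\n";
```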

How to match anything except a pattern between two tags

I am attempting to match a string which is composed of HTML. Basically it is an image gallery so there is a lot of similarity in the string. There are a lot of <dl> tags in the string, but I am looking to match the last <dl>(.?)+</dl> combo that comes before a </div>.
The way I've devised to do this is to make sure that there aren't any <dl> tags inside the <dl></dl> combo I'm matching. I don't care what else is there, including other tags and line breaks.
I decided I had to do it with regular expressions because I can't predict how long this substring will be or anything that's inside it.
Here is my current regex, which only returns an array with two NULL indices:
preg_match_all('/<dl((?!<dl).)+<\/dl>(?=<\/div>)/', $foo, $bar)
As you can see I use negative lookahead to try and see if there is another <dl> within this one. I've also tried negative lookbehind here with the same results. I've also tried using +? instead of just + to no avail. Keep in mind that there's no pattern <dl><dl></dl> or anything, but that my regex is either matching the first <dl> and the last </dl> or nothing at all.
Now, I realize . won't match line breaks, but I've tried everything I could imagine there and it still either gives me the NULL indices or nearly the whole string (from the very first occurrence of <dl to </dl></div>, which includes several other occurrences of <dl>, exactly what I didn't want). I honestly don't know what I'm doing wrong.
Thanks for your help! I've spent over an hour just trying to straighten out this one problem and it's about driven me to pulling my hair out.
Don't use regular expressions for irregular languages like HTML. Use a parser instead. It will save you a lot of time and pain.
I would suggest using tidy instead. You can easily extract all the desired tags with their contents, even from broken HTML.
In general, I would not recommend writing a parser using regex.
See http://www.php.net/tidy
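If the tidy extension isn't available, PHP's DOM extension is another parser option. A minimal sketch, assuming ext-dom is installed and that "the last <dl> before a </div>" means a <dl> that is a direct child of a <div> with no element after it; the sample HTML is hypothetical. Unlike the regex lookahead, this also tolerates whitespace between </dl> and </div>.

```php
<?php
// Hypothetical gallery fragment standing in for $foo.
$html = '<div><dl><dt>a</dt><dd>1</dd></dl><dl><dt>b</dt><dd>2</dd></dl></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect warnings from messy real-world HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$matches = [];
// A <dl> directly inside a <div> with no following element siblings,
// i.e. the last <dl> before that </div>.
foreach ($xpath->query('//div/dl[not(following-sibling::*)]') as $dl) {
    $matches[] = $doc->saveHTML($dl); // serialize just this element
}

print_r($matches);
```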
As crazy as it is, about 2 minutes after I posted this question, I found a way that worked.
preg_match_all('/<dl([^\z](?!<dl))+?<\/dl>(?=<\/div>)/', $foo, $bar);
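The [^\z] craziness is just a way I used to say "match all characters, even line breaks". (Strictly, in PHP's older PCRE library [^\z] means "any character except z", which does include line breaks; PHP 7.3+ uses PCRE2, which rejects \z inside a character class.)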
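For a regex-only route, the original pattern mostly needs the s modifier, which is the standard way to let . match line breaks. A minimal sketch on a hypothetical multi-line sample: the (?:(?!<dl).)* "tempered dot" consumes a character only where no <dl starts, so a match can never span two <dl> elements.

```php
<?php
// Hypothetical sample standing in for $foo.
$foo = "<div>\n<dl><dt>a</dt></dl>\n<dl>\n<dt>b</dt>\n</dl></div>";

// (?!<dl). : any character, provided "<dl" does not start at it.
// /s       : . also matches newlines.
preg_match_all('/<dl(?:(?!<dl).)*<\/dl>(?=<\/div>)/s', $foo, $bar);

print_r($bar[0]);
```

The first <dl> cannot match because its </dl> is followed by a newline and another <dl>, not by </div>, so only the last one is returned.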
