str_replace not working on strange character - php

I get the string "Holder – 2pcs" from my WordPress post title using the get_the_title() function, then I use str_replace to replace the "–" character, but no luck!
str_replace("–","-","Holder – 2pcs");
Any help appreciated!
Edit:
(Response to comments)
I had to save the text from $title1 = get_the_title(); to a .txt file, and I noticed that the – was saved as &#8211; in the txt file. I then used str_replace("&#8211;","-","Holder &#8211; 2pcs") and it works! The problem is that in my WordPress database the title contains the - character as it should, but when I use WordPress's get_the_title() function in my code to retrieve the title, I get the - character as –, which is actually &#8211;. I don't know why get_the_title() is causing this issue!
Any thoughts?

Your issue is caused by your "-" character being something else that looks the same.
Step 1:
Ensure that everything is using the same Character set, from your MySQL to your PHP to your input text.
$title1 = iconv(mb_detect_encoding(get_the_title(), mb_detect_order(), true), "UTF-8", get_the_title());
(reference)
Step 2:
Ensure that what you convert is the raw string and not an HTML encoded output
$title2 = html_entity_decode($title1, ENT_NOQUOTES | ENT_HTML5, "UTF-8");
Step 3:
Run the str_replace() function as originally attempted. If there are a range of possible "dash" characters then you can build an array:
$dashes = ['–','–','—','-'];
$title3 = str_replace($dashes,"-",$title2);
(reference)
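A minimal sketch of steps 2 and 3 combined, assuming the title arrives with the dash as the numeric entity &#8211;. The hard-coded string is only a stand-in for the return value of get_the_title(), which is not called here:

```php
<?php
// Stand-in (assumption) for what get_the_title() returns:
// the dash is an HTML entity, not a literal character.
$title = 'Holder &#8211; 2pcs';

// Step 2: decode entities into real UTF-8 characters.
$decoded = html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Step 3: replace the en dash (U+2013) and em dash (U+2014) with a hyphen.
$clean = str_replace(["\u{2013}", "\u{2014}"], '-', $decoded);

var_dump($clean); // string(13) "Holder - 2pcs"
```

Writing the dashes as "\u{2013}" escapes (PHP 7+) avoids depending on the encoding of the source file itself.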

The code you've shared does work:
var_dump(str_replace("–","-","Holder – 2pcs"));
string(13) "Holder - 2pcs"
If it doesn't, then you're actually running something different. Most likely, your input data contains white space or HTML entities and you're looking at it through browser glasses.
Try inspecting your input data further with e.g.:
header('Content-Type: text/plain');
var_dump("Holder – 2pcs", bin2hex("Holder – 2pcs"));
string(15) "Holder – 2pcs"
string(30) "486f6c64657220e280932032706373"
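To illustrate why the hex dump helps, here is a small sketch comparing a literal en dash with its entity form. The entity string is an assumption about what the real input might look like:

```php
<?php
// A real en dash (U+2013) versus the same dash written as an HTML entity.
$literal = "Holder \u{2013} 2pcs";
$entity  = 'Holder &#8211; 2pcs';

// Both look identical once a browser has rendered them,
// but the bytes are different.
echo bin2hex($literal), "\n"; // 486f6c64657220e280932032706373
echo bin2hex($entity), "\n";  // 486f6c6465722026233832313b2032706373

// str_replace() with a literal en dash only matches the first form.
var_dump(str_replace("\u{2013}", '-', $entity) === $entity); // bool(true)
```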

Related

&center causes ¢ sign to come in PHP

I want to save a URL with parameters to a string.
One of the parameters is "center".
However, when I try to save it as www.xyz.com?r=1&center=34, it reads it as www.xyz.com?r=1¢er=34. I do not want to convert &cent to the cent sign. What is the proper procedure to do this?
Edit: Since this is receiving negative votes, I'd like to mention that in my case, I actually needed the raw string instead of escaping it. It is working now. Rendering on the HTML page was a problem, but the file_get_contents needed the exact url, and it is working.
If you still wish to downvote this question, please explain why.
Try this way,
$link = "www.xyz.com?r=1&center=34";
$result = str_replace("&", "&amp;", $link);
echo $result;
It's because &something; is the HTML way of "escaping". Here are a few examples:
&copy;: ©
&times;: ×
&euro;: €
&larr;: ← (left arrow)
&rarr;: → (right arrow)
&amp;: & (the actual sign)
You can find more of those on w3schools. 😉
It's because you have to escape the & in URLs that appear in HTML: &amp;.
So, in your URL, you should replace that lonely & with &amp;. It should look like this:
www.xyz.com?r=1&amp;center=34
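A short sketch of the distinction raised in the question's edit: keep the raw URL for requests such as file_get_contents, and escape it only at the point where it is written into HTML. The link text and the http:// prefix are made up for illustration:

```php
<?php
// The raw URL: this is what an HTTP request needs.
$url = 'www.xyz.com?r=1&center=34';

// Building the query string from an array gives the same raw form.
$query = http_build_query(['r' => 1, 'center' => 34], '', '&');

// Escape only when the URL is placed into HTML output, so the browser
// does not read "&cent" as the start of the cent-sign entity.
$href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

echo $query, "\n";                                   // r=1&center=34
echo '<a href="http://' . $href . '">map</a>', "\n"; // href contains &amp;
```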

Removing &#39; from string

Currently I have the following strings.
$artist = 'Lookas';
$song = 'Can&#39;t Get Enough';
As you can see above, the $song value contains random text (&#39;) as opposed to just the symbol, which should look like this: '. How can I solve this?
The title also sometimes returns &amp; as opposed to the proper & symbol.
Those are not "random" characters. It is an apostrophe encoded as a numeric HTML entity.
<?php
$song = 'Can&#39;t Get Enough';
var_dump(mb_convert_encoding($song, 'UTF-8', 'HTML-ENTITIES'));
The output obviously is:
string(16) "Can't Get Enough"
As mentioned in the comment by @Memor-X, those are HTML entities.
If this is data that you're getting from your own database, then you're using htmlspecialchars() in the wrong place. If not, then I recommend reading up on what that function does, and you'll find out how to decode those entities.
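For reference, a sketch of decoding with html_entity_decode(), assuming the input really contains the entity &#39;. The ENT_QUOTES flag is passed explicitly because before PHP 8.1 the default flags left single-quote entities undecoded. Note also that the 'HTML-ENTITIES' pseudo-encoding of mb_convert_encoding() is deprecated as of PHP 8.2:

```php
<?php
// Input with a numeric apostrophe entity and an ampersand entity.
$song  = 'Can&#39;t Get Enough';
$title = 'Rock &amp; Roll';

// ENT_QUOTES makes sure single-quote entities are decoded as well.
$flags = ENT_QUOTES | ENT_HTML5;

var_dump(html_entity_decode($song, $flags, 'UTF-8'));  // string(16) "Can't Get Enough"
var_dump(html_entity_decode($title, $flags, 'UTF-8')); // string(11) "Rock & Roll"
```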

Removing the "\ufeff" from the end of object -> content in Google+ API json result

The result from the Google+ API has \ufeff appended to the end of every "content" result (I don't really know why?)
What is the best way to remove this unicode character from the json result? It is producing a '?' in some of the output I am displaying.
Example:
https://developers.google.com/+/api/latest/activities/get#try-it
enter activity id
z12pvrsoaxqlw5imi22sdd35jwvkglj5204
and click Execute, result will be:
{
.....
"object": {
......
"content": "CONTENT OF GOOGLE PLUS POST HERE \ufeff",
......
example PHP code which shows a '?' where the '\ufeff' is:
<?php
$data = json_decode($result_from_google_plus_api, true);
echo $data['object']['content'];
// outputs "CONTENT OF GOOGLE PLUS POST HERE ?"
echo trim($data['object']['content']);
// outputs "CONTENT OF GOOGLE PLUS POST HERE ?"
Or am I going about this the wrong way? Should I be fixing the '?' issue rather than trying to remove the '\ufeff'?
In your case, you could use this regexp:
$str = preg_replace('/\x{feff}$/u', '', $str);
That way you can exactly match that code point value and have it removed.
In my experience there are a lot more whitespace-like characters you want to remove. This works well for me:
# I like to call this unicodeTrim()
$str = preg_replace(
'/
^
[\pZ\p{Cc}\x{feff}]+
|
[\pZ\p{Cc}\x{feff}]+$
/ux',
'',
$str
);
I found http://www.regular-expressions.info/unicode.html a pretty good resource about the fine details:
\pZ - match any kind of whitespace or invisible separator
\p{Cc} - match control characters
\x{feff} - match BOM
I've seen regexes that match \pC instead of \p{Cc}; however, this is dangerous because \pC includes any code point to which no character has been assigned. I've had actual data (certain emojis or other stuff) removed because of this.
But YMMV; I can't stress this enough.
With respect to all the answers,
I tested most of them but finally found the solution here: GitHub
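As a sketch, the pattern above can be wrapped in the unicodeTrim() helper the answer mentions and applied to a decoded JSON value. The JSON literal below is a shortened stand-in for the API response:

```php
<?php
// Trim Unicode separators, control characters and the BOM (U+FEFF)
// from both ends of a UTF-8 string.
function unicodeTrim(string $str): string
{
    $pattern = '/^[\pZ\p{Cc}\x{feff}]+|[\pZ\p{Cc}\x{feff}]+$/u';
    // preg_replace() returns null on invalid UTF-8; fall back to the input.
    return preg_replace($pattern, '', $str) ?? $str;
}

// Stand-in (assumption) for the API result; json_decode() turns the
// \ufeff escape into the three UTF-8 bytes EF BB BF.
$json = '{"object":{"content":"CONTENT OF GOOGLE PLUS POST HERE \ufeff"}}';
$data = json_decode($json, true);

$content = unicodeTrim($data['object']['content']);
var_dump($content === 'CONTENT OF GOOGLE PLUS POST HERE'); // bool(true)
```

Plain trim() leaves the character in place because its default character list covers only ASCII whitespace and NUL bytes.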
$field = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $field);
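One caveat worth checking before adopting that pattern: without the /u modifier it works on bytes, so it removes every byte of every non-ASCII character, not just the BOM. A small sketch with a made-up input string:

```php
<?php
// "Café" followed by a space and a BOM, written with escapes so the
// example does not depend on the source file encoding.
$field = "Caf\u{E9} \u{FEFF}";

// Byte-based pattern from the answer above: strips control bytes and
// every byte in the range 0x80-0xFF.
$stripped = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $field);

var_dump($stripped); // string(4) "Caf " - the é is gone along with the BOM
```

This is fine for ASCII-only data but destroys accented, Arabic or emoji text.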

PHP htmlentities not working even with parameters

Of course this has been asked before, and I have searched for solutions, none of which have worked so far. I want to change the TM symbol and the ampersand to their HTML equivalents by using htmlentities or htmlspecialchars:
$TEST = "Kold Locker™ & other stuff";
echo "ORGINIAL: " . $TEST . "<BR/>";
echo "HTML: " . htmlentities($TEST, ENT_COMPAT, 'UTF-8');
This displays:
ORGINIAL: Kold Locker™ & other stuff
HTML:
I have also tried it with htmlspecialchars and the second parameter changed with the same result.
What am I missing that others have claimed worked in other solutions?
UPDATE: I tried just displaying utf8_encode($TEST) and it displayed HTML: Kold Locker™ & other stuff
I don't know why, but this worked for me (htmlentities has to be called twice for me):
$html = "<html> <head><head>something like this </html>";
$entities_correction= htmlentities( $html, ENT_COMPAT, 'UTF-8');
echo htmlentities( $entities_correction, ENT_COMPAT, 'UTF-8');
output :
<html> <head><head>something like this </html>
I thought I had the same problem as Pjack (msg of Jul 14 at 8:54):
$str = "A 'quote' is <b>bold</b>";
echo htmlentities($str);
gives in the Browser (Firefox in my case) the original string $str (without any translation), while
echo htmlentities(htmlentities($str));
gives:
A 'quote' is &lt;b&gt;bold&lt;/b&gt;
(I use PHP/5.4.16 obtained from Windows 7 XAMPP).
However, after some more thought it occurred to me that the browser shows the strings &lt; and &gt; as < and >.
(See the source code in the browser.) The second call of htmlentities translates & into &amp;, and only then does the browser show what you expected in the first place.
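The effect described above can be sketched without a browser by printing the strings as plain text (for example on the command line), so no entity is rendered:

```php
<?php
$str = "A 'quote' is <b>bold</b>";

// One pass: this is what ends up in the page source.
$once = htmlentities($str, ENT_COMPAT, 'UTF-8');

// Two passes: the & of every entity is escaped again.
$twice = htmlentities($once, ENT_COMPAT, 'UTF-8');

echo $once, "\n";  // A 'quote' is &lt;b&gt;bold&lt;/b&gt;
echo $twice, "\n"; // A 'quote' is &amp;lt;b&amp;gt;bold&amp;lt;/b&amp;gt;
```

A browser renders the first string as the original text and the second as the first string, which is why a single call can look as if it did nothing.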
Your code works for me :-?
In the manual page for htmlentities() we can read:
Return Values
Returns the encoded string.
If the input string contains an invalid code unit sequence within the
given encoding an empty string will be returned, unless either the
ENT_IGNORE or ENT_SUBSTITUTE flags are set.
My guess is that the input data is not properly encoded as UTF-8 and the function is returning an empty string. (Assuming that the script is not crashing, i.e., code after that part still runs.)
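A sketch of that guess, assuming the input was saved in Windows-1252, where the single byte 0x99 is the ™ sign. That byte is not valid UTF-8 on its own:

```php
<?php
// Assumed input: ™ stored as the single Windows-1252 byte 0x99.
$invalid = "Kold Locker\x99 & other stuff";

// The same text as valid UTF-8 (™ is U+2122).
$valid = "Kold Locker\u{2122} & other stuff";

// Invalid UTF-8 without ENT_IGNORE/ENT_SUBSTITUTE: empty string.
var_dump(htmlentities($invalid, ENT_COMPAT, 'UTF-8')); // string(0) ""

// With ENT_SUBSTITUTE the bad byte is replaced instead of failing.
var_dump(htmlentities($invalid, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8') !== ''); // bool(true)

// Valid UTF-8 input is encoded as expected.
var_dump(htmlentities($valid, ENT_COMPAT, 'UTF-8'));
// string(35) "Kold Locker&trade; &amp; other stuff"
```

If this matches your situation, converting the input to UTF-8 (or saving the script file as UTF-8) is the more thorough fix.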
I had almost the same problem (somehow it showed the same text every time), and with a combination of different echoes I figured it out. Web browsers like Firefox show the same text every time because, when you echo the htmlentities text, the browser converts it back into normal text while rendering. When I echo a script that passes the variable/text to console.log, it actually shows the htmlentities text (almost) correctly. Instead of replacing every special character with HTML codes, it replaces them with some other encoding I have seen before (I can't remember the name). Running htmlentities on it again, I get the same text echoed again (remember it converts everything), but echoing it in the console.log version gives me the expected result. So, as a result:
1. Execute htmlentities two times!
2. Don't (at least with Firefox) echo the htmlentities output as normal into the web page. If you'd like to check whether the value is actually correct, echo a script that logs it to the console.
I hope this helps others with the same problem,
VicStudio
EDIT: 3. If you are using a $_POST form, don't forget to add accept-charset="UTF-8" (or some other charset) to the <form> tag.
EVEN MORE EDIT: Only run htmlentities twice if you wish to echo your result as normal into the page. If you wish to send it directly to, for example, a database, only do it once! What I said before is partially wrong. :(
This is an old post, but for anyone still looking for a solution, here is what I use with success:
echo html_entity_decode($htmlString);

How to Convert Arabic Characters to Unicode Using PHP

I want to know how I can convert a word into Unicode exactly like:
http://www.arabunic.free.fr/
Does anyone know how to do that using PHP, considering that Arabic text may contain ligatures?
Thanks
Edit
I'm not sure what that "unicode" is, but I need each Arabic character as its equivalent machine number, considering that Arabic characters have different contextual forms depending on their position - see here:
http://en.wikipedia.org/wiki/Arabic_alphabet#Table_of_basic_letters
the same character in different position:
ب‎ | ـب‎ | ـبـ‎ | بـ‎
I think there must be a way to convert each Arabic character into its equivalent number, but how?
Edit
I still believe there's a way to convert each character to its form depending on its position.
Any idea is appreciated.
All you need is a function called utf8Glyphs, which you can find in ArGlyphs.class.php; download it from ar-php
and visit Ar-PHP for more information about the project and its classes.
This reverses the word while keeping the correct form (glyph) of each of its characters.
Example of usage:
<?php
include('Arabic.php');
$Arabic = new Arabic('ArGlyphs');
$text = 'بسم الله الرحمن الرحيم';
$text = $Arabic->utf8Glyphs($text);
echo $text;
?>
I assume you want to convert بهروز to \u0628\u0647\u0631\u0648\u0632. Take a look at http://hsivonen.iki.fi/php-utf8/. All you have to do after calling utf8ToUnicode('بهروز') is convert the integers in the returned array to hex, make sure they have 4 digits, and prefix them with \u, and you're done. You can also get the same result using json_encode:
json_encode('بهروز') // returns "\u0628\u0647\u0631\u0648\u0632"
EDIT:
It seems you want to get the character codes of بب, where the first one differs from the second. All you have to do is apply the bidi algorithm to your text using fribidi_log2vis, then get the character codes in one of the ways I described before.
Here's an example:
$string = 'بب'; // \u0628\u0628
$bidiString = fribidi_log2vis($string, FRIBIDI_LTR, FRIBIDI_CHARSET_UTF8);
json_encode($bidiString); // \ufe90\ufe91
EDIT:
I just remembered that TCPDF has a bidi algorithm implemented in pure PHP, so if you cannot get the fribidi PHP extension to work, you can use TCPDF (utf8Bidi is protected by default, so you need to make it public).
require_once('utf8.inc'); // http://hsivonen.iki.fi/php-utf8/
require_once('tcpdf.php'); // http://www.tcpdf.org/
$t = new TCPDF();
$text = 'بب';
$t->utf8Bidi(utf8ToUnicode($text)); // will return an array like array(0 => 65168, 1 => 65169)
Just set the element containing the arabic text to "rtl" (right to left), then input correctly spelled arabic and the text will flow with all ligatures looked for.
div {
direction:rtl;
}
On a side note, don't forget to read "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"
Think about that : The "ba" (ب) arabic letter is a "ba" no matter where it appears in the sentence.
Try this:
<?php
$string = 'a';
$expanded = iconv('UTF-8', 'UTF-32', $string);
$arr = unpack('L*', $expanded);
print_r($arr);
?>
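A variation on the snippet above, as a sketch: asking iconv for 'UTF-32BE' and unpacking with 'N*' avoids the byte-order mark that a plain 'UTF-32' conversion may prepend, and does not depend on the machine's byte order. The helper name codePoints() is made up for this example, and the iconv extension is assumed to be available:

```php
<?php
// Return the Unicode code points of a UTF-8 string as integers.
function codePoints(string $text): array
{
    // Big-endian UTF-32 with no BOM: exactly 4 bytes per code point.
    $utf32 = iconv('UTF-8', 'UTF-32BE', $text);
    if ($utf32 === false) {
        return []; // input was not valid UTF-8
    }
    // 'N*' reads unsigned 32-bit big-endian integers.
    return array_values(unpack('N*', $utf32));
}

// Two Arabic "ba" letters, written as escapes (U+0628 U+0628).
foreach (codePoints("\u{0628}\u{0628}") as $cp) {
    printf("U+%04X\n", $cp); // prints U+0628 twice
}
```

Both letters report the same code point because the stored text holds the base letter; the positional shapes are chosen at rendering time unless the text has been converted to presentation forms.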
I totally agree with FloatBird about using Arabic.php, which you will find, as he said, at ar-php. The thing is, they changed the class name after version 4 from Arabic to I18N_Arabic, so for the code to work with Arabic.php ver 4.0 you need to change it to
<?php
include('Arabic.php');
$Arabic = new I18N_Arabic('ArGlyphs');
$text = 'بسم الله الرحمن الرحيم';
$text = $Arabic->utf8Glyphs($text);
echo $text;
?>
Also note that you need to put the PHP code file inside the I18N folder.
Anyway, it works fantastically. Thanks again, FloatBird.
I had a similar problem when I wanted to store an object that had values in Arabic: the Arabic text was stored as Unicode escape sequences. The solution was as follows.
$detailsLog = $product->only(['name', 'unit', 'quantity']);
$detailsLog = json_encode($detailsLog, JSON_UNESCAPED_UNICODE);
$log->details = $detailsLog;
$log->save();
When you pass JSON_UNESCAPED_UNICODE as the second parameter of json_encode, the Arabic words are returned without escaping.
I think you could try:
<meta charset="utf-8" />
If this does not work, use FloatBird's answer.
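A standalone sketch of the difference, using a plain array with made-up values in place of the $product model from the snippet above:

```php
<?php
// Hypothetical record; the name is the Arabic word from the earlier
// answer, written with escapes (U+0628 U+0647 U+0631 U+0648 U+0632).
$details = [
    'name'     => "\u{0628}\u{0647}\u{0631}\u{0648}\u{0632}",
    'quantity' => 2,
];

// Default: non-ASCII characters become \uXXXX escape sequences.
$escaped = json_encode($details);
echo $escaped, "\n"; // {"name":"\u0628\u0647\u0631\u0648\u0632","quantity":2}

// With the flag: the Arabic text is kept as UTF-8.
$readable = json_encode($details, JSON_UNESCAPED_UNICODE);
echo $readable, "\n";

// Both forms decode to the same data.
var_dump(json_decode($escaped, true) === json_decode($readable, true)); // bool(true)
```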
