Remove hidden midpoint character from json string - php

When I send an API request, I get a JSON string back that seems to include a hidden character, a midpoint [·]. In my Atom editor the character is not visible, but when I try to delete the character after the midpoint, nothing visibly happens, which indicates that the hidden midpoint was removed instead.
As a consequence, converting the JSON string to a PHP array returns NULL.
Question:
What is the most straightforward way to remove the hidden character?
Should I search for the character and simply cut that character out of the string?
I understand that ideally I should find the root cause of why the midpoint got there, but I have not been able to find it.
Investigation and outcomes:
Comparing [$body1] and [$body2] on https://www.diffchecker.com/ shows:
[$body1] ·'{"columns":"test"}'
[$body2] '{"columns":"test"}'
This test shows that I do in fact have a hidden character.
This may not be reproducible in your environment, since copy/paste probably removes the hidden character.
$body1 = '{"columns":"test"}'; // Hidden character.
$body2 = '{"columns":"test"}'; // Removed hidden character.
$body3 = '{"columns":"test"}'; // Same as body2.
var_dump(json_decode($body2, true));
if ($body1 == $body2) {
echo 'Content the same';
} else {
echo 'Content differs';
}
Result:
Content differs
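Since the character is invisible in the editor, one way to find out what it actually is, is to dump the raw bytes in hex. Below is a minimal sketch; the `$body1` value is a hypothetical stand-in that assumes the hidden character is a UTF-8 byte order mark (bytes EF BB BF), so substitute the real API response to see your actual bytes:

```php
<?php
// Hypothetical stand-in for the API response: a UTF-8 byte order mark
// (EF BB BF) followed by the JSON. Replace with the real response body.
$body1 = "\xEF\xBB\xBF" . '{"columns":"test"}';

// Every byte as two hex digits; invisible characters show up here.
echo bin2hex($body1), "\n";

// Only the bytes that come before the opening brace of the JSON.
$prefix = substr($body1, 0, strpos($body1, '{'));
echo bin2hex($prefix), "\n"; // efbbbf
```

Looking up the printed bytes in a Unicode table tells you which character you are dealing with before you decide how to remove it.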
Checking string length of the body strings.
echo strlen($body1) . "\n";
echo strlen($body2) . "\n";
echo strlen($body3) . "\n";
Result:
21
18
18
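The 3-byte difference (21 vs 18) matches the length of a UTF-8 byte order mark, so a BOM is a plausible suspect, although the question does not confirm it. Under that assumption, cutting the character out before decoding is a reasonable fix. A sketch with two hypothetical helpers: `strip_bom()` removes only a leading BOM, while `strip_format_chars()` removes every invisible Unicode format character (category Cf, which includes U+FEFF and the zero-width space U+200B) anywhere in the string:

```php
<?php
// Remove a UTF-8 byte order mark, but only at the very start of the string.
function strip_bom(string $s): string
{
    return strncmp($s, "\xEF\xBB\xBF", 3) === 0 ? substr($s, 3) : $s;
}

// Broader option: delete all invisible "format" characters (Unicode category Cf),
// e.g. U+FEFF or U+200B. The /u modifier requires the input to be valid UTF-8.
function strip_format_chars(string $s): string
{
    return preg_replace('/\p{Cf}/u', '', $s);
}

$body1 = "\xEF\xBB\xBF" . '{"columns":"test"}'; // hypothetical BOM-prefixed response

var_dump(json_decode($body1, true));            // NULL: the BOM breaks decoding
var_dump(json_decode(strip_bom($body1), true)); // array with "columns" => "test"
echo strlen(strip_format_chars($body1)), "\n";  // 18
```

Stripping only a leading BOM is the narrower choice; the regex version also removes such characters inside JSON string values, which may or may not be what you want.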

Related

(PHP/Wordpress) Remove characters easily, but not '$' when string is output of shortcode

NOTE: I've answered my own question below, but I'm posting this anyway so that others in my situation can learn from it...
This is driving me insane.
Here's the scenario. There are a million questions here about removing characters like '$' from simple strings (e.g. $string = '$10.00'), and I can do that just fine.
However... when the string originates from shortcode output (e.g. $string = do_shortcode($mycode)), I just can't do it. I can remove the 1's and 0's and .'s, but the $ won't budge.
Note: The shortcode I'm using is 'Wordpress Currency Switcher' (wpcs_price). My newbie mind tells me that the problem is my string isn't a string at all, but live code or some weird type of array I don't understand.
Example.
function justwork() {
$rawprice = 10.00;
$rawpriceSC = '[wpcs_price value=\''.$rawprice.'\']';
$rawpriceOUT = do_shortcode($rawpriceSC); /* This will output $10.00 */
$finalprice = str_replace('$','',$rawpriceOUT);
echo $finalprice;
}
add_shortcode('justwork', 'justwork');
This will result in:
'$10.00' ($ not removed)
Using the same code on a different character (e.g. the '.') works just fine.
E.g.:
$finalprice = str_replace('.','',$rawpriceOUT);
And the output will be:
'$1000'
The $ just won't budge. I've tried substr, trim, preg functions and lots of other stuff, and I soon realised all is not what it seems: it's carrying some baggage. So I tried capturing the output buffer, and the resulting string still behaved oddly.
Any help... oh gosh please, going insane on this one.
ANSWER
While reviewing all the methods I'd looked at, I came across one I hadn't tried:
echo htmlentities($finalprice);
And guess what this output?
<span class="wpcs_price" id="wpcs_58c296db36aa3" data-amount=10 ><span class="wpcs_price_symbol">&#36;</span>10.00</span>
OK. So either I need to search for &#36; instead of $, or I can operate on that whole mess to get my 10.00. Geez.
Confirmed that simply replacing '$' with '&#36;' as the search string in str_replace() works fine.
function justwork() {
$rawprice = 10.00;
$rawpriceSC = '[wpcs_price value=\''.$rawprice.'\']';
$rawpriceOUT = do_shortcode($rawpriceSC);
$price = str_replace('&#36;', '', $rawpriceOUT); // the symbol arrives as the HTML entity &#36;
echo $price;
}
add_shortcode('justwork', 'justwork');
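The finding above can be reproduced without WordPress. This sketch hard-codes a hypothetical stand-in for the shortcode output, assuming (as the answer found) that the dollar sign arrives as the entity &#36; inside span markup. It shows why str_replace('$', ...) matches nothing, and one way to get the bare 10.00 by decoding entities and stripping tags first:

```php
<?php
// Hypothetical shortcode output, modelled on the htmlentities() dump above;
// the dollar sign is assumed to be encoded as the entity &#36;.
$html = '<span class="wpcs_price" id="wpcs_58c296db36aa3" data-amount=10 >'
      . '<span class="wpcs_price_symbol">&#36;</span>10.00</span>';

// The raw string never contains a literal "$", so str_replace('$', ...) is a no-op.
var_dump(strpos($html, '$')); // bool(false)

// Decode entities, drop the markup, then remove the currency symbol.
function plain_price(string $html): string
{
    $text = strip_tags(html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    return trim(str_replace('$', '', $text));
}

echo plain_price($html), "\n"; // 10.00
```

Decoding before replacing means the code does not depend on exactly how the plugin encodes the symbol.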

Check if a string begins with a euro/pound symbol

I'm trying to check if a string starts with '€' or '£' in PHP.
Here is my code:
$text = "€123";
if($text[0] == "€"){
echo "true";
}
else{
echo "false";
}
//output false
If I only check a single character, it works fine:
$symbol = "€";
if($symbol == "€"){
echo "true";
}
else{
echo "false";
}
// output true
I have also tried printing the string in the browser.
$text = "€123";
echo $text; //display euro symbol correctly
echo $text[0]; // displays a question mark
I have tried to use substr(), but the same problem occurred.
Characters, such as '€' or '£' are multi-byte characters. There is an excellent article that you can read here. According to the PHP docs, PHP strings are byte arrays. As a result, accessing or modifying a string using array brackets is not multi-byte safe, and should only be done with strings that are in a single-byte encoding such as ISO-8859-1.
Also make sure your file is saved as UTF-8; a text editor such as Notepad++ can convert it.
If I reduce the PHP to this, it works, the key being to use mb_substr:
<?php
header ('Content-type: text/html; charset=utf-8');
$text = "€123";
echo mb_substr($text,0,1,'UTF-8');
?>
Finally, it would be a good idea to add the UTF-8 meta-tag in your head tag:
<meta charset="utf-8">
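To see the byte-level problem directly, here is a short sketch. It writes the euro sign as "\u{20AC}" (PHP 7+ Unicode escape) so the example does not depend on the source file's encoding, and it uses a /u regex to grab the first character as an alternative to mb_substr() that does not need the mbstring extension:

```php
<?php
// "\u{20AC}" is the euro sign, stored as three UTF-8 bytes.
$text = "\u{20AC}123";

echo strlen($text), "\n";                // 6: three bytes for the sign plus "123"
echo bin2hex($text[0]), "\n";            // e2: only the first byte of the sign
echo bin2hex(substr($text, 0, 3)), "\n"; // e282ac: the complete sign

// With the /u modifier, "." matches one whole UTF-8 character, not one byte.
preg_match('/^./su', $text, $m);
var_dump($m[0] === "\u{20AC}");          // bool(true)
```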
I suggest this as the easiest solution: convert the symbols to their HTML entities using htmlentities().
$text = htmlentities($text, ENT_QUOTES, "UTF-8");
This will give you either &pound;123 or &euro;123. That allows you to run a switch() {case:} statement to check (or use your if statements):
$symbols = explode(";", $text);
switch($symbols[0]) {
case "&pound":
echo "It's Pounds";
break;
case "&euro":
echo "It's Euros";
break;
}
This happens because you’re using a multi-byte character encoding (probably UTF-8) in which both € and £ are recorded using multiple bytes. That means that "€" is a string of three bytes, not just one.
When you use $text[0] you're getting only the first byte of the first character, and so it doesn't match the three bytes of "€". You need to get the first three bytes instead, to check whether one string starts with another.
Here’s the function I use to do that:
function string_starts_with($string, $prefix) {
return substr($string, 0, strlen($prefix)) === $prefix;
}
The question mark appears because the first byte of "€" isn’t enough to encode a whole character: the error is indicated by ‘�’ when available, otherwise ‘?’.
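Building on string_starts_with(), here is a sketch that tells the two currencies apart. `starts_with()` and `currency_of()` are illustrative names, not PHP built-ins; on PHP 8 and later, the built-in str_starts_with() could replace the helper:

```php
<?php
// Byte-wise prefix test, the same idea as string_starts_with() above.
function starts_with(string $string, string $prefix): bool
{
    return substr($string, 0, strlen($prefix)) === $prefix;
}

// Hypothetical helper: map a leading currency sign to a label.
function currency_of(string $text): ?string
{
    $symbols = ["\u{20AC}" => 'EUR', "\u{00A3}" => 'GBP']; // euro: 3 bytes, pound: 2 bytes
    foreach ($symbols as $symbol => $code) {
        if (starts_with($text, $symbol)) {
            return $code;
        }
    }
    return null;
}

var_dump(currency_of("\u{20AC}123")); // string(3) "EUR"
var_dump(currency_of("\u{00A3}45"));  // string(3) "GBP"
var_dump(currency_of("123"));         // NULL
```

Because the comparison is byte-wise, it works whether the symbol takes 2 bytes (£) or 3 bytes (€); only the string and the prefix need to use the same encoding.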

PHP won't recognise double line feed

I am running an RST-to-PHP conversion and am using preg_match().
This is the RST I am trying to identify:
An example of the **Horizon Mapping** dialog box is shown below. A
summary of the main features is given below.

.. figure:: horizon_mapping_dialog_horizons_tab.png

   **Horizon Mapping** dialog box, *Horizons* tab

Some of the input values to the **Horizon Mapping** job can be changed
during a Workflow using the internal programming language, IPL. For
details, refer to the *IPL User Guide*.
and I am using this regex:
$match = preg_match("/.. figure:: (.*?)(\n{2}[ ]{3}.*\n)/s", $text, &$result);
However, it returns false.
Here is a link to the expression working on regex101:
http://regex101.com/r/oB3fW7
Are you sure that the line break is \n? If in doubt, use \R:
$match = preg_match("/.. figure:: (.*?)(\R{2}[ ]{3}.*\R)/s", $text, $result);
\R matches any newline sequence: \n, \r or \r\n (as well as other Unicode newline characters).
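To see the difference concretely, this sketch runs the question's \n{2} pattern and the \R{2} variant (both with the dots escaped) against the same hypothetical figure block, once with Unix (LF) and once with Windows (CRLF) line endings. It assumes the RST file might have been saved with CRLF endings, which is one possible reason a pattern that works on regex101 fails in PHP:

```php
<?php
// Hypothetical sample: the figure block with Unix line endings, then with Windows ones.
$unix = ".. figure:: horizon_mapping_dialog_horizons_tab.png\n\n"
      . "   **Horizon Mapping** dialog box, *Horizons* tab\n";
$windows = str_replace("\n", "\r\n", $unix);

// Single-quoted, so PCRE (not PHP) interprets \n and \R.
$lf  = '/\.\. figure:: (.*?)(\n{2}[ ]{3}.*\n)/s';
$any = '/\.\. figure:: (.*?)(\R{2}[ ]{3}.*\R)/s';

foreach (['LF' => $unix, 'CRLF' => $windows] as $name => $text) {
    echo $name, ': \n pattern = ', preg_match($lf, $text),
         ', \R pattern = ', preg_match($any, $text), "\n";
}
// LF: \n pattern = 1, \R pattern = 1
// CRLF: \n pattern = 0, \R pattern = 1
```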
My instinct would be to do some troubleshooting around the s flag as well as the $result variable passed by reference (call-time pass-by-reference such as &$result was removed in PHP 5.4 and is a fatal error from then on). To rule out any interference from the dots and the return variable, can you please try this regex:
..[ ]figure::[ ]([^\r\n]*)(?:\n|\r\n){2}[ ]{3}[^\r\n]*\R
In code, please try exactly like this:
$regex = "~..[ ]figure::[ ]([^\r\n]*)(?:\n|\r\n){2}[ ]{3}[^\r\n]*\R~";
if (preg_match($regex, $text, $m)) echo "Success! <br />";
Finally:
If this still does not work, you might have an unusual Unicode line break that PHP is not catching. To debug, iterate through the string and print each piece with its numeric value. Note that _uniord() is a user-defined helper, not a PHP built-in; since str_split() returns single bytes, ord() is enough to see them:
foreach (str_split($text) as $c) {
    // A multi-byte line break (e.g. U+2028) shows up here as several bytes.
    echo $c . " value = " . ord($c) . "<br />";
}
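A more compact check than printing every byte is to count each kind of line break. The helper below is hypothetical (not from either answer); it distinguishes CRLF, a lone LF, a lone CR, and the Unicode line separator U+2028:

```php
<?php
// Hypothetical helper: count each kind of line break in a string.
function line_break_stats(string $text): array
{
    return [
        'CRLF'   => substr_count($text, "\r\n"),
        'LF'     => preg_match_all('/(?<!\r)\n/', $text), // \n not preceded by \r
        'CR'     => preg_match_all('/\r(?!\n)/', $text),  // \r not followed by \n
        'U+2028' => substr_count($text, "\u{2028}"),
    ];
}

print_r(line_break_stats("a\r\nb\nc\rd"));
```

Calling it on the RST text before matching shows at a glance which line endings the pattern has to handle.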

preg_match returns empty array

I always use preg_match and it always works fine,
but today I was trying to get the content between two HTML tags, <code: 1>DATA</code>.
And I have a problem, which my code explains:
function findThis($data){
preg_match_all("/\<code: (.*?)\>(.*?)\<\/code\>/i",$data,$conditions);
return $conditions;
}
// plain text
// working fine
$data1='Some text...Some.. Te<code: 1>This is a php code</code>';
//A text with a new lines
// Not working..
$data2='some text..
some.. te
<code: 1>
This is a php code
..
</code>
';
print_r(findThis($data1));
// OUTPUT
// [0][0] => <code: 1>This is a php code</code>
// [1][0] => 1
// [2][0] => This is a php code
print_r(findThis($data2));
//Outputs nothing!
This is because, in PCRE patterns, the . character matches anything except a newline, so input that contains newlines won't match. What you want is to add the "s" modifier to the end of your pattern, which makes . match absolutely everything (including newlines):
/\<code: (.*?)\>(.*?)\<\/code\>/is
See here: http://www.php.net/manual/en/regexp.reference.internal-options.php
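A runnable version of the question's function with the s modifier added, using the multi-line $data2 from the question (written with \n escapes here). It also shows that without s, the same input produces no matches:

```php
<?php
// The question's function with the "s" modifier added, so "." also matches newlines.
function findThis(string $data): array
{
    preg_match_all('/\<code: (.*?)\>(.*?)\<\/code\>/is', $data, $conditions);
    return $conditions;
}

$data2 = "some text..\nsome.. te\n<code: 1>\nThis is a php code\n..\n</code>\n";

// Without "s", the lazy (.*?) cannot cross the newline after <code: 1>.
var_dump(preg_match_all('/\<code: (.*?)\>(.*?)\<\/code\>/i', $data2)); // int(0)

$result = findThis($data2);
echo $result[1][0], "\n";       // 1
echo trim($result[2][0]), "\n"; // the multi-line body, outer newlines trimmed
```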

php preg_match_all html dates with slashes error

I've been trying to use preg_match_all() to match a date with slashes in it, sitting between two HTML tags; however, it's returning null.
here is the html:
<td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>
Here is my preg_match_all() code
preg_match_all('/<td width=\'40%\' align=\'right\' class=\'SmallDimmedText\'>Last([a-zA-Z0-9\s\.\-\',]*)<\/td>/', $h, $table_content, PREG_PATTERN_ORDER);
where $h is the html above.
What am I doing wrong?
Thanks in advance.
It (from a quick glance) is because you are trying to match:
Last Login: 11/14/2009
With this regex:
Last([a-zA-Z0-9\s\.\-\',]*)
The regex doesn't contain the required characters of : and / which are included in the text string. Changing the required part of the regex to:
Last([a-zA-Z0-9\s\.\-\',:/]*)
Gives a match
Would it be better to simply use a DOM parser, and then perform the regex on the result of the DOM lookup? It makes for nicer regex...
EDIT
The other issue is that your HTML is:
...40%' align='right'class='SmallDimmedText'>...
Where there is no space between align='right' and class='SmallDimmedText'
However your regex for that section is:
...40%\' align=\'right\' class=\'SmallDimmedText\'>...
Where it is indicated there is a space.
Use a DOM parser. It will save you more headaches caused by subtle bugs than you can count.
Just to give you an idea of how simple it is to parse using Simple HTML DOM:
$html = str_get_html(...);
$elems = $html->find('.SmallDimmedText'); // find() returns an array of matching elements
if (count($elems) != 1) {
    throw new Exception('Too many/few elements found');
}
$text = $elems[0]->plaintext;
//parsing here is only an example, but you have removed all
//the html so that any regex used is really simple.
$date = substr($text, strlen('Last Login: '));
$unixTime = strtotime($date);
I see at least two problems:
in your HTML string, there is no space between 'right' and class=, but there is one in your regex
you must add at least these 3 characters to the list of matched characters, between the []:
':' (there is one between "Login" and the date),
' ' (there are spaces between "Last" and "Login", and between ":" and the date),
and '/' (between the date parts)
With this code, it seems to work better:
$h = "<td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>";
if (preg_match_all("#<td width='40%' align='right'class='SmallDimmedText'>Last([a-zA-Z0-9\s\.\-',: /]*)<\/td>#",
$h, $table_content, PREG_PATTERN_ORDER)) {
var_dump($table_content);
}
I get this output:
array
  0 =>
    array
      0 => string '<td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>' (length=80)
  1 =>
    array
      0 => string ' Login: 11/14/2009' (length=18)
Note that I have also used:
# as a regex delimiter, to avoid having to escape slashes
" as a string delimiter, to avoid having to escape single quotes
My first suggestion would be to minimize the amount of text in the preg_match_all() pattern: why not just match between a ">" and a "<"? Second, I'd end up writing the regex like this (with # as the delimiter, so the slashes in the date don't terminate the pattern); not sure if it helps:
#>.*[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}<#
That will look for the end of one tag, then any character, then a date, then the beginning of another tag.
I agree with Yacoby.
At the very least, remove all references to the HTML specifics and simply make the regex:
preg_match_all('#Last Login: ([\d/]+)#', ...
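Following that advice, here is a sketch that anchors only on the visible "Last Login:" label and then turns the captured text into a date. It assumes the date is always month/day/year, as in the sample HTML, and uses # as the delimiter so the slashes need no escaping:

```php
<?php
$h = "<td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>";

// Capture an m/d/Y-style date that follows the label; the HTML around it is ignored.
if (preg_match_all('#Last Login: (\d{1,2}/\d{1,2}/\d{4})#', $h, $matches)) {
    foreach ($matches[1] as $raw) {
        // Assumes US month/day/year order; "!" resets the time fields to zero.
        $date = DateTime::createFromFormat('!m/d/Y', $raw);
        echo $raw, ' -> ', $date->format('Y-m-d'), "\n"; // 11/14/2009 -> 2009-11-14
    }
}
```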
