No semicolon in encoding - php

I'm trying to decode text which I believe is encoded in WINDOWS-1251.
The string looks like this:
&#1040&#1075&#1077&#1085&#1090
This should represent "Agent" in Russian. Here is the problem:
I'm not able to convert this string unless I add a semicolon after each number.
I can't do it manually, because I have about 10,000 lines of text to convert.
So the question is: what is this encoding (without semicolons), and how can I add the semicolons automatically to each line (with a regex, maybe?) without breaking the code?
So far, I've been trying to do this with the following code:
App Logic
public function parseSentence(array $sentences, $sentence, $i) {
    if (strstr($sentence, '-')) {
        $sentences[$i] = $this->explodeAndSplit('-', $sentence);
    } else if (strstr($sentence, "'")) {
        $sentences[$i] = $this->explodeAndSplit("'", $sentence);
    } else if (strstr($sentence, "(")) {
        $sentences[$i] = $this->explodeAndSplit("(", $sentence);
    } else if (strstr($sentence, ")")) {
        $sentences[$i] = $this->explodeAndSplit(")", $sentence);
    } else {
        if (strstr($sentence, '#')) {
            // Insert ';' after every 6 characters (assumes "&#" plus 4 digits)
            $sentences[$i] = chunk_split($sentence, 6, ';');
        }
    }
    return $sentences;
}
/**
* Explode and Split
* @param string $explodeBy
* @param string $string
*
* @return string
*/
private function explodeAndSplit($explodeBy, $string) {
$exp = explode($explodeBy, $string);
for ($j = 0; $j < count($exp); $j++) {
$exp[$j] = chunk_split($exp[$j], 6, ';');
}
return implode($explodeBy, $exp);
}
But obviously this approach is incorrect, because I'm not taking into account the many other 'special' characters. How can it be fixed?
Update:
I'm using Lumen for the backend and AngularJS for the frontend. All the data (database, text files, etc.) is parsed in Lumen, which provides API routes for AngularJS to retrieve it. This semicolon-less encoding works fine in any browser when accessed directly, but fails to display in Angular because of the missing semicolons.

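These are HTML numeric character references for Russian (Cyrillic) letters. Each number is a decimal Unicode code point, so this is not WINDOWS-1251. To ensure they are displayed properly, you'll need an appropriate content type: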
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
To add the semicolons, preg_split() your string of HTML codes on the &# markers:
array_filter(preg_split("/[&#]+/", $str));
The array_filter() simply removes any empty values. You could ultimately use explode() to do the same thing.
This returns an array of the numbers you have. From there, an implode() with the required &# prepended and ; appended does the rest:
echo '&#' .implode( ";&#", array_filter(preg_split("/[&#]+/", $str) )) . ';';
Which returns:
&#1040;&#1075;&#1077;&#1085;&#1090;
When rendered as HTML, this displays the following Russian text:
Агент
which means "Agent" in Russian.
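The split-and-join approach assumes the string contains nothing but character references. For lines that mix references with ordinary text (hyphens, quotes, parentheses), a regular expression that appends the semicolon only where it is missing may be more robust. This is a minimal sketch, assuming decimal references only; the function name add_missing_semicolons is made up for illustration:

```php
<?php
// Hypothetical helper: appends ';' to decimal character references
// (such as &#1040) that lack one. Terminated references are left alone.
function add_missing_semicolons(string $text): string
{
    // \d++ is possessive, so the digit run cannot be shortened by
    // backtracking; (?!;) skips references that already end in ';'.
    return preg_replace('/&#(\d++)(?!;)/', '&#$1;', $text);
}

$raw   = '&#1040&#1075&#1077&#1085&#1090 - (&#1040;)';
$fixed = add_missing_semicolons($raw);

echo $fixed, "\n";
// Decode to real UTF-8 characters, e.g. before returning JSON from an API.
echo html_entity_decode($fixed, ENT_QUOTES | ENT_HTML5, 'UTF-8'), "\n";
```

Decoding on the server with html_entity_decode() is one option when the consumer (AngularJS here) treats the response as plain text rather than HTML.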

Related

very large php string magically turns into array

I am getting an "Array to string conversion" error in PHP.
I am using the variable (which should be a string) as the third parameter to str_replace. In summary (a very simplified version of what's going on):
$str = "very long string";
str_replace("tag", $some_other_array, $str);
$str is throwing the error, and I have been trying to fix it all day. The things I have tried are:
if(is_array($str)) die("its somehow an array");
serialize($str); //inserted this before str_replace call.
I have spent all day on it, and no, it's not something stupid like variables the wrong way around; it is something bizarre. I have even dumped it to a file, and it's a string.
My hypothesis:
The string is too long for PHP to deal with, so it turns into an array.
The $str value in this case is nested and called recursively. The general flow could be explained like this:
--code
//pass by reference
function the_function ($something, &$OFFENDING_VAR, $something_else) {
while(preg_match($something, $OFFENDING_VAR)) {
$OFFENDING_VAR = str_replace($x, $y, $OFFENDING_VAR); // this is the error
}
}
So it may be something strange in str_replace, but that would mean that at some point str_replace would have to return an array.
Please help me work this out; it's very confusing and I have wasted a day on it.
---- ORIGINAL FUNCTION CODE -----
//This function gets called with multiple different "Target Variables" Target is the subject
//line, from and body of the email filled with << tags >> so the str_replace function knows
//where to replace them
function perform_replacements($replacements, &$target, $clean = TRUE,
$start_tag = '<<', $end_tag = '>>', $max_substitutions = 5) {
# Construct separate tag and replacement value arrays for use in the substitution loop.
$tags = array();
$replacement_values = array();
foreach ($replacements as $tag_text => $replacement_value) {
$tags[] = $start_tag . $tag_text . $end_tag;
$replacement_values[] = $replacement_value;
}
# TODO: this badly needs refactoring
# TODO: auto upgrade <<foo>> to <<foo_html>> if foo_html exists and acting on html template
# Construct a regular expression for use in scanning for tags.
$tag_match = '/' . preg_quote($start_tag) . '\w+' . preg_quote($end_tag) . '/';
# Perform the substitution until all valid tags are replaced, or the maximum substitutions
# limit is reached.
$substitution_count = 0;
while (preg_match ($tag_match, $target) && ($substitution_count++ < $max_substitutions)) {
$target = serialize($target);
$temp = str_replace($tags,
$replacement_values,
$target); //This is the line that is failing.
unset($target);
$target = $temp;
}
if ($clean) {
# Clean up any unused search values.
$target = preg_replace($tag_match, '', $target);
}
}
How do you know $str is the problem and not $some_other_array?
From the manual:
If search and replace are arrays, then str_replace() takes a value
from each array and uses them to search and replace on subject. If
replace has fewer values than search, then an empty string is used for
the rest of replacement values. If search is an array and replace is a
string, then this replacement string is used for every value of
search. The converse would not make sense, though.
The second parameter can only be an array if the first one is as well.
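To make the point concrete, here is a small sketch that checks the replacement side before calling str_replace(). The helper name replace_tags and the sample tags are assumptions for illustration, not part of the original code. Exactly what PHP reports when an array ends up where a string is expected depends on the version, which is why the sketch rejects it up front:

```php
<?php
// Hypothetical helper: validates replacement values, then substitutes.
function replace_tags(array $replacements, string $target): string
{
    foreach ($replacements as $tag => $value) {
        if (is_array($value)) {
            // An array here is what forces an array-to-string
            // conversion on the replacement side of str_replace().
            throw new InvalidArgumentException("Replacement for '$tag' is an array.");
        }
    }
    // Search and replace are both arrays, paired by position.
    return str_replace(array_keys($replacements), array_values($replacements), $target);
}

echo replace_tags(
    ['<<name>>' => 'Ann', '<<city>>' => 'Oslo'],
    'Hello <<name>>, welcome to <<city>>.'
), "\n";
```

If the exception fires, the problem is in the data passed as replacements, not in the subject string.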

Character replacement encoding php

I have a string in which I want to replace every 'a' character with the Greek 'α' character. I don't want to convert the HTML elements inside the string, ie text.
The function:
function grstrletter($string){
$skip = false;
$str_length = strlen($string);
for ($i=0; $i < $str_length; $i++){
if($string[$i] == '<'){
$skip = true;
}
if($string[$i] == '>'){
$skip = false;
}
if ($string[$i]=='a' && !$skip){
$string[$i] = 'α';
}
}
return $string;
}
Another function I have made works perfectly, but it doesn't take the HTML elements into account.
function grstrletter_no_html($string){
return strtr($string, array('a' => 'α'));
}
I also tried a lot of the encoding functions that PHP offers, with no luck.
When I echo the Greek letter, the browser outputs it without a problem. When I return the string, the browser outputs the classic strange question mark inside a triangle wherever a replacement occurred.
My header has <meta http-equiv="content-type" content="text/html; charset=UTF-8"> and I also tried PHP's header('Content-Type: text/html; charset=utf-8'), but again with no luck.
The string comes from a database in UTF-8 and the site runs on WordPress, so I just use the WordPress functions to get the content I want. I don't think it is a database problem, because when I use my function grstrletter_no_html() everything works fine.
The problem seems to happen when I iterate over the string character by character.
The file is saved as UTF-8 without BOM (Notepad++). I also tried changing the encoding of the file, again with no luck.
I also tried to replace the Greek letter with the corresponding HTML entities &alpha; and &#945;, but got the same results.
I haven't tried any regex yet.
I would appreciate any help. Thanks in advance.
Tried: Greek characters encoding works in HTML but not in PHP
EDIT
The solution, based on deceze's brilliant answer:
function grstrletter($string){
    $skip = false;
    // Re-read the length on every pass: each replacement grows the string by one byte.
    for ($i = 0; $i < strlen($string); $i++){
        if ($string[$i] == '<'){
            $skip = true;
        }
        if ($string[$i] == '>'){
            $skip = false;
        }
        if ($string[$i] == 'a' && !$skip){
            // Splice the two-byte 'α' in place of the one-byte 'a'.
            $string = substr($string, 0, $i) . 'α' . substr($string, $i + 1);
            $i++; // step over the second byte of 'α'
        }
    }
    return $string;
}
The problem is that you're setting only a single byte of your string. Example:
$str = "\x00\x00\x00";
var_dump(bin2hex($str));
$str[1] = "\xff\xff";
var_dump(bin2hex($str));
Output:
string(6) "000000"
string(6) "00ff00"
You're setting a two-byte character, but only one byte of it is actually pushed into the string. The second result here would have to be 00ffff for your code to work.
To insert a multibyte character, cut the string from 0 to $i - 1, concatenate the 'α' onto it, then concatenate the rest of the string from $i + 1 to the end. That, or work with characters instead of bytes using the mbstring functions.
For more background information, see What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text.
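An alternative to splicing in place is to build a new output string one byte at a time. The sketch below assumes UTF-8 input and that the characters being tested ('<', '>' and the character to replace) are single-byte ASCII, which never collide with the bytes of a multibyte UTF-8 sequence. The function name replace_outside_tags is made up for illustration:

```php
<?php
// Hypothetical helper: replaces a single-byte character with an arbitrary
// (possibly multibyte) string, but only outside of <...> tags.
function replace_outside_tags(string $html, string $from, string $to): string
{
    $out = '';
    $inTag = false;
    $length = strlen($html);
    for ($i = 0; $i < $length; $i++) {
        $byte = $html[$i];
        if ($byte === '<') {
            $inTag = true;
        } elseif ($byte === '>') {
            $inTag = false;
        }
        // Appending never overwrites a single byte, so $to may be any length.
        $out .= (!$inTag && $byte === $from) ? $to : $byte;
    }
    return $out;
}

// "\u{03B1}" is the Greek small letter alpha.
echo replace_outside_tags('<a href="a">banana</a>', 'a', "\u{03B1}"), "\n";
```

Because the input is only read and never modified, the loop bound stays valid for the whole pass.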

Random String Generator (PHP)

I am trying to write a PHP function that returns a random string of a given length. I wrote this:
<?php
function generate_string($length) {
    $ret = "";
    for ($i = 0; $i < $length; $i++) {
        // Any printable ASCII character, from space (32) to tilde (126)
        $ret .= chr(mt_rand(32, 126));
    }
    return $ret;
}
echo generate_string(150);
?>
The above function generates a random string, but the length of the string is not constant, i.e. one time it is 30 characters, another time 60 (I call it with the same length every time). I've searched for other examples of random string generators, but they all use a base string to pick letters from. I am wondering why this method is not working properly.
Thanks!
Educated guess: you attempt to display your plain text string as HTML. The browser, after being told it's HTML, handles it as such. As soon as a < character is generated, the following characters are rendered as an (unknown) HTML tag and are not displayed as HTML standards mandate.
Fix:
echo htmlspecialchars(generate_string(150));
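One way to confirm this diagnosis is to compare the byte length of the raw string with what the browser shows. A minimal sketch, reusing the question's function with type declarations added:

```php
<?php
function generate_string(int $length): string
{
    $ret = '';
    for ($i = 0; $i < $length; $i++) {
        // Printable ASCII; this range includes '<', '>' and '&'.
        $ret .= chr(mt_rand(32, 126));
    }
    return $ret;
}

$raw = generate_string(150);
echo strlen($raw), "\n";            // length of the raw string in bytes
echo htmlspecialchars($raw), "\n";  // escaped form, safe to embed in HTML
```

The escaped string is usually longer than the raw one, since characters such as '<' become entities like &lt;, but the browser then renders every generated character.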
This is the conclusion I reached after testing it for a while: your function works correctly. It depends on what you do with the randomly generated string. If you are simply echoing it, then it might generate something like <ck1ask, which will be treated as a tag. Try excluding certain characters from the string.
This function generates a random string in PHP:
function getRandomString($maxlength = 12, $isSpecialChar = false)
{
    $randomString = '';
    // Initialise the character set with lower case, upper case and digits.
    $charSet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    // To include special characters, set $isSpecialChar to 1 or true.
    // Only single-byte characters are used, because the set is indexed by byte.
    if ($isSpecialChar) $charSet .= '~@#$%^*()_+={}|][';
    // Append one randomly chosen character per iteration.
    for ($i = 0; $i < $maxlength; $i++) $randomString .= $charSet[mt_rand(0, strlen($charSet) - 1)];
    return $randomString;
}
// Call the function with the required string length (default: 12).
$random8char = getRandomString(8);
echo $random8char;
Source: Generate random string in php
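If the string is used for anything security-related (tokens, passwords), mt_rand() is not a good fit, since it is not a cryptographically secure generator. A hedged variant using random_int(), available since PHP 7, might look like this; the function name random_string and its default alphabet are assumptions for illustration:

```php
<?php
// Hypothetical variant: draws each character with random_int(),
// which uses the operating system's secure random source.
function random_string(
    int $length,
    string $alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
): string {
    $max = strlen($alphabet) - 1; // assumes a non-empty, single-byte alphabet
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        $out .= $alphabet[random_int(0, $max)];
    }
    return $out;
}

echo random_string(12), "\n";
```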

Xor encryption in PHP

I'm new to Xor encryption, and I'm having some trouble with the following code:
function xor_this($string) {
    // Let's define our key here
    $key = ('magic_key');
    // Our plaintext/ciphertext
    $text = $string;
    // Our output text
    $outText = '';
    // Iterate through each character
    for ($i = 0; $i < strlen($text);)
    {
        for ($j = 0; $j < strlen($key); $j++, $i++)
        {
            $outText .= $text[$i] ^ $key[$j];
            //echo 'i='.$i.', '.'j='.$j.', '.$outText[$i].'<br />'; //for debugging
        }
    }
    return $outText;
}
When I run this it works for normal strings, like 'dog' but it only partially works for strings containing numbers, like '12345'.
To demonstrate...
xor_this('dog') = 'UYV'
xor_this('123') = ''
It's also interesting to note that xor_this( xor_this('123') ) = '123', as I expect it to. I'm pretty sure the problem resides somewhere in my shaky understanding of bitwise operators, OR possibly the way PHP handles strings that contain numbers. I'm betting there's someone clever out there that knows exactly what's wrong here. Thanks.
EDIT #1: It's not truly 'encryption'. I guess obfuscation is the correct term, which is what I'm doing. I need to pass a code containing unimportant data from a user without them being able to easily tamper with it. They're completing a timed activity off-line and submitting their time to an online scoreboard via this code. The off-line activity will obfuscate their time (in milliseconds). I need to write a script to receive this code and turn it back into the string containing their time.
How I did it; it might help someone:
$msg = 'say hi!';
$key = 'whatever_123';
// print, and make unprintable chars available for a link or alike.
// using $_GET, php will urldecode it, if it was passed urlencoded
print "obfuscated, ready for url: " . urlencode(obfuscate($msg, $key)) . "\n";
print "deObfuscated: " . obfuscate(obfuscate($msg, $key), $key);
function obfuscate($msg, $key) {
if (empty($key)) return $msg;
return $msg ^ str_pad('', strlen($msg), $key);
}
I think you might have a few problems here, I've tried to outline how I think you can fix it:
You need to use ord(..) to get the ASCII value of a character so that you can represent it in binary. For example, try the following:
printf("%08b ", ord('A')); // outputs "01000001"
I'm not sure how you do an XOR cipher with a multi-byte key, as the wikipedia page on XOR cipher doesn't specify. But I assume for a given key like "123", your key starts "left-aligned" and extends to the length of the text, like this:
function xor_this($text) {
$key = '123';
$i = 0;
$encrypted = '';
foreach (str_split($text) as $char) {
$encrypted .= chr(ord($char) ^ ord($key[$i++ % strlen($key)]));
}
return $encrypted;
}
print xor_this('hello'); // outputs "YW_]]"
Which encrypts 'hello' with the key '12312'.
There's no guarantee that the result of the XOR operation will produce a printable character. If you give us a better idea of the reason you're doing this, we can probably point you to something sensible to do instead.
I believe you are facing a console output and encoding problem rather than an XOR-related one.
Try writing the results of the xor function to a text file and look at the set of generated characters. A hex editor would be the best choice for observing and comparing the generated characters.
To restore the text (even when it contains numbers) you can use the same function:
$textToObfuscate = "Some Text 12345";
$obfuscatedText = xor_this($textToObfuscate);
$restoredText = xor_this($obfuscatedText);
Based on the fact that you're getting xor_this( xor_this('123') ) = '123', I am willing to guess that this is merely an output issue. You're sending data to the browser, the browser is recognizing it as something which should be rendered in HTML (say, the first half dozen ASCII characters). Try looking at the page source to see what is really there. Better yet, iterate through the output and echo the ord of the value at each position.
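A quick way to look at what is really in the output, independent of how a browser or console renders it, is to print the bytes in hex. This is a minimal sketch; xor_with_key is a hypothetical helper that repeats a non-empty key across the text, as several answers here do:

```php
<?php
// Hypothetical helper: XORs $text with $key repeated to the text's length.
// Assumes $key is not empty.
function xor_with_key(string $text, string $key): string
{
    $out = '';
    $keyLength = strlen($key);
    for ($i = 0, $n = strlen($text); $i < $n; $i++) {
        $out .= chr(ord($text[$i]) ^ ord($key[$i % $keyLength]));
    }
    return $out;
}

$cipher = xor_with_key('12345', 'magic_key');
echo bin2hex($cipher), "\n";                    // every byte, visible or not
echo base64_encode($cipher), "\n";              // printable form for transport
echo xor_with_key($cipher, 'magic_key'), "\n";  // round trip restores the input
```

Encoding the result with base64_encode() or bin2hex() before sending it anywhere avoids depending on the XOR output happening to be printable.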
Use this code, it works perfectly:
function scramble($inv) {
    $key = 342244; // scramble key
    $res = '';
    $invarr = str_split($inv);
    for ($index = 0; $index <= strlen($inv) - 1; $index++) {
        srand($key);          // reseed so the byte stream is reproducible
        $var = rand(0, 255);  // pseudo-random key byte for this position
        $res .= chr($var ^ ord($invarr[$index]));
        $key++;
    }
    return $res;
}
Try this:
$outText .= (string)$text[$i] ^ (string)$key[$j];
If one of the two operands is an integer, PHP casts the other to an integer and XORs them for a numeric result.
Alternatively, you could use this:
$outText .= chr(ord($text[$i]) ^ ord($key[$j]));
// Iterate through each character
for ($i = 0; $i < strlen($text); $i++)
{
    $outText .= chr(ord($text[$i]) ^ ord($key[$i % strlen($key)]));
}
Note: it will probably create some weird characters.
Despite all the wise suggestions, I solved this problem in a much simpler way:
I changed the key! It turns out that by changing the key to something more like this:
$key = 'ISINUS0478331006';
...it will generate an obfuscated output of printable characters.

PHP: comparing URIs which differ in percent-encoding

In PHP, I want to compare two relative URLs for equality. The catch: URLs may differ in percent-encoding, e.g.
/dir/file+file vs. /dir/file%20file
/dir/file(file) vs. /dir/file%28file%29
/dir/file%5bfile vs. /dir/file%5Bfile
According to RFC 3986, servers should treat these URIs identically. But if I use == to compare, I'll end up with a mismatch.
So I'm looking for a PHP function which accepts two strings and returns TRUE if they represent the same URI (discounting encoded/decoded variants of the same char, upper-case/lower-case hex digits in encoded chars, and + vs. %20 for spaces), and FALSE if they're different.
I know in advance that only ASCII chars are in these strings; no Unicode.
function uriMatches($uri1, $uri2)
{
return urldecode($uri1) == urldecode($uri2);
}
echo uriMatches('/dir/file+file', '/dir/file%20file'); // TRUE
echo uriMatches('/dir/file(file)', '/dir/file%28file%29'); // TRUE
echo uriMatches('/dir/file%5bfile', '/dir/file%5Bfile'); // TRUE
urldecode
EDIT: Please look at @webbiedave's response. His is much better (I wasn't even aware that there was a function in PHP to do that; you learn something new every day).
You will have to parse the strings to look for something matching %## to find the occurrences of percent-encoding. Then, converting the hex number from each to an integer, you should be able to pass it to the chr() function to get the character. Rebuild the strings and then you should be able to match them.
Not sure that's the most efficient method, but considering URLs are not usually that long, it shouldn't be too much of a performance hit.
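The manual approach described above could be sketched as follows. The function name decode_percent is made up for illustration, and unlike urldecode() it does not turn + into a space:

```php
<?php
// Hypothetical helper: decodes every %XX sequence by hand.
function decode_percent(string $uri): string
{
    return preg_replace_callback(
        '/%([0-9A-Fa-f]{2})/',           // '%' followed by two hex digits
        function (array $m): string {
            return chr(hexdec($m[1]));   // hex digits -> integer -> byte
        },
        $uri
    );
}

var_dump(decode_percent('/dir/file%5bfile') === decode_percent('/dir/file%5Bfile'));
var_dump(decode_percent('/dir/file%28file%29') === '/dir/file(file)');
```

For these inputs it behaves like PHP's built-in rawurldecode(), which is the simpler choice in practice.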
I know this problem seems to be solved by webbiedave, but I had my own problems with it.
First problem: the hex digits of encoded characters are case-insensitive. So %C3 and %c3 are exactly the same character, although the URIs differ as strings. Both URIs point to the same location.
Second problem: folder%20(2) and folder%20%282%29 are both validly urlencoded URIs which point to the same location, although they are different strings.
Third problem: if I simply decode the url-encoded characters, two different locations such as bla%2Fblubb and bla/blubb end up with the same URI.
So what to do? To compare two URIs, I need to normalize both of them: split them into their components, urldecode all path and query parts once, rawurlencode them, and glue them back together. Then I can compare them.
This could be the function to normalize them:
function normalizeURI($uri) {
    $components = parse_url($uri);
    $normalized = "";
    // isset() avoids undefined-index notices for absent components.
    if (isset($components['scheme'])) {
        $normalized .= $components['scheme'] . ":";
    }
    if (isset($components['host'])) {
        $normalized .= "//";
        if (isset($components['user'])) { // this should never happen in URIs, but it's probably anything-can-happen Thursday
            $normalized .= rawurlencode(urldecode($components['user']));
            if (isset($components['pass'])) {
                $normalized .= ":" . rawurlencode(urldecode($components['pass']));
            }
            $normalized .= "@";
        }
        $normalized .= $components['host'];
        if (isset($components['port'])) {
            $normalized .= ":" . $components['port'];
        }
    }
    if (isset($components['path'])) {
        // The path keeps its own leading slash, so no extra separator is added.
        // Decode and re-encode each segment separately so that an encoded
        // slash (%2F) inside a segment stays distinct from a real "/".
        $path = explode("/", $components['path']);
        $path = array_map("urldecode", $path);
        $path = array_map("rawurlencode", $path);
        $normalized .= implode("/", $path);
    }
    if (isset($components['query'])) {
        $query = explode("&", $components['query']);
        foreach ($query as $i => $c) {
            // Normalize the name and the value of each pair separately.
            $c = explode("=", $c);
            $c = array_map("urldecode", $c);
            $c = array_map("rawurlencode", $c);
            $c = implode("=", $c);
            $query[$i] = $c;
        }
        $normalized .= "?" . implode("&", $query);
    }
    return $normalized;
}
Now you can alter webbiedave's function to this:
function uriMatches($uri1, $uri2) {
return normalizeURI($uri1) === normalizeURI($uri2);
}
That should do it. And yes, it is quite a bit more complicated than even I wanted it to be.
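The difference between plain urldecode() comparison and per-segment normalization shows up in the third problem. Here is a reduced sketch that handles only the path; normalize_path is a hypothetical cut-down version of the idea, not a replacement for the full function:

```php
<?php
// Hypothetical reduced version: normalizes only a relative path.
function normalize_path(string $path): string
{
    $segments = explode('/', $path);
    $segments = array_map('urldecode', $segments);     // undo any encoding
    $segments = array_map('rawurlencode', $segments);  // re-encode uniformly
    return implode('/', $segments);
}

// The question's examples all normalize to the same string.
var_dump(normalize_path('/dir/file+file') === normalize_path('/dir/file%20file'));
var_dump(normalize_path('/dir/file(file)') === normalize_path('/dir/file%28file%29'));
var_dump(normalize_path('/dir/file%5bfile') === normalize_path('/dir/file%5Bfile'));

// An encoded slash stays distinct from a path separator...
var_dump(normalize_path('bla%2Fblubb') === normalize_path('bla/blubb'));
// ...whereas decoding the whole string at once merges the two.
var_dump(urldecode('bla%2Fblubb') === urldecode('bla/blubb'));
```

Splitting on "/" before decoding is what keeps bla%2Fblubb (one segment) apart from bla/blubb (two segments).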
