How to safely encode PHP string into alphanumeric only? - php

How to safely encode PHP string into alphanumeric only string?
E.g. "Hey123 & 5" could become "ed9e0333", or maybe something better looking.
It's not about stripping characters, it's about encoding.
The goal is to make any string, after this encoding, suitable for use as a CSS id string (alnum),
but later I will need to decode it and get the original string back.

bin2hex seems to fit the bill (although it is not as compact as some other encodings). Also note that CSS ids cannot start with a digit, so to be safe you'll need to prefix the bin2hex result with a letter before you have your final ID.
For the reverse (decoding), hex2bin() is available as of PHP 5.4; on older versions, someone on the PHP documentation site suggested this (untested):
$bin_str = pack("H*" , $hex_str);
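A minimal sketch of the round trip described above, assuming a fixed letter prefix of "id" (the function names and the prefix are made up for illustration, they are not from the original answer):

```php
<?php
// Hypothetical helpers: encode any string into an alphanumeric CSS id and back.
// The "id" prefix is an assumption; any letter-only prefix would do.

function encode_css_id($raw, $prefix = 'id')
{
    // bin2hex() yields only 0-9 and a-f; the prefix keeps the id from starting with a digit.
    return $prefix . bin2hex($raw);
}

function decode_css_id($id, $prefix = 'id')
{
    // Drop the prefix by length, then turn the hex digits back into bytes.
    $hex = substr($id, strlen($prefix));
    return pack('H*', $hex);
}

$id = encode_css_id('Hey123 & 5');
echo $id, "\n";                 // id48657931323320262035
echo decode_css_id($id), "\n";  // Hey123 & 5
```

The output is twice as long as the input plus the prefix, which is the price for staying strictly alphanumeric.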

You can use BASE64 encoding
http://php.net/manual/function.base64-encode.php

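This thread has been dead for a long time, but I found it while looking for a solution to this problem, so someone might find this easy answer useful.
My solution is:
str_replace('=', '_', base64_encode($data));
Note that Base64 output can also contain + and /, and that _ is itself not alphanumeric, so the result is not strictly alnum.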

Related

How to get only integer value Started from Symbol from given string

I need the integer value that follows £ or Â£. I tried to do this with a regex, but I only get values that start with £.
Here is the regex I use:
if(preg_match('/(\£[0-9]+(\.[0-9]{2})?)/',$vals,$matches))
{
$main[]= str_replace('£','',$matches[0]);
}
I am not familiar with regex, so please share any solution. Any help would be highly appreciated. Thank you.
From your question I understand that you are having trouble with character encodings, so first of all I would suggest addressing that issue one step earlier; it is really important to resolve encoding issues at the earliest possible step.
Back to the question: to avoid falling deeper into charset encoding hell, I would recommend writing your regexp literal with hex escapes, because otherwise the charset in which you save your PHP files affects the result. I.e. if you do something like this:
preg_match('/(£|Â£)(\d+)/', ...)
It would match the bytes A3 ("£") and C2 A3 ("Â£") if you save your source code in ISO-8859-1, but it would actually match the bytes C2 A3 and C3 82 C2 A3 if you save your source code in UTF-8 (which might be a good idea in general). So be careful with this, and verify what your editor/IDE is doing!
My suggestion thus is to write it this way, which behaves the same whether the file is saved as ISO-8859-1 or UTF-8:
preg_match('/(\xa3|\xc2\xa3)(\d+)/', ...) // match "£" as ISO-8859-1 (A3) and as UTF-8 (C2 A3)
I also suggest using the sub-pattern capture feature of regular expressions, so you don't have to str_replace() afterwards:
if (preg_match('/(?:\xa3|\xc2\xa3)([0-9]+(?:\.[0-9]{2})?)/', $data, $regp)) {
$main[] = $regp[1];
}
The "?:" right after the "(" means "this is a sub-pattern, but don't capture it".
Note that you can also replace preg_match with preg_match_all, and $regp[1] will then contain the array of all matching numbers.
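The preg_match_all variant mentioned above could look roughly like this. It is only a sketch: the function name and the sample input are invented, and the input bytes are written with hex escapes so the example does not depend on how the file is saved.

```php
<?php
// Hypothetical helper: collect every amount that follows a pound sign,
// whether the sign arrives as ISO-8859-1 (A3) or as UTF-8 (C2 A3).

function extract_pound_amounts($data)
{
    // (?:...) groups the two byte sequences without capturing them;
    // group 1 captures the digits plus an optional two-digit decimal part.
    preg_match_all('/(?:\xc2\xa3|\xa3)([0-9]+(?:\.[0-9]{2})?)/', $data, $regp);
    return $regp[1];
}

// Sample input mixing both encodings of the pound sign.
$sample = 'Price: ' . "\xc2\xa3" . '12.50 or ' . "\xa3" . '7';
print_r(extract_pound_amounts($sample)); // the two captured numbers: 12.50 and 7
```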
Try this modified regex:
(?:£|Â£)([0-9]+(\.[0-9]{2})?)
It should do the trick. But it will also return decimal values, because of this part:
(\.[0-9]{2})?
You can remove it, and it will then return only the integer part after £ or Â£.

Strip Base64 strings from long text

I really wonder whether I'm the first one asking this question, or whether I'm just too blind to find anything about it...
I have a longer text and I want to strip the base64 encoded strings from it:
I am a text and have some lines with some content
There are more than one line but sometimes I have
aSBhbSBhIG5vcm1hbCB0ZXh0IHRoYXQgd2FzIGNvZ
GVkIGluIGJhc2UgNjQgYW5kIG5vdyBpIHdhcyB0cmFu
c2xhdGVkIGJhY2sgdG8gYmxhbmsgdGV4dGZvcm1hd
C4gaSB0aGFuayB5b3UgZm9yIHBheWluZyBhdHRlbnRp
b24uIGJ5ZQ==
and this is what I want to strip / extract by using php
As you can see there is base64 encoded data in the text and I want to extract/strip these lines.
I already tried a lot of regex samples from SO, something like
$regex = '#^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$#m';
preg_match($regex, $content, $output_array );
but this did not solve anything...
What I need is a regex that only selects the base64 strings...
Is this even possible? I mean, is base64 selectable by regex? I guess so :)
EDIT: String-Source is the content of an email
EDIT2: I guess the best approach for this case would be to track strings that have more than one uppercase character, can contain numbers, and have no whitespace. But regex is not my daily bread :D
First of all: You can not reliably do this!
Why?
Simple: the reason base64 is so useful in some cases is that it encodes all data with "standard" characters, the same ones that are used in normal texts, sentences, and yes, even words.
Background
Is "Hello" a base64-encoded string? Well, in the sense that it consists only of valid base64 characters, yes. Decoding it probably returns a lot of gibberish, but it still looks like base64.
Therefore, you can only pick a length beyond which you consider characters connected without any spaces to be base64 encoded. Of course, in languages such as German you may have quite some trouble here, as there are compound nouns such as "Bäckerfachverkäuferinnenhosenherstellungsautomatenzuliefererdienst" or the like (just made that up).
Workaround
So you have to decide on the length yourself, and then you can go with this:
[a-zA-Z0-9\+\/\=]{20,}
Also see the example here: https://regex101.com/r/uK5gM1/1
I considered 20 to be the minimum length for "base64 encoded stuff" here, but as said, it is up to you. Also, as a small side note, the = is not really encoded content but padding, but I still added it to the regex.
Edit: Gnah... you can even see in my example that I did not catch the last line :) When changing the number to 12 it works fine here, but there may be words with more than 12 characters... so, as said, this is not really reliably possible in this manner.
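In PHP, the length-threshold workaround could be sketched like this. The function name and the sample text are assumptions for illustration, and the heuristic has exactly the weakness described above: any long enough run of letters counts as a hit.

```php
<?php
// Hypothetical helper: return every run of base64 characters that is at least
// $minLength characters long. This is a heuristic, not a reliable detector.

function find_base64_candidates($text, $minLength = 20)
{
    // The character class has no whitespace, so a match never spans two lines.
    $pattern = '~[A-Za-z0-9+/=]{' . (int) $minLength . ',}~';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$text = "Hello world\n"
      . "aSBhbSBhIG5vcm1hbCB0ZXh0IHRoYXQgd2FzIGNvZ\n"
      . "b24uIGJ5ZQ==\n"
      . "bye";

print_r(find_base64_candidates($text));      // only the long line
print_r(find_base64_candidates($text, 12));  // the long line and the short last line
```

Lowering the threshold catches the short final line, but it also starts matching ordinary long words.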
For the snippet in the example, /^\w{53}$/gm does the job, if you can rely on the length, of course. (PHP's preg functions have no g flag; use preg_match_all() instead.)
EDIT:
Considering the circumstances and updates, I would go with /\n([\w=\n]{50,})\n/gs, but without metadata it may be tricky to guess the MIME type of the decoded content, and almost impossible to restore filenames etc.

Percent-encoding entire PHP string into numbered ASCII values for obfuscation

I'd like to somehow obscure the contents of $url = "http://blah.somedomain.com/contents/somefolder/somefile.htm"; so I can use them for links but so that the URLs are not easily read by humans when looking at the page source. The obfuscated URL still needs to work in a browser when clicking on it though so other methods of obfuscation that I've looked at are no good.
What we're after is e.g. $obscureurl = "%3A%2F%2F"...etc
Any ideas? Thanks.
Edit: Thanks for suggestions so far, but to clarify, I should have said that I'm not after encoding into HTML entities (the # values), I'm after Percent-encoding (hex values in ASCII).
For example, to change hello@me.com into: %68%65%6c%6c%6f%40%6d%65%2e%63%6f%6d
ASCII table is here for the hex of each letter and symbol: http://ascii.cl/
Is this kind of complete conversion possible with PHP? Thanks
$url = '..';
$encoded = join(array_map(function ($byte) { return "%$byte"; }, str_split(bin2hex($url), 2)));
That's essentially the entire encoding mechanism. Take the raw bytes in hex (bin2hex), 2 characters per byte, and prepend a %.
Not that this will really do a whole lot for obfuscation. The browser may indeed not even display it in its encoded form, and even search engines may display only the decoded form. Further, you're still producing a canonical URL. It doesn't matter what exactly that URL contains; if people have a link to it, they have a link to it, regardless of how human readable that link may or may not be.
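The bin2hex approach can be wrapped in a small function, sketched here under the assumption that every byte should be encoded (the function name is made up). One caveat, as far as I can tell: if the scheme and the "://" are encoded too, a browser will no longer treat the result as an absolute URL, so it is probably safer to encode only the part after the host.

```php
<?php
// Hypothetical helper: percent-encode every byte of a string,
// including characters that would normally be left as-is.

function percent_encode_all($text)
{
    if ($text === '') {
        return '';
    }
    // bin2hex() gives two hex digits per byte; put a % in front of each pair.
    return '%' . implode('%', str_split(bin2hex($text), 2));
}

$encoded = percent_encode_all('hello@me.com');
echo $encoded, "\n";               // %68%65%6c%6c%6f%40%6d%65%2e%63%6f%6d
echo rawurldecode($encoded), "\n"; // hello@me.com
```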
I can see 2 easy ways to achieve this:
1. Replace every character of your link by its html entity (see How to convert all characters to their html entity equivalent using PHP)
2. Use some kind of ids and save the matching url in your DB: (something like http://example.com/redirect/412)

Changing case with regex

I was looking for this for a while, but was not able to find any answer. I need to change a string to lowercase in PHP.
Of course, this can be done using strtolower(), but I was wondering if it's possible to do it via preg_replace().
I noticed that in vim one can use the \L or \U modifiers in back references to change the case to lower or upper.
Is something like that possible in PHP, i.e. in the second argument of preg_replace()? The reason I want to change the case via preg_replace() is that I heard it might work better for UTF-8 strings (not sure if it's true).
Thanks.
You should actually just use
mb_strtolower($str, 'UTF-8')
That way you specify UTF-8 as the encoding, and all should work well.
Edit: sorry, I had strtoupper; changed to lower. Also, you can leave off 'UTF-8', in which case the function uses the internal encoding (see mb_internal_encoding()) instead of an explicitly given one.
Doing this with preg_replace() alone is practically impossible, because its replacement string has no \L or \U case modifiers. You would have to pass each match through strtolower() / strtoupper() yourself, which preg_replace() cannot do on its own.
Go with the function Dave suggested.
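For comparison, here is a small sketch of both routes. It assumes the mbstring extension is loaded and PHP 7 or later (for the \u{...} escapes); the sample string is made up. The regex route uses preg_replace_callback(), which is a different function from preg_replace() and is only shown to illustrate what "passing the match through a function" would look like.

```php
<?php
// Sample input written with escapes so the file encoding does not matter:
// "ÉCOLE ÜBER" in UTF-8.
$input = "\u{00C9}COLE \u{00DC}BER";

// Route 1: the recommended one, a plain multibyte-aware lowercase.
$lower = mb_strtolower($input, 'UTF-8');

// Route 2: a regex that finds runs of uppercase letters (/u enables UTF-8 mode)
// and hands each match to a callback that lowercases it.
$viaRegex = preg_replace_callback(
    '/\p{Lu}+/u',
    function (array $m) {
        return mb_strtolower($m[0], 'UTF-8');
    },
    $input
);

echo $lower, "\n";
echo $viaRegex, "\n";
```

Both routes end up calling mb_strtolower(), so the regex adds nothing for a whole-string conversion; it only helps when just part of the string should change case.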

MCRYPT mode is generating slashes

I'm new to this encryption thing, so I'm not really sure how to phrase my question.
Anyway, I'm using a framework called Kohana, and for encryption it uses three things:
key, cipher, and mode. My problem is that when it encodes some strings, I sometimes get a / in the encrypted output, like this: fclzSev6DVfOk2Z/BSSi4dRYFn4t, and I don't want that. My guess is that I should change the mode, which right now is MCRYPT_MODE_NOFB. If I'm right, what mode do I have to use?
As Francis Avila notes, the encrypted output seems to be Base64-encoded, and so may contain slashes and plus signs (and possibly equals signs at the end) in addition to letters and numbers.
You can safely replace those signs with something else, as long as you remember to change them back before decoding. The PHP strtr() function is handy for this. For example, here's how to convert a string from normal Base64 to the RFC 4648 URL-safe Base64 variant and back:
$url_safe_base64 = strtr( $base64_string, "+/", "-_" );
$base64_string = strtr( $url_safe_base64, "-_", "+/" );
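Wrapped into a pair of helpers, the strtr() conversion could look like the sketch below. The function names are made up, and stripping the trailing = padding on encode (and restoring it on decode) is my own addition, not part of the two lines above.

```php
<?php
// Hypothetical helpers for the URL-safe Base64 variant from RFC 4648.

function base64url_encode($binary)
{
    // Swap + and / for - and _, then drop the = padding.
    return rtrim(strtr(base64_encode($binary), '+/', '-_'), '=');
}

function base64url_decode($text)
{
    // Swap the characters back and restore the padding before decoding.
    $base64 = strtr($text, '-_', '+/');
    $padding = (4 - strlen($base64) % 4) % 4;
    return base64_decode($base64 . str_repeat('=', $padding), true);
}

// These three bytes encode to "+//+" in normal Base64.
$raw = "\xfb\xff\xfe";
echo base64url_encode($raw), "\n";                           // -__-
var_dump(base64url_decode(base64url_encode($raw)) === $raw); // bool(true)
```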
The mode has absolutely nothing to do with whether the generated output has slashes; it specifies what mode of encryption mcrypt should use. If you don't know what it's for, use the default.
The reason there are slashes is that Kohana's encode() method will encode the binary output from the encryption in base64, which may contain slashes.
You can str_replace() the slashes with something else, but this will probably create more problems and headaches than it solves.
