I need the integer values that start with £ and Â£. I tried to do it with a regex, but I only get the values that start with £.
Here is the regex I use:
if(preg_match('/(\£[0-9]+(\.[0-9]{2})?)/',$vals,$matches))
{
$main[]= str_replace('£','',$matches[0]);
}
I am not familiar with regex, so please share any solution. Any help would be highly appreciated. Thank you.
From your question I understand that you are having trouble with character encodings, so first of all I would suggest addressing that issue one step earlier: encoding problems are best resolved at the earliest possible point.
Back to the question. To avoid falling deeper into charset encoding hell, I would recommend writing the regexp literal in hex, because otherwise the encoding in which you save your PHP files affects the result. For example, if you do something like this:
preg_match('/(£|Â£)(\d+)/', ...)
It would match the bytes A3 ("£") and C2 A3 ("Â£") if you save your source code in ISO-8859-1, but it would actually match C2 A3 and C3 82 C2 A3 if you chose to save your source code in UTF-8 (which might be a good idea in general). So be careful with this, and verify what your editor/IDE is doing!
My suggestion thus is to write it this way, which behaves the same whether the file is saved as ISO-8859-1 or UTF-8:
preg_match('/(\xa3|\xc2\xa3)(\d+)/', ...) // match "£" and "Â£"
I also suggest using the sub-pattern capture feature of regular expressions, so you don't have to call str_replace() afterwards, like this:
if (preg_match('/(?:\xa3|\xc2\xa3)([0-9]+(?:\.[0-9]{2})?)/', $data, $regp)) {
$main[] = $regp[1];
}
The "?:" right after the "(" means "this is a sub-pattern, but don't capture it".
Note that you can also replace preg_match with preg_match_all, and $regp[1] will then hold the array of all matching numbers, ready to use.
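A minimal sketch of the preg_match_all variant mentioned above. The input string is made up for illustration: the first amount uses a UTF-8 pound sign (bytes C2 A3) and the second a Latin-1 one (byte A3 alone).

```php
<?php
// Hypothetical input mixing both byte forms of the pound sign.
$data = "Shirt \xc2\xa312.50, socks \xa33";

// Hex escapes keep the pattern independent of the source file's encoding.
$pattern = '/(?:\xa3|\xc2\xa3)([0-9]+(?:\.[0-9]{2})?)/';

$main = [];
if (preg_match_all($pattern, $data, $regp)) {
    // $regp[1] holds the first capture group of every match:
    // the numbers only, without the pound sign.
    $main = $regp[1];
}

print_r($main);
```

Because the currency prefix sits in a non-capturing group, no str_replace() call is needed afterwards.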
Try this modified regex:
(?:£|Â£)([0-9]+(\.[0-9]{2})?)
It should do the trick. But it will also return decimal values, because of the:
(\.[0-9]{2})?
You can remove that part, and it will return only the integer part after £|Â£
Related
I am trying to extract n characters from a string using
substr($originalText,0,250);
The nth character is an en-dash. So I get the last character as â€ when I view it in Notepad. In my editor, Brackets, I can't even open the log file, since it only supports UTF-8 encoding.
I also cannot run json_encode on this string.
However, when I use substr($originalText,0,251), it works just fine. I can open the log file and it shows an en-dash instead of â€. json_encode also works fine.
I can use mb_convert_encoding($mystring, "UTF-8", "Windows-1252") to circumvent the problem, but could anyone tell me why having these characters at the end specifically causes an error?
Moreover, on doing this, my log file shows â€ in Brackets, which is confusing too.
My question is: why is having the en-dash at the end of the string different from having it anywhere else (followed by other characters)?
Hopefully my question is clear, if not I can try to explain further.
Thanks.
Pid's answer gives an explanation for why this is happening, this answer just looks at what you can do about it...
Use mb_substr()
The multibyte string module was designed for exactly this situation, and provides a number of string functions that handle multibyte characters correctly. I suggest having a look through there as there are likely other ones that you will need in other places of your application.
You may need to install or enable this module if you get a function not found error. Instructions for this are platform dependent and out-of-scope for this question.
The function you want for the case in your question is mb_substr(). You call it the same way as substr(), but it takes an additional optional argument for the encoding.
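A small sketch of the difference, assuming the mbstring extension is enabled and the text is UTF-8. The sample text is invented to mirror the question: 248 ASCII characters followed by an en-dash, so a 250-byte cut lands inside the dash.

```php
<?php
// Hypothetical text: 248 ASCII characters, an en-dash (3 bytes in UTF-8), more text.
$originalText = str_repeat('a', 248) . "\xe2\x80\x93" . ' tail';

$byBytes = substr($originalText, 0, 250);             // counts bytes, splits the dash
$byChars = mb_substr($originalText, 0, 250, 'UTF-8'); // counts characters

var_dump(mb_check_encoding($byBytes, 'UTF-8')); // is the byte-cut string valid UTF-8?
var_dump(mb_check_encoding($byChars, 'UTF-8')); // is the character-cut string valid?
var_dump(json_encode($byBytes));                // json_encode needs valid UTF-8
```

Passing the encoding explicitly avoids depending on the configured default internal encoding.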
UTF-8 is a variable-length encoding: beyond ASCII it uses multi-byte sequences (a lead byte followed by continuation bytes) to accommodate many more characters.
A single UTF-8 character may be coded into one, two, three or four bytes, depending on the character.
You cut the string right in the middle of a multi-byte character:
[<-character->]
[byte-0|byte-1]
^
You cut the string right here in the middle!
[<-----character---->]
[byte-0|byte-1|byte-2]
^ ^
Or anywhere here if it's 3 bytes long.
So the decoder has the first byte(s) but can't read the entire character because the string ends prematurely.
This causes all the effects you are witnessing.
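The effect can be reproduced on a tiny string. This is only an illustration: the sample text is made up, and validity is checked with an empty pattern and the u modifier, which makes preg_match() return false when the subject is not valid UTF-8.

```php
<?php
$enDash = "\xe2\x80\x93";        // U+2013 EN DASH, three bytes in UTF-8
$text   = 'ab' . $enDash . 'cd';

$results = [];
foreach ([2, 3, 4, 5] as $length) {
    $cut = substr($text, 0, $length);
    // With the u modifier, preg_match() returns false on malformed UTF-8.
    $results[$length] = preg_match('//u', $cut) === 1;
    printf("%d bytes: %s %s\n", $length, bin2hex($cut), $results[$length] ? 'valid' : 'broken');
}
```

Cutting after 3 or 4 bytes leaves one or two bytes of the dash behind, which is exactly the incomplete character described above.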
The solution to this problem is here in Dezza's answer.
I really wonder whether I'm the first one asking this question or whether I'm just too blind to find anything about it...
I have a longer text and I want to strip base64 encoded strings of it
I am a text and have some lines with some content
There are more than one line but sometimes I have
aSBhbSBhIG5vcm1hbCB0ZXh0IHRoYXQgd2FzIGNvZ
GVkIGluIGJhc2UgNjQgYW5kIG5vdyBpIHdhcyB0cmFu
c2xhdGVkIGJhY2sgdG8gYmxhbmsgdGV4dGZvcm1hd
C4gaSB0aGFuayB5b3UgZm9yIHBheWluZyBhdHRlbnRp
b24uIGJ5ZQ==
and this is what I want to strip / extract by using php
As you can see there is base64 encoded data in the text and I want to extract/strip these lines.
I already tried a lot of regex samples from SO, something like
$regex = '#^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$#m';
preg_match($regex, $content, $output_array );
but this not solved anything...
What I need is a regex that only selects the base strings...
Is this even possible ? I mean is base64 selectable by regex ? I guess :)
EDIT: String-Source is the content of an email
EDIT2: I guess the best approach for this case would be to track strings that have more than one uppercase character, can contain numbers, and have no whitespace. But regex is not my daily bread :D
First of all: You can not reliably do this!
Why?
Simple: the reason base64 is so useful in some cases is that it encodes all data with "standard" characters, the ones used in normal texts, sentences, and yes, even words.
Background
Is "Hello" a base64-encoded string? Well, yes, in the sense that it is "valid base64 encoded". Decoding it probably returns a lot of gibberish, but it is a base64-ok string.
Therefore, you can only decide on a length after which you consider characters connected without any space to be base64 encoded. Of course, in languages such as German you may have quite some trouble here, as there are compound nouns such as "Bäckerfachverkäuferinnenhosenherstellungsautomatenzuliefererdienst" (just made that up).
Workaround
So you have to decide on the length yourself, and then you can go with this:
[a-zA-Z0-9\+\/\=]{20,}
Also see the example here: https://regex101.com/r/uK5gM1/1
I considered "20" to be the minimum length for "base64 encoded stuff" here, but as said, it is up to you. Also, as a small side note, the = is not really encoded content but fill bytes, but I still added it to the regex.
Edit: Gnah.. you can even see in my example that I did not catch the last line :) When changing the number to 12 it works fine here, but there may be words with more than 12 characters ... so - as said, not really reliably possible in this manner.
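A rough sketch of how the workaround could look in PHP. The helper name findBase64Candidates and the default threshold of 20 are assumptions for illustration; the extra base64_decode() call in strict mode is an addition that only filters out some malformed runs, so long ordinary words can still slip through.

```php
<?php
// Hypothetical helper: returns runs that look like base64 and decode in strict mode.
function findBase64Candidates(string $text, int $minLength = 20): array
{
    // Same character class as above, with the minimum length made configurable.
    $pattern = '/[a-zA-Z0-9+\/=]{' . $minLength . ',}/';
    preg_match_all($pattern, $text, $found);

    $candidates = [];
    foreach ($found[0] as $run) {
        // Strict mode returns false for input that is not well-formed base64.
        if (base64_decode($run, true) !== false) {
            $candidates[] = $run;
        }
    }
    return $candidates;
}

// Made-up mail body with one encoded line between two lines of prose.
$payload    = base64_encode('i am a normal text that was coded in base 64');
$text       = "Dear reader,\n" . $payload . "\nkind regards";
$candidates = findBase64Candidates($text);

print_r($candidates);
```

This is still a heuristic: it narrows the candidates but cannot prove that a run was meant as base64.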
For the snippet in the example /^\w{53}$/gm does the job. If you can rely on length of course.
EDIT:
Considering circumstances and updates, I would go with /\n([\w=\n]{50,})\n/gs but without metadata it may be tricky to guess mime-type of the decoded stuff, and almost impossible to restore filenames etc.
I am trying to validate a string against the following regular expression which has been imposed upon me:
[-,.:; 0-9A-Z&#$£¥€'"«»‘’“”?!/\\()\[\]{}<>]{3}[-,.:; 0-9A-Z&#$£¥€'"«»‘’“”?!/\\()\[\]{}<>*=#%+]{0,157}
Can anybody help with writing a preg_match in PHP to validate an input string against this? I am struggling because:
my knowledge of regex isn't that great in the first place
I see special characters in the regex itself which I feel sure PHP won't be happy about me inserting directly into a string (e.g. $£¥€)
In vain hope I just tried sticking it into preg_match, escaping the double quotes, thus:
$ste = "Some string input";
if(preg_match("/[-,.:; 0-9A-Z&#$£¥€'\"«»‘’“”?!/\\()\[\]{}<>]{3}[-,.:; 0-9A-Z&#$£¥€'\"«»‘’“”?!/\\()\[\]{}<>*=#%+]{0,157}/",$ste))
{
echo "OK";
}
else
{
echo "Not OK";
}
Thanks in advance!!
PHP will be perfectly happy with the "special" characters in the expression, provided you do the following:
Make sure the input string is encoded with UTF-8 encoding.
Make sure your PHP program file is saved using UTF-8 encoding. (Obviously you'll need to use UTF-8 encoding in all other parts of your system too, or you'll get borked characters showing up somewhere along the line, but that's outside the scope of this question.)
Add the u modifier to the end of the regex pattern string to tell the regex parser to handle UTF-8 characters, i.e.:
preg_match("/....../u", ...);
^
add this
Other than that, you've got it pretty much spot on already.
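To illustrate what the u modifier changes, here is a small sketch. The characters are written as explicit UTF-8 byte escapes so the result does not depend on how this snippet itself is saved; the variable names are made up.

```php
<?php
$euro  = "\xe2\x82\xac";               // "€" as UTF-8 (three bytes)
$class = "[\xc2\xa3\xe2\x82\xac]";     // character class containing "£" and "€"

// Without u, the class is a set of single bytes, so it cannot match
// the whole three-byte "€" between ^ and $.
$withoutU = preg_match('/^' . $class . '$/', $euro);

// With u, pattern and subject are read as UTF-8, so "€" is one character.
$withU = preg_match('/^' . $class . '$/u', $euro);

var_dump($withoutU, $withU);
```

The same applies to quantifiers such as {3}: with u they count characters, without it they count bytes.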
You can do it like this:
if (preg_match('~^[ -$&-),-<>?A-\]{}£¥€«»‘’“”]{3}[ -?A-\]{}£¥€«»‘’“”]{0,157}$~u', $ste))
echo 'OK';
else
echo 'Not OK';
I have added the "u" modifier for unicode, and reduced the size of the character classes using ranges (example:,-< means all characters between , and < in the unicode table).
But most importantly, I have added the anchors ^ and $, which mean the start and end of the string respectively.
I have this code from a javascript
/+\uFF0B0-9\uFF10-\uFF19\u0660-\u0669\u06F0-\u06F9u/
After some reading about PHP and \u support, I converted it to \x:
/\+\x{FF0B}0-9\x{FF10}-\x{FF19}\x{0660}-\x{0669}\x{06F0}-\x{06F9}/u
but I am still not able to use it in PHP:
$phoneNumber = '+911561110304';
$start = preg_match('/\+\x{FF0B}0-9\x{FF10}-\x{FF19}\x{0660}-\x{0669}\x{06F0}-\x{06F9}/u', $phoneNumber,$matches);
the matches will be null!
how to fix this?
It looks like you want to match an ASCII plus sign or its fullwidth equivalent (U+FF0B), followed by one or more digits from a few different writing systems. But, as @mario observed, you seem to be missing some square brackets. The JavaScript version probably should be:
/[+\uFF0B][0-9\uFF10-\uFF19\u0660-\u0669\u06F0-\u06F9]+/
(I couldn't see any reason for the u at the end, so I dropped it.) The PHP version would be:
'/[+\x{FF0B}][0-9\x{FF10}-\x{FF19}\x{0660}-\x{0669}\x{06F0}-\x{06F9}]+/u'
Of course, this will allow a mix of ASCII, Arabic and fullwidth characters in the same number. If that's a problem, you'll need to break it up a bit. For example:
'/\+(?:[0-9]+|[\x{0660}-\x{0669}]+|[\x{06F0}-\x{06F9}]+)|\x{FF0B}[\x{FF10}-\x{FF19}]+/u'
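A quick check of the bracketed PHP pattern against the number from the question, plus a made-up fullwidth input written with PHP 7 "\u{...}" escapes. The variable names are only for illustration.

```php
<?php
$pattern = '/[+\x{FF0B}][0-9\x{FF10}-\x{FF19}\x{0660}-\x{0669}\x{06F0}-\x{06F9}]+/u';

// ASCII number from the question.
$phoneNumber = '+911561110304';
$asciiResult = preg_match($pattern, $phoneNumber, $matches);

// Hypothetical fullwidth input: fullwidth plus sign followed by fullwidth 9 and 1.
$fullwidth       = "\u{FF0B}\u{FF19}\u{FF11}";
$fullwidthResult = preg_match($pattern, $fullwidth, $fullwidthMatches);

// No plus sign at all, so nothing should match.
$noPlusResult = preg_match($pattern, '911561110304');

var_dump($matches[0], $fullwidthMatches[0] === $fullwidth, $noPlusResult);
```

The pattern is not anchored, so it finds a number anywhere in the subject; add ^ and $ if the whole string must be a phone number.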
How to safely encode PHP string into alphanumeric only string?
E.g. "Hey123 & 5" could become "ed9e0333" or may be something better looking
It's not about stripping characters, it's about encoding.
The goal is to make any string after this encoding suitable for css id string (alnum),
but later I will need to decode it back and get the original string.
bin2hex seems to fit the bill (although not as compact as some other encodings). Also take care that CSS ids cannot start with a number, so to be sure you'll need to prefix something to the bin2hex result before you have your final ID.
For the reverse (decoding), PHP 5.4 and later provide hex2bin(); on older versions, someone on the PHP documentation site suggested this (untested):
$bin_str = pack("H*" , $hex_str);
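A minimal round-trip sketch of this approach. The function names and the "id" prefix are hypothetical; any fixed alphabetic prefix would do, as long as the decoder strips the same one.

```php
<?php
// Hypothetical prefix so the result never starts with a digit.
const ID_PREFIX = 'id';

function encodeCssId(string $value): string
{
    // bin2hex() emits only the characters 0-9 and a-f.
    return ID_PREFIX . bin2hex($value);
}

function decodeCssId(string $id): string
{
    // Drop the prefix, then turn the hex digits back into bytes.
    return pack('H*', substr($id, strlen(ID_PREFIX)));
}

$id       = encodeCssId('Hey123 & 5');
$original = decodeCssId($id);

var_dump($id, $original);
```

The encoded form is twice as long as the input, which is the price for staying strictly alphanumeric.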
You can use BASE64 encoding
http://php.net/manual/function.base64-encode.php
This thread has been dead for a long time, but I was looking for a solution to this problem and found it, so someone might find this easy answer useful.
My solution is:
str_replace('=', '_', base64_encode($data));