PHP - Find number between 2 Unicode characters

Simple problem, but I'm bad at regular expressions, so I need your help here.
What do I need to type to find a number between the first two • characters?
I found its code point, but that doesn't help me much: http://www.fileformat.info/info/unicode/char/2022/index.htm
Do you know what I should type into, for example, the preg_match function to make it work?
Example:
• 12345 • TESTTESTTEST
Example Output:
12345
Thanks in advance!

To match a specific Unicode code point, use \x{FFFF}, where FFFF is the hexadecimal number of the code point you want to match. You can omit leading zeros in the hexadecimal number between the curly braces. Because \x{...} is read as a single escape, \x{1234} can never be confused with \x repeated 1234 times; it always matches the Unicode code point U+1234. \x{1234}{5678}, on the other hand, will try to match code point U+1234 exactly 5678 times. In PHP, code points above FF only work with the u (UTF-8) modifier.
Anyway, what you're probably looking for is something like this:
/\x{2022} (\d*) \x{2022}/u
The (\d*) part means "match zero or more digits", and the parentheses make it a capture group, so the digits are returned as a separate match (for example in $matches[1] from preg_match()).
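A minimal PHP sketch of this idea, assuming PHP 7 or later (for the "\u{2022}" string escape) and a UTF-8 subject. It uses \s* around the number instead of single literal spaces, and \d+ instead of \d* so that an empty "• •" pair is not accepted. The /u modifier is required: without it, PCRE rejects \x{...} values above FF.

```php
<?php
// Sample subject from the question; "\u{2022}" (PHP 7+) is the bullet character.
$subject = "\u{2022} 12345 \u{2022} TESTTESTTEST";

// /u makes PCRE treat pattern and subject as UTF-8, which \x{2022} requires.
// \s* allows any whitespace around the number; (\d+) captures one or more digits.
$pattern = '/\x{2022}\s*(\d+)\s*\x{2022}/u';

if (preg_match($pattern, $subject, $matches)) {
    echo $matches[1], "\n"; // 12345
}
```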

Actually, I found a slightly easier way to do it.
I used preg_match() with $pattern = "/[0-9]{1,}/";
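For comparison, a small sketch of what that simpler pattern does: /[0-9]{1,}/ (the same as /[0-9]+/) returns the first run of digits anywhere in the string, whether or not it sits between the bullets. The second input below is a made-up example where that difference matters.

```php
<?php
$pattern = "/[0-9]{1,}/"; // one or more digits, anywhere in the string

// Input from the question: the first digits happen to be the ones between the bullets.
preg_match($pattern, "\u{2022} 12345 \u{2022} TESTTESTTEST", $m1);
echo $m1[0], "\n"; // 12345

// Hypothetical input with another number before the bullets.
preg_match($pattern, "Item 7 \u{2022} 12345 \u{2022} TEST", $m2);
echo $m2[0], "\n"; // 7
```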

Related

PHP Regex IF THEN pattern

I'm new to writing Regex patterns and I'm struggling to understand why the following line doesn't work.
/^(£)?[0-9]+(?(?=\.[0-9]{2}){0,1}(p)?|$)/
Note: I'm writing this in PHP
I want the code to find £3.10p, but not £3p. Essentially, the letter 'p' can't be allowed unless it is preceded with a decimal point and 2 digits.
EDIT: To clarify, the letter p can be used at the end of the string; however, if the string contains a £ and/or a decimal point, the p must be preceded by the point and 2 digits.
More examples of valid inputs:
£3.50
350
£350
234p
Invalid input:
£2p
Could someone please fix this and explain where I've gone wrong here?
Thanks
If 0.50p is allowed, then you can do it like this:
^((£?[0-9]+)(?!p)|([0-9]+p?))?(?<!p)(\.[0-9]{2})?p?$
Regex saved with all your examples here: https://regex101.com/r/rE1bT9/3
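A quick PHP check of that pattern against the samples from the question. For portability, £ is written here as \x{00A3} and the /u modifier is added: without /u, PCRE works on bytes, and a literal "£?" would make only the last byte of £'s two-byte UTF-8 encoding optional.

```php
<?php
// The answer's pattern, with £ as \x{00A3} and the /u (UTF-8) modifier added.
$pattern = '/^((\x{00A3}?[0-9]+)(?!p)|([0-9]+p?))?(?<!p)(\.[0-9]{2})?p?$/u';

$inputs = ["\u{00A3}3.50", "350", "\u{00A3}350", "234p", "0.50p", "\u{00A3}3.10p", "\u{00A3}2p", "\u{00A3}3p"];

foreach ($inputs as $s) {
    echo $s, ' => ', preg_match($pattern, $s) ? 'valid' : 'invalid', "\n";
}
```

The first six inputs print valid and the last two invalid. Because every part of the pattern is optional, it also accepts an empty string, which may need a separate check.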
Try this:
/^(?(?=£)(£\d+\.\d{2}p?|£\d+)|\d+p?)$/
You can test it here:
https://regex101.com/r/mG8kR0/1
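The conditional subpattern (?(?=£)yes|no) tries the "yes" branch only when the string starts with £, and the "no" branch (\d+p?) otherwise. A short PHP check, again writing £ as \x{00A3} with the /u modifier so the result does not depend on the source file's encoding:

```php
<?php
// (?(?=£)A|B): if the lookahead (?=£) succeeds, match A, otherwise match B.
$pattern = '/^(?(?=\x{00A3})(\x{00A3}\d+\.\d{2}p?|\x{00A3}\d+)|\d+p?)$/u';

$valid   = ["\u{00A3}3.50", "350", "\u{00A3}350", "234p", "\u{00A3}3.10p"];
$invalid = ["\u{00A3}2p", "\u{00A3}3p", "0.50p"];

foreach (array_merge($valid, $invalid) as $s) {
    echo $s, ' => ', preg_match($pattern, $s) ? 'match' : 'no match', "\n";
}
```

Unlike the lookbehind version, this pattern rejects 0.50p, because a string without £ may only consist of digits optionally followed by p.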
It is unclear how your valid sample "234p" matches your rule "p is allowed if there are at least two digits and a point". Also, your question uses a positive lookahead, which seems like unnecessary overhead here.
Your rule for p may be written as: (\.[0-9]{2}p?)
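So overall, if the decimal part is always present, you just need: /^(£)?[0-9]+(\.[0-9]{2}p?)$/ (note that this rejects 350 and £350, since the point is required).
If you also want to allow "234p", just make the period optional: /^(£)?[0-9]+(\.?[0-9]{2}p?)$/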
Try it out here: http://www.regexr.com/
The latter regex accepts all of your valid samples and rejects the invalid input. It is unclear what should happen if there are only two digits, and if you need to capture individual pieces, you should add more groups.
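A sketch that runs the optional-period pattern over the question's samples plus two extra inputs that are not in the question (£123p and 5), chosen to show its edge cases: without a point, [0-9]+ followed by [0-9]{2} needs at least three digits, and nothing stops p from following a £ amount that has no decimal part. As in the other sketches, £ is written as \x{00A3} with /u.

```php
<?php
// Optional-period version of the answer's pattern (£ as \x{00A3}, /u added).
$pattern = '/^(\x{00A3})?[0-9]+(\.?[0-9]{2}p?)$/u';

// Samples from the question, then two extra edge cases.
$inputs = ["\u{00A3}3.50", "350", "\u{00A3}350", "234p", "\u{00A3}2p", "\u{00A3}123p", "5"];

foreach ($inputs as $s) {
    echo $s, ' => ', preg_match($pattern, $s) ? 'match' : 'no match', "\n";
}
```

The four valid samples match and £2p does not, but £123p also matches and a single digit such as 5 does not.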
How about:
/^(?:£?[0-9]+(?:\.[0-9]{2})?|[0-9]+p?)$/

Quick PHP regex for digit format

I just spent hours trying to write a regular expression in PHP that only allows strings in the following format to pass:
(any digit)_(any digit)
which would look like:
219211_2
So far I have tried a lot of combinations; I think this one came closest to the solution:
/(\\d+)(_)(\\d+)/
Also, if there is a way to limit the last number (the one after the underscore) to a certain number of digits (e.g. at most 12), that would be nice.
I am still learning regular expressions, so any help is greatly appreciated, thanks.
The following:
\d+_\d{1,12}(?!\d)
Will match "anywhere in the string". If you need it "at the start", "at the end" or as "the whole thing", modify it with anchors:
^\d+_\d{1,12}(?!\d) - must be at the start
\d+_\d{1,12}$ - must be at the end
^\d+_\d{1,12}$ - must be the entire string
demo: http://regex101.com/r/jG0eZ7
Explanation:
\d+ - at least one digit
_ - literal underscore
\d{1,12} - between 1 and 12 digits
(?!\d) - not followed by a digit (negative lookahead; this also succeeds at the end of the string)
The last part is important; otherwise the pattern would match the first 12 digits and ignore a 13th. If your number happens to be at the end of the string and you used the form I originally had, [^\d], it would fail to match in that specific case, because [^\d] needs an actual non-digit character.
Thanks to #sln for pointing that out.
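A small PHP sketch of the unanchored form, using made-up inputs: it finds the pattern inside other text, and the (?!\d) lookahead stops it from accepting a 13-digit number after the underscore by matching only its first 12 digits.

```php
<?php
$pattern = '/\d+_\d{1,12}(?!\d)/';

// Unanchored: the match can sit inside other text.
preg_match($pattern, 'abc219211_2xyz', $m);
echo $m[0], "\n"; // 219211_2

// 13 digits after "_": every shorter prefix is followed by a digit, so (?!\d) fails.
var_dump(preg_match($pattern, '219211_1234567890123')); // int(0)

// Without the lookahead, the first 12 digits are matched and the 13th ignored.
preg_match('/\d+_\d{1,12}/', '219211_1234567890123', $m2);
echo $m2[0], "\n"; // 219211_123456789012
```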
You don't need double escaping \\d in PHP.
Use this regex:
"/^(\d+)_(\d{1,12})$/"
\d{1,12} will match 1 to 12 digits.
It is better to use start/end anchors to avoid matching unexpected input.
Try this:
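$regex = '~^(\d+)_(\d+)$~'; // "~" is the delimiter, so the pattern itself must not start with "/"
$input = '219211_2';
if (preg_match($regex, $input, $result)) {
    print_r($result); // full match in [0], the two digit groups in [1] and [2]
}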
Just try the following regex:
^(\d+)_(\d{1,12})$
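One PHP detail worth knowing when using ^...$ for whole-string validation: by default, $ also matches just before a final newline, so "219211_2\n" passes. The D (PCRE_DOLLAR_ENDONLY) modifier, or \z instead of $, closes that gap. A sketch with a few made-up inputs:

```php
<?php
$loose  = '/^(\d+)_(\d{1,12})$/';   // $ may match before a trailing "\n"
$strict = '/^(\d+)_(\d{1,12})$/D';  // D: $ matches only at the very end

var_dump(preg_match($loose,  "219211_2\n")); // int(1)
var_dump(preg_match($strict, "219211_2\n")); // int(0)

foreach (['219211_2', '219211_123456789012', '219211_1234567890123', 'x219211_2'] as $s) {
    echo $s, ' => ', preg_match($strict, $s, $m) ? "ok ({$m[1]}, {$m[2]})" : 'rejected', "\n";
}
```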

Regex Capital letter combo

Regex is something of a mystery to me. After searching on SO, I downloaded Espresso and went through the tutorial, but things still aren't clicking for me. It may just be my specific need, but I haven't found any examples. What I want to do is find matches that are exactly two specific letters (uppercase, lowercase, or mixed) followed by a string of digits. Here are the cases I want to test against:
TL123
TL 123
tl123
tl 123
TLABC123
tlabc123
What I then want to do is use preg_replace on each match so that it always returns, for example, TL-123.
So any letter or number combination after TL should return TL-, and vice versa. Any nudges in the right direction would be extremely helpful. Thanks!
Edit
It might actually be preg_match_all that I need for this.
To match the specified pattern, you can use:
TL(?:[^0-9]*)(\d+)
This will match TL followed by anything that isn't a digit (or nothing) and then a run of digits.
You could use this with PHP's preg_replace() like:
$str = preg_replace('/TL(?:[^0-9]*)(\d+)/i', 'TL-$1', $str);
This example, of course, assumes that TL is the exact characters you want to match. If TL is just a placeholder and you could match anything, you could use the following:
preg_replace('/([a-z]{2})(?:[^0-9]*)(\d+)/i', '$1-$2', $str);
With this, it is hardcoded to match exactly 2 letters ({2}). You can change that number if you need a different length.
Also, since you want the matched letters to always be uppercase but also want to match lowercase input, I would suggest simply wrapping the result in strtoupper() (instead of using a callback).
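A sketch that applies the generic two-letter replacement plus strtoupper() to each test case from the question; every case comes out as TL-123. Note that strtoupper() uppercases the whole string, which is fine for these short codes but would also change any surrounding text.

```php
<?php
$cases = ['TL123', 'TL 123', 'tl123', 'tl 123', 'TLABC123', 'tlabc123'];
$results = [];

foreach ($cases as $str) {
    // Two letters, any non-digits (dropped), then the digits; /i also matches A-Z.
    $result = strtoupper(preg_replace('/([a-z]{2})(?:[^0-9]*)(\d+)/i', '$1-$2', $str));
    $results[] = $result;
    echo $str, ' => ', $result, "\n"; // TL-123 in every case
}
```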

PHP regular expressions (phonenumber)

I'm having some trouble with a regular expression for phone numbers. I am trying to create a regex that is as broad as possible for European phone numbers. The phone number can start with a + or with two leading 0s, followed by a number between 0 and 40. This first part is optional, however, so it can also be left out. After that, everything should be digits, in groups of at least two, with a space or a - between the groups.
The regex I have put together can be found below.
/((\+|00)+[0-4]+[0-9]+)?([ -]?[0-9]{2,15}){1,5}/
This should match the following structures
0031 34-56-78
0032123456789
0033 123 456 789
0034-123-456-789
+35 34-56-78
+36123456789
+37 123 456 789
+38-123-456-789
...
However, according to my JavaScript test, it also matches:
+32 a54b 67-0:
So I must have made a mistake somewhere, but I really can't see it. Any help would be appreciated.
The problem is that you don't use the anchors ^ and $ to mark the start and end of the string, so the regex will find a match anywhere in the string.
/^((\+|00)+[0-4]+[0-9]+)?([ -]?[0-9]{2,15}){1,5}$/
Adding anchors will do the trick. More about these meta characters can be found here.
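A PHP sketch comparing the original unanchored pattern with the anchored one, on the problem input and on the sample numbers from the question:

```php
<?php
$unanchored = '/((\+|00)+[0-4]+[0-9]+)?([ -]?[0-9]{2,15}){1,5}/';
$anchored   = '/^((\+|00)+[0-4]+[0-9]+)?([ -]?[0-9]{2,15}){1,5}$/';

$bad = '+32 a54b 67-0:';
var_dump(preg_match($unanchored, $bad)); // int(1): "32" alone is a match inside the string
var_dump(preg_match($anchored, $bad));   // int(0): the whole string must fit the pattern

$samples = ['0031 34-56-78', '0032123456789', '0033 123 456 789', '0034-123-456-789',
            '+35 34-56-78', '+36123456789', '+37 123 456 789', '+38-123-456-789'];
foreach ($samples as $n) {
    echo $n, ' => ', preg_match($anchored, $n) ? 'ok' : 'rejected', "\n";
}
```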
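Try this; maybe it can help you:
// ereg() was deprecated in PHP 5.3 and removed in PHP 7; preg_match() needs delimiters.
// Note: this pattern checks North American-style numbers such as (555) 123-4567 or 555-123-4567.
if (preg_match('/^((\([0-9]{3}\) ?)|([0-9]{3}-))?[0-9]{3}-[0-9]{4}$/', $var))
{
    $valid = true;
}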
Put ^ at the beginning of the regex and $ at the end.

Matching Roman Numbers

I have this regular expression:
(IX|IV|V?I{0,3}|M{1,4}|CM|CD|D?C{1,3}|XC|XL|L?X{1,3})
I use it to detect whether there is any Roman numeral in a text.
eregi("( IX|IV|V?I{0,3}[\.]| M{1,4}[\.]| CM|CD|D?C{1,3}[\.]| XC|XL|L?X{1,3}[\.])", $title, $regs)
But the Roman numeral is always formatted like this: " IV.". In the eregi example I added a space before the number and a "." after it, but I still get the same result. If the text is something like "somethinvianyyhing", the result will be "vi" (from the middle of the word).
What am I doing wrong?
You have no space before VI: the space belongs only to the alternative in which it was written, not to all of them. The same goes for the \.; it belongs only to the alternative where it was written.
Try this
" (IX|IV|V?I{0,3}|M{1,4}|CM|CD|D?C{1,3}|XC|XL|L?X{1,3})\."
See it here on Regexr
This will match
I.
II.
III.
IV.
V.
VI.
VII.
VIII.
IX.
X.
But not
XI.
MMI.
MMXI.
somethinvianyyhing
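The same corrected pattern in PHP with preg_match(), since eregi() was deprecated in PHP 5.3 and removed in PHP 7 (add the i modifier if you need eregi's case-insensitive behavior). The sample sentences are made up. One caveat: because V?I{0,3} can match nothing, a space followed directly by a dot also matches, with an empty capture.

```php
<?php
// Space, then one of the alternatives (now grouped), then a literal dot.
$pattern = '/ (IX|IV|V?I{0,3}|M{1,4}|CM|CD|D?C{1,3}|XC|XL|L?X{1,3})\./';

$texts = ['Chapter IV. Intro', 'Part X. End', 'Year MMXI. Report', 'somethinvianyyhing', 'a . b'];

foreach ($texts as $t) {
    $found = preg_match($pattern, $t, $m) ? "'{$m[1]}'" : 'no match';
    echo $t, ' => ', $found, "\n";
}
```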
Your approach to matching Roman numerals is far from correct. A more correct approach, for numbers up to 50 (L), is this:
^(?:XL|L|L?(?:IX|X{1,3}|X{0,3}(?:IX|IV|V|V?I{1,3})))$
See it here on Regexr
I have only tested this superficially, but as you can see, this gets really complex, and C, D and M are still missing from this expression.
Not to mention special cases, for example 4 = IV = IIII, and there are more of them.
Wikipedia about Roman numbers
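For completeness, a commonly used pattern covers 1 to 3999 by handling each place value separately: thousands M{0,3}, then hundreds, tens and ones, each allowing its subtractive forms (CM/CD, XC/XL, IX/IV). This is a sketch, not taken from the answers above; it accepts only standard subtractive notation, so IIII is rejected. The leading lookahead prevents an empty string from matching, since every part is optional.

```php
<?php
// Thousands, hundreds, tens, ones; (?=[MDCLXVI]) rejects the empty string.
$roman = '/^(?=[MDCLXVI])M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$/';

foreach (['IV', 'XLII', 'MMXI', 'MCMXCIV', 'IIII', 'VX', 'IC', ''] as $s) {
    echo var_export($s, true), ' => ', preg_match($roman, $s) ? 'valid' : 'invalid', "\n";
}
```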
