Matching Roman Numbers - php

I have this regular expression:
(IX|IV|V?I{0,3}|M{1,4}|CM|CD|D?C{1,3}|XC|XL|L?X{1,3})
I use it to detect whether there is any Roman number in a text:
eregi("( IX|IV|V?I{0,3}[\.]| M{1,4}[\.]| CM|CD|D?C{1,3}[\.]| XC|XL|L?X{1,3}[\.])", $title, $regs)
But the format of the Roman number is always like this: " IV." (a space before it and a "." after it). In the eregi example I added a white space before the number and a "." after it, but I still get the same result. If the text is something like "somethinvianyyhing", the result will be "vi" (the part between "somethin" and "anyyhing").
What am I doing wrong?

Your pattern has no space before VI: a space belongs only to the alternative in which it was written, not to all of them. The same goes for the \.: it belongs only to the alternative where it was written.
Try this:
" (IX|IV|V?I{0,3}|M{1,4}|CM|CD|D?C{1,3}|XC|XL|L?X{1,3})\."
This will match
I.
II.
III.
IV.
V.
VI.
VII.
VIII.
IX.
X.
But not
XI.
MMI.
MMXI.
somethinvianyyhing
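The grouping fix can be checked with preg_match (eregi was removed in PHP 7). This is only a minimal sketch: the helper name hasRomanMarker is made up for illustration. It also shows a caveat of the pattern: because V?I{0,3} can match the empty string, a lone " ." is accepted as well.

```php
<?php
// Hypothetical helper (the name is an assumption): true if $text contains
// " <numeral>." according to the grouped pattern from the answer above.
function hasRomanMarker(string $text): bool
{
    $pattern = '/ (IX|IV|V?I{0,3}|M{1,4}|CM|CD|D?C{1,3}|XC|XL|L?X{1,3})\./';
    return preg_match($pattern, $text) === 1;
}

var_dump(hasRomanMarker('Chapter IV. Intro'));  // bool(true)
var_dump(hasRomanMarker('somethinvianyyhing')); // bool(false)
var_dump(hasRomanMarker('Chapter XI. Intro'));  // bool(false)
var_dump(hasRomanMarker('end .'));              // bool(true): empty numeral
```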
Your approach to matching Roman numbers is far from correct. A more accurate approach looks like this; it is intended for numbers up to 50 (L):
^(?:XL|L|L?(?:IX|X{1,3}|X{0,3}(?:IX|IV|V|V?I{1,3})))$
I tested this only superficially, but you can see that it gets really complex, and C, D and M are still missing from this expression.
Not to mention special cases: for example, 4 can be written as IV or as IIII, and there are more of them.
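For comparison, here is a sketch of a commonly used strict pattern that covers M, D and C as well. It is not the expression from this answer. It assumes "standard form" numerals only (1 to 3999, so IIII and MMMM are rejected), and the function name isRomanNumeral is hypothetical.

```php
<?php
// Hypothetical helper: strict check for standard-form numerals 1..3999.
function isRomanNumeral(string $s): bool
{
    // Thousands, hundreds, tens and units are matched in that order.
    // The lookahead rejects the empty string, which all four groups
    // would otherwise be happy to match. \z anchors at the very end.
    $pattern = '/^(?=[MDCLXVI])M{0,3}(?:CM|CD|D?C{0,3})'
             . '(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})\z/';
    return preg_match($pattern, $s) === 1;
}

var_dump(isRomanNumeral('XLI'));     // bool(true)
var_dump(isRomanNumeral('MCMXCIV')); // bool(true)
var_dump(isRomanNumeral('IIII'));    // bool(false)
```

To find numerals inside running text in the " IV." format, the ^ and \z anchors would have to be replaced by the literal space and \. from the first answer.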
Wikipedia about Roman numbers

Related

Stripping down Phonenumber (mobile)

Is there a function or an easy way to strip down phone numbers to a specific format?
Input can be a number (mobile, different country codes)
maybe
+4917112345678
+49171/12345678
0049171 12345678
or maybe from another country
004312345678
+44...
I'm doing a
$mobile_new = preg_replace("/[^0-9]/","",$mobile);
to remove everything that is not a digit, because I need the number in the format 49171... (without + or 00 at the beginning). But I also need to handle the cases where 00 is entered first, where someone uses +49(0)171, or where someone enters 0171 (which needs to become 49171).
So the first digits ALWAYS need to be the country code, without +/00 and without any (0) in between.
Can someone give me advice on how to solve this?
You can use
(?:^(?:00|\+|\+\d{2}))|\/|\s|\(\d\)
to match most of your cases and simply replace them with nothing. For example:
$mobile = "+4917112345678";
$mobile_new = preg_replace("/(?:^(?:00|\+|\+\d{2}))|\/|\s|\(\d\)/","",$mobile);
echo $mobile_new;
//output: 4917112345678
regex101 Demo
Explanation:
I'm making use of OR (alternation) here, matching each of your cases one by one:
(?:^(?:00|\+|\+\d{2})) matches 00 or + at the beginning of your string. The third alternative, \+\d{2}, is never reached, because \+ alone already matches first; that is just as well, since it would otherwise strip the country code too
\/ matches a / anywhere in the string
\s matches a whitespace character anywhere in the string (it matches the newline in the regex101 demo, but I suppose you match each number on its own)
\(\d\) matches a single digit enclosed in parentheses anywhere in the string
The only case not covered by this regex is the input format 01712345678, as you can only guess what the country-specific prefix should be. If you want it to be 49 by default, then simply replace the single leading 0 of such an input with 49:
$mobile = "01712345678";
$mobile_new = preg_replace("/^0/","49",$mobile);
echo $mobile_new;
//output: 491712345678
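Both steps can be combined into one function. This is only a sketch: the function name normalizeMobile and the default country code parameter are assumptions, and the final digit-only cleanup is taken from the question rather than from this answer.

```php
<?php
// Hypothetical helper; name and default country code are assumptions.
function normalizeMobile(string $input, string $defaultCountryCode = '49'): string
{
    // Step 1: strip a leading 00 or +, plus any slash, whitespace
    // character or "(d)" group anywhere in the string.
    $number = preg_replace('/^(?:00|\+)|\/|\s|\(\d\)/', '', $input);
    // Step 2: a remaining single leading 0 is treated as a national
    // prefix and replaced by the default country code.
    $number = preg_replace('/^0/', $defaultCountryCode, $number);
    // Step 3: drop anything that is still not a digit (dashes, dots, ...).
    return preg_replace('/[^0-9]/', '', $number);
}

echo normalizeMobile('+49(0)17112345678'), "\n"; // 4917112345678
echo normalizeMobile('0049171 12345678'), "\n";  // 4917112345678
echo normalizeMobile('01712345678'), "\n";       // 491712345678
```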
The pattern (49)\(?([0-9]{3})[\)\s\/]?([0-9]{8}) splits the number into three groups:
49 - country code
3 digits - area code
8 digits - number
After a match you can construct the clean number by simply concatenating them: \1\2\3.
Demo: https://regex101.com/r/tE5iY3/1
If this does not suit you, please explain more precisely what you want, with test input and expected output.
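In PHP, the three groups could be joined as in the sketch below. The function name joinGroups is hypothetical. Note that inputs such as +49(0)171..., where the parentheses enclose a single 0, are not covered by this pattern and yield null here.

```php
<?php
// Hypothetical helper: returns the cleaned number, or null if the
// three-group pattern does not match the input.
function joinGroups(string $input): ?string
{
    $pattern = '/(49)\(?([0-9]{3})[\)\s\/]?([0-9]{8})/';
    if (preg_match($pattern, $input, $m) !== 1) {
        return null;
    }
    // $m[1] = country code, $m[2] = area code, $m[3] = number
    return $m[1] . $m[2] . $m[3];
}

echo joinGroups('+49171/12345678'), "\n";  // 4917112345678
echo joinGroups('0049171 12345678'), "\n"; // 4917112345678
```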
I recommend taking a look at LibPhoneNumber by Google and its port for PHP.
It has support for many formats and countries and is well-maintained. Better not to figure this out yourself.
https://github.com/giggsey/libphonenumber-for-php
$phoneUtil = \libphonenumber\PhoneNumberUtil::getInstance();
$usNumberProto = $phoneUtil->parse("+1 650 253 0000", "US");

PHP - Find number between 2 Unicode characters

Simple problem, but I'm bad at regular expressions, so I need your help here.
What do I need to type to find the number between the first two • signs?
I found out its code point, but that doesn't help me much: http://www.fileformat.info/info/unicode/char/2022/index.htm
Do you know what I should pass to, for example, the preg_match function to make it work?
Example:
• 12345 • TESTTESTTEST
Example Output:
12345
Thanks in advance!
To match a specific Unicode code point, use \x{FFFF} where FFFF is the hexadecimal number of the code point you want to match. You can omit leading zeros in the hexadecimal number between the curly braces. Since \x by itself is not a valid regex token, \x{1234} can never be confused to match \x 1234 times. It always matches the Unicode code point U+1234. \x{1234}{5678} will try to match code point U+1234 exactly 5678 times.
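Anyway, what you're probably looking for is something like this (in PHP, add the u modifier so that the pattern and subject are treated as UTF-8; without it, a code point above \x{FF} is rejected):
\x{2022} (\d*) \x{2022}
As for the (\d*) part, it matches zero or more digits and captures them (the parentheses form a capture group). Use (\d+) if at least one digit is required.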
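Applied to the example from the question, this could look like the following sketch. It assumes PHP 7 or later for the "\u{2022}" string escape, and it uses \d+ instead of \d* so that an empty match is not accepted.

```php
<?php
// "\u{2022}" is PHP 7+ string syntax for the bullet, so this file does
// not depend on its own encoding. The sample text is from the question.
$text = "\u{2022} 12345 \u{2022} TESTTESTTEST";

// The u modifier makes PCRE treat pattern and subject as UTF-8.
if (preg_match('/\x{2022} (\d+) \x{2022}/u', $text, $m) === 1) {
    echo $m[1], "\n"; // 12345
}
```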
Actually, I found an easier way to do it.
I used preg_match() with $pattern = "/[0-9]{1,}/";
Huh xD

Detect cloth sizes with regex

I am trying to detect with regex, strings that have a pattern of {any_number}{x-}{large|medium|small} for a site with clothing I am building in PHP.
I have managed to match the sizes against a preconfigured set of strings by using:
$searchFor = '7x-large';
$regex = '/\b'.$searchFor.'\b/';
//Basically, it's finding the letters
//surrounded by a word-boundary (the \b bits).
//So, to find the position:
preg_match($regex, $opt_name, $match, PREG_OFFSET_CAPTURE);
I even managed to detect weird sizes like 41 1/2 with regex, but I am not an expert and I am having a hard time on this.
I have come up with
preg_match("/^(?<![\/\d])([xX\-])(large|medium|small)$/", '7x-large', $match);
but it won't work.
Could you pinpoint what I am doing wrong?
It sounds like you also want to match half sizes. You can use something like this:
$theregex = '~(?i)^\d+(?:\.5)?x-(?:large|medium|small)$~';
if (preg_match($theregex, $yourstring, $m)) {
    // Yes! It matches!
    // The match is in $m[0]
} else {
    // Nah, no luck...
}
Note that the (?i) makes it case-insensitive.
This also assumes you are validating that an entire string conforms to the pattern. If you want to find the pattern as a substring of a larger string, remove the ^ and $ anchors:
$theregex = '~(?i)\d+(?:\.5)?x-(?:large|medium|small)~';
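The difference between the anchored and the unanchored variant can be seen in a small sketch. The helper names isSize and findSize are made up for illustration, and the i modifier is used in place of the inline (?i).

```php
<?php
// Hypothetical helpers; the names are assumptions for illustration.
function isSize(string $s): bool
{
    // Anchored: the whole string must be a size such as "7x-large".
    return preg_match('~^\d+(?:\.5)?x-(?:large|medium|small)$~i', $s) === 1;
}

function findSize(string $s): ?string
{
    // Unanchored: return the first size found inside a longer string.
    if (preg_match('~\d+(?:\.5)?x-(?:large|medium|small)~i', $s, $m) === 1) {
        return $m[0];
    }
    return null;
}

var_dump(isSize('7x-large'));              // bool(true)
var_dump(isSize('Blue shirt 7x-large'));   // bool(false)
var_dump(findSize('Blue shirt 7x-large')); // string(8) "7x-large"
```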
Look at the specification you have and build it up piece by piece. You want "{any_number}{x-}{large|medium|small}".
"{any_number}" would be \d+. This does not allow fractional numbers such as 12.34, but the question does not specify whether they are required.
"{x-}" is a simple string x-
"{large|medium|small}" is a choice between three alternatives large|medium|small.
Joining the pieces together gives \d+x-(large|medium|small). Note the parentheses around the alternation: without them, the expression would be interpreted as (\d+x-large)|medium|small.
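The effect of the parentheses is easy to verify. In this short sketch (the variable names are arbitrary), the ungrouped pattern accepts a bare "medium" without any number in front of it:

```php
<?php
$grouped   = '/\d+x-(large|medium|small)/';
$ungrouped = '/\d+x-large|medium|small/';

var_dump(preg_match($grouped, 'medium'));   // int(0)
var_dump(preg_match($ungrouped, 'medium')); // int(1)
var_dump(preg_match($grouped, '7x-small')); // int(1)
```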
You mention "weird sizes like 41 1/2", but without specifying how "weird" the numbers to be matched can be. You need a precise specification of what counts as "weird" before you can extend the regular expression.

Regex Capital letter combo

REGEX is something of a mystery to me. After searching on SO, I did download Espresso and went through the tutorial, but things still are not clicking for me. It may just be my specific need, but I haven't found any examples. What I want to do is find matches that are exactly two specific capital (or lowercase, mix) and then a string of numbers. Here are the cases I want to test against:
TL123
TL 123
tl123
tl 123
TLABC123
tlabc123
What I'm then trying to do is preg_replace the results for that match (and ultimately always return TL-123 - for example).
So, any letter or number combo after TL would return TL- and vice-versa. Any nudges in the right direction would be extremely helpful. Thanks!
Edit
It might actually be preg_match_all that I need for this.
To match the specified pattern, you can use:
TL(?:[^0-9]*)(\d+)
This will match TL followed by any characters that are not digits (or by nothing), and then a run of digits.
You could use this with PHP's preg_replace() like:
$str = preg_replace('/TL(?:[^0-9]*)(\d+)/i', 'TL-$1', $str);
This example, of course, assumes that TL is the exact characters you want to match. If TL is just a placeholder and you could match anything, you could use the following:
preg_replace('/([a-z]{2})(?:[^0-9]*)(\d+)/i', '$1-$2', $str);
With this, I have it hardcoded to only allow 2 characters to match ({2}). You can modify this to any number if you need it to change.
Also, since you want the matched characters to always be uppercase while still matching lowercase input, I would suggest simply wrapping the result in strtoupper() (instead of using a callback). Note that this uppercases the whole string, not only the matched prefix.
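If only the two-letter prefix should be uppercased and the rest of the string left alone, the callback variant is one option. This sketch uses the generic pattern from above; the function name normalizeCode is hypothetical, and each input is assumed to be a single code rather than running text.

```php
<?php
// Hypothetical helper: rewrites codes such as "tl 123" to "TL-123".
function normalizeCode(string $str): string
{
    return preg_replace_callback(
        '/([a-z]{2})[^0-9]*(\d+)/i',
        function (array $m): string {
            // Uppercase only the captured prefix, keep the digits as-is.
            return strtoupper($m[1]) . '-' . $m[2];
        },
        $str
    );
}

foreach (['TL123', 'TL 123', 'tl123', 'tl 123', 'TLABC123', 'tlabc123'] as $case) {
    echo normalizeCode($case), "\n"; // TL-123 each time
}
```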

Ultimate way to find phone numbers in PHP string with preg_replace

I'm working on a project right now where we have a large number of text strings in which we must locate phone numbers and make them clickable for Android phones.
The phone numbers can be in different formats and can have different text before and after them. Is there any easy way of detecting every kind of phone number format? Is there some library that can be used?
Phone numbers can look like the ones below, and it has to work with all these combinations, so that the outcome is like
number
+61 8 9000 7911
+2783 207 5008
+82-2-806-0001
+56 (2) 509 69 00
+44 (0)1625 500125
+1 (305)409 0703
031-704 98 00
+46 31 708 50 60
Perhaps something like this:
/(\+\d+)?\s*(\(\d+\))?([\s-]?\d+)+/
(\+\d+)? = A "+" followed by one or more digits (optional)
\s* = Any number of space characters (optional)
(\(\d+\))? = A "(" followed by one or more digits followed by ")" (optional)
([\s-]?\d+)+ = One or more set of digits, optionally preceded by a space or dash
To be honest, though, I doubt that you'll find a one-expression-to-rule-them-all. Telephone numbers can be in so many different formats that it's probably impractical to match any possible format with no false positives or negatives.
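As a rough sketch of how such a pattern could be used to make numbers clickable: the tel: link format, the function name linkPhoneNumbers and the minimum of 7 digits are all assumptions, not part of the question. The digit threshold is there because the pattern on its own also matches short numbers such as " 12".

```php
<?php
// Hypothetical helper: wraps phone-like matches in a tel: link.
function linkPhoneNumbers(string $text): string
{
    $pattern = '/(\+\d+)?\s*(\(\d+\))?([\s-]?\d+)+/';
    return preg_replace_callback($pattern, function (array $m): string {
        $raw = $m[0];
        $digits = preg_replace('/\D/', '', $raw);
        if (strlen($digits) < 7) {
            return $raw; // assumed threshold: too short for a phone number
        }
        // The match may start with whitespace; keep it outside the link.
        $trimmed = ltrim($raw);
        $lead = substr($raw, 0, strlen($raw) - strlen($trimmed));
        $plus = ($trimmed[0] === '+') ? '+' : '';
        return $lead . '<a href="tel:' . $plus . $digits . '">' . $trimmed . '</a>';
    }, $text);
}

echo linkPhoneNumbers('Call +46 31 708 50 60 today.'), "\n";
// Call <a href="tel:+46317085060">+46 31 708 50 60</a> today.
```

One known weakness of this sketch: for a number like +44 (0)1625 500125, the trunk 0 in parentheses ends up in the href, which is usually not what should be dialled.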
I'm not sure there is a library for that. Telephone numbers are internationally standardised, though: E.164 is the ITU-T recommendation that defines the international numbering plan: http://en.wikipedia.org/wiki/E.164 . Open-source parsing libraries build on such standards, so it may be of some help if you really can't find an existing library.
I guess this might do it for these cases?
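preg_replace("/(\+?[\d\-\(\)\s]{7,}?\d)/", 'number', $str);
Basically, I allow the match to start with a +, but it doesn't have to. Then I check for digits, -, (, ) and spaces, at least 8 characters in total, so that it doesn't pick up short non-phone numbers. The hyphen inside the character class is escaped because newer PCRE versions (PHP 7.3 and later) reject an unescaped hyphen directly after \d.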
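Try the following (the hyphen inside the character class is escaped, since PHP 7.3 and later reject an unescaped hyphen directly after \d):
preg_match_all('/\+?[0-9][\d\-()\s+]{5,12}[1-9]/', $text, $matches);
or:
preg_match_all('/(\+?[\d\-\(\)\s]{8,20}[0-9]?\d)/', $text, $matches);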
