Regular expression in PHP to find Roman numerals

I use PHP to highlight all the Roman numerals in a string.
For example:
Protocol XXXIV/14 from session...
Protocol XXIX/13 from session...
Protocol XXXV/13 from session...
I found an example at http://regexr.com/2uhln that looks perfect. It works well for the examples above, but when I use it in PHP it stops working.
My PHP code is:
$subject = "Protocol XXXV/13 from session...";
$pattern ='/(?:XL|L|L?(?:IX|X{1,3}|X{0,3}(?:IX|IV|V|V?I{1,3})))/';
preg_match($pattern,$subject,$matches);
It returns only the first 1-3 characters of the Roman numeral:
XXXIV - gives XXX
XXIX - gives XX
XXXV - gives XXX
I have two questions:
What is wrong, and how do I fix it?
How can I modify the regular expression from http://regexr.com/2uhln so that it works for all Roman numerals up to one hundred (Roman C)? It doesn't work for, e.g., XLVII, XLVI, XLV.

Change the order of the alternatives in your pattern. PCRE tries alternatives from left to right and stops at the first one that lets the match succeed, so put the longest alternative first, then the medium one, then the shortest: long|medium|short. That way the longest string is matched first.
$re = "~L?(?:X{0,3}(?:IX|IV|V|V?I{1,3})|IX|X{1,3})|XL|L~m";
$str = "Protocol XXXIV/14 from session...\nProtocol XXIX/13 from session...\nProtocol XXXV/13 from session...";
preg_match_all($re, $str, $matches);
print_r($matches);
Update:
\b(?:X?L?(?:X{0,3}(?:IX|IV|V|V?I{1,3})|IX|X{1,3})|XL|L)\b
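For the second part of the question (numerals up to C), here is one possible sketch. It is not taken from the answer above: the tens/units split and the use of (?<!\w) and (?!\w) in place of \b are my own assumptions, chosen so that the pattern cannot match an empty string. The sample subject is made up for illustration.

```php
<?php
// Tens part:  XC (90), XL (40), or an optional L followed by up to three X.
// Units part: IX (9), IV (4), or an optional V followed by up to three I.
// (?<!\w) ... (?!\w) require the numeral to stand alone as a whole token.
$pattern = '/(?<!\w)(?=[CLXVI])(?:C|(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))(?!\w)/';
$subject = "Protocol XLVII/14, Protocol XCIX/13, Protocol C/1, Protocol IIII/2";

preg_match_all($pattern, $subject, $matches);
print_r($matches[0]); // XLVII, XCIX and C; the malformed IIII is skipped
```

Be aware that a pattern like this also matches the English pronoun "I" and single letters such as "C" or "L" when they stand alone, so it may need extra context (for example the preceding word "Protocol") on real text.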

Related

PHP Regex to get text between 2 words with numbers

I'm trying to get a string between two words within a larger string:
Example:
My string:
...'Total a Facturar 123,061 221,063 26,860161,16080,580310,760 358,297 Recepcionado'...
I'm using
/(?<=Total a Facturar )(.*?) Recepcionado/
I need the characters 26,860161,16080,580310,760
but with my pattern I get 123,061 221,063 26,860161,16080,580310,760 358,297 Recepcionado.
The numbers in the string are always different; I need the numbers that are joined together without a space.
Thanks
EDIT:
Here is the entire string: eval.in/802292
I hope this helps.
Regex: (?:\d+(?:\,\d+){2,})
For the string in the question you can also use (?:\d+(?:\,\d+){4})
1. (?:\d+) matches one or more digits.
2. (?:\,\d+){2,} matches a comma followed by one or more digits, repeated two or more times.
PHP code:
<?php
ini_set('display_errors', 1);
$string = "Total a Facturar 123,061 221,063 26,860161,16080,580310,760 358,297 Recepcionado";
preg_match("#(?:\d+(?:\,\d+){2,})#", $string, $matches);
print_r($matches);
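As a possible refinement of the pattern above (my assumption, not part of the original answer), whitespace lookarounds can be added so that only a complete space-delimited token is matched, never a fragment of a longer one:

```php
<?php
$string = "Total a Facturar 123,061 221,063 26,860161,16080,580310,760 358,297 Recepcionado";

// (?<!\S) and (?!\S) require whitespace (or the string edge) on both sides,
// so only a whole token containing at least two commas can match.
preg_match('/(?<!\S)\d+(?:,\d+){2,}(?!\S)/', $string, $matches);
print_r($matches);
```

This still relies on the wanted token being the only one with two or more commas, as in the sample string.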

Grabbing number next to a dollar sign with optional thousands and decimals

I am trying to grab a number that can be in the format $5,000.23 as well as say, $22.43 or $3,000
Here's my regular expression, this is in PHP.
preg_match('/\$([0-9]+)([\.,]*)?([0-9]*)?([\.])?([0-9]*)?/', $blah, $blah2);
It seems to match numbers in the format $5,500.23 perfectly fine, however it doesn't seem to match any other numbers well, like $0.
How do I make everything optional? Shouldn't grouping () and using a question mark do that?
This should do the trick:
\$[\d,.]*[\d]
Specific PHP Example:
$re = "/\\$[\\d,.]*[\\d]/";
$str = "\$1 klsjdfgsjdfg \$100 kjdfhglsjdfg \$1,000 jljsdfg \$1,000.00 ldfjhsdf";
preg_match_all($re, $str, $matches);
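The pattern above accepts any mix of digits, commas, and dots after the dollar sign. If a stricter match is wanted, a sketch along these lines may help; the pattern, the sample text, and the variable names here are assumptions for illustration only:

```php
<?php
// Either 1-3 digits followed by one or more ",ddd" groups, or plain digits,
// then an optional decimal part. Group 1 holds the amount without the "$".
$strict = '/\$((?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?)/';
$text = 'Prices: $5,000.23, $22.43, $3,000 and $0';

preg_match_all($strict, $text, $found);

// Strip the thousands separators so the values can be cast to numbers.
$amounts = array_map(function ($s) {
    return str_replace(',', '', $s);
}, $found[1]);
print_r($amounts);
```

Single quotes are used for both the pattern and the sample text so that PHP does not try to interpolate the dollar signs.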

Detect cloth sizes with regex

I am trying to detect, with a regex, strings that follow the pattern {any_number}{x-}{large|medium|small} for a clothing site I am building in PHP.
I have managed to match the sizes against a preconfigured set of strings by using:
$searchFor = '7x-large';
$regex = '/\b'.$searchFor.'\b/';
//Basically, it's finding the letters
//surrounded by a word-boundary (the \b bits).
//So, to find the position:
preg_match($regex, $opt_name, $match, PREG_OFFSET_CAPTURE);
I even managed to detect weird sizes like 41 1/2 with regex, but I am not an expert and I am having a hard time on this.
I have come up with
preg_match("/^(?<![\/\d])([xX\-])(large|medium|small)$/", '7x-large', $match);
but it won't work.
Could you pinpoint what I am doing wrong?
It sounds like you also want to match half sizes. You can use something like this:
$theregex = '~(?i)^\d+(?:\.5)?x-(?:large|medium|small)$~';
if (preg_match($theregex, $yourstring, $m)) {
    // Yes! It matches!
    // The match is in $m[0]
} else {
    // nah, no luck...
}
Note that the (?i) makes it case-insensitive.
This also assumes you are validating that an entire string conforms to the pattern. If you want to find the pattern as a substring of a larger string, remove the ^ and $ anchors:
$theregex = '~(?i)\d+(?:\.5)?x-(?:large|medium|small)~';
Look at the specification you have and build it up piece by piece. You want "{any_number}{x-}{large|medium|small}".
"{any_number}" would be \d+. This does not allow fractional numbers such as 12.34, but the question does not specify whether they are required.
"{x-}" is a simple string x-
"{large|medium|small}" is a choice between three alternatives large|medium|small.
Joining the pieces together gives \d+x-(large|medium|small). Note the parentheses around the alternation; without them the expression would be interpreted as (\d+x-large)|medium|small.
You mention "weird sizes like 41 1/2" but without specifying how "weird" the numbers to be matched are. You need a precise specification of what you include in "weird" before you can extend the regular expression.
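The piece-by-piece construction described above can be sketched in PHP as follows. The variable names and the test candidates are made up for illustration, and the ^ and $ anchors and the i flag are added on the assumption that whole strings are being validated case-insensitively:

```php
<?php
$number = '\d+';                       // {any_number}
$prefix = 'x-';                        // {x-}
$size   = '(?:large|medium|small)';    // {large|medium|small}, grouped

$regex = '/^' . $number . $prefix . $size . '$/i';

foreach (['7x-large', '12X-Small', 'x-large', '7x-huge'] as $candidate) {
    $result = preg_match($regex, $candidate) ? 'match' : 'no match';
    echo $candidate, ': ', $result, "\n";
}
```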

PHP - know characters failed in a preg_match function

Is there a way to find out which characters cause a preg_match call to fail?
For example:
preg_match('/^[a-z]*$/i', 'Hello World!');
Is there a function that reports the offending characters, in this case the space and "!"?
Thanks for your replies, but the problem with your examples is that they don't anchor the pattern to the beginning and end of the string. They work for a string contained in another one, not for a string that must match exactly what I defined in the pattern.
For example, suppose I have to validate an Italian fiscal code, a string formatted like this:
XXX XXX YY X YY X YYY X (X = letter, Y = digit, without the spaces)
whose pattern is:
'/^[A-Z]{6}[0-9]{2}[A-Z]{1}[0-9]{2}[A-Z]{1}[0-9]{3}[A-Z]{1}$/i'
I must validate that the string matches the pattern exactly.
If I use your code and get one (only one) character wrong, the whole string is returned as the error.
http://eval.in/9178
The reverse-pattern approach breaks down with complex patterns that combine conditions with AND or OR.
What I want to know is why preg_match fails, not only whether it fails.
Have you tried something like this?
$nonMatchingCharacters = preg_replace('/[a-z]/i', '', $wholeString);
That strips out the 'legal' characters, leaving only the ones that you want to mention in your validation error message.
You could also do further processing, like...
$nonMatchingCharactersArray = array_unique(str_split($nonMatchingCharacters));
...if you want an array of unique, non-matching characters, and not just a string with bits stripped out of it.
This gives you the space and the !:
preg_match_all('/[^a-z]/i', 'Hello World!', $matches);
var_dump($matches);
http://eval.in/9132
Just remove everything that matches with preg_replace, then split what remains into an array.
<?php
$str = preg_replace('/([0-9]{2}[a-z]*)/i', '', '03Hello 02World!');
$characters = str_split($str);
var_dump($characters);
http://eval.in/9152
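For a fixed-layout pattern such as the fiscal code in the question, one way to learn where validation fails is to check the string position by position instead of with a single preg_match. The following is only a sketch: firstMismatch is a hypothetical helper, the layout string is my own encoding of the pattern from the question, and the sample codes are made up.

```php
<?php
// Hypothetical helper: returns the zero-based offset of the first character
// that departs from the expected layout, or null if the string is valid.
function firstMismatch(string $input): ?int
{
    // L = letter, D = digit: 6 letters, 2 digits, 1 letter, 2 digits,
    // 1 letter, 3 digits, 1 letter (16 positions in total).
    $layout  = 'LLLLLLDDLDDLDDDL';
    $classes = ['L' => '/^[A-Z]$/i', 'D' => '/^[0-9]$/'];
    $length  = strlen($layout);

    for ($i = 0; $i < $length; $i++) {
        if (!isset($input[$i]) || !preg_match($classes[$layout[$i]], $input[$i])) {
            return $i; // wrong or missing character at this offset
        }
    }
    // Extra trailing characters also count as a mismatch.
    return strlen($input) > $length ? $length : null;
}

var_dump(firstMismatch('RSSMRA85T10A562S')); // NULL
var_dump(firstMismatch('RSSMRA8ST10A562S')); // int(7)
```

This only works when every position has a fixed character class; patterns with alternation or variable-length parts would need a different approach.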

Splitting string containing letters and numbers not separated by any particular delimiter in PHP

I am developing a web application that fetches the Twitter stream, and I am trying to build my own natural language processing for it.
Since my data comes from Twitter (limited to 140 characters), many words are shortened or, in this case, written without spaces.
For example:
"Hi, my name is Bob. I m 19yo and 170cm tall"
Should be tokenized to:
- hi
- my
- name
- bob
- i
- 19
- yo
- 170
- cm
- tall
Notice that 19 and yo in 19yo have no space between them. I use it mostly for extracting numbers with their units.
Simply put, I need a way to 'explode' each token that contains a number into chunks of digits or letters, with no delimiter.
'123abc' will be ['123', 'abc']
'abc123' will be ['abc', '123']
'abc123xyz' will be ['abc', '123', 'xyz']
and so on.
What is the best way to achieve it in PHP?
I found something close to it, but it's C# and specifically for day/month splitting: How do I split a string in C# based on letters and numbers
You can use preg_split:
$string = "Hi, my name is Bob. I m 19yo and 170cm tall";
$parts = preg_split("/(,?\s+)|((?<=[a-z])(?=\d))|((?<=\d)(?=[a-z]))/i", $string);
var_dump ($parts);
When matching against the digit-letter boundary, the regular expression match must be zero-width. The characters themselves must not be included in the match. For this the zero-width lookarounds are useful.
http://codepad.org/i4Y6r6VS
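As an alternative to splitting at boundaries, the tokens themselves can be matched: every run of letters or run of digits becomes one token. This is a sketch rather than part of the answer above; tokenize is a hypothetical helper name, and lowercasing is assumed because the expected output in the question is lowercase.

```php
<?php
// Hypothetical helper: returns each run of letters or run of digits as a token.
function tokenize(string $text): array
{
    preg_match_all('/[a-z]+|\d+/', strtolower($text), $m);
    return $m[0];
}

print_r(tokenize("Hi, my name is Bob. I m 19yo and 170cm tall"));
print_r(tokenize('abc123xyz')); // abc, 123, xyz
```

Punctuation is dropped automatically because it belongs to neither character class. Filtering out words such as "is" or "and" would be a separate step.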
How about this:
Extract the numbers from the string with a regex and store them in an array, then replace the numbers in the string with a special placeholder that holds their position. After parsing the string, which now contains only placeholders and ordinary characters, put the numbers from the array back into their reserved places.
Just an idea, but IMHO it might work for you.
EDIT:
Try running this short code; hopefully the output makes my point clear. (This code doesn't work on codepad, I don't know why.)
<?php
$str = "Hi, my name is Bob. I m 19yo and 170cm tall";
preg_match_all("#\d+#", $str, $matches);
$str = preg_replace("!\d+!", "#SPEC#", $str);
print_r($matches[0]);
print $str;
