How do you split a string based on the number of letter characters and/or the number of numbers so that they are separate strings?
Hopefully this makes sense. Thanks for any help (:
For example:
The user inputs:
Henry, Smith ID: 123456
I would like to sort the user input into separate strings with the result of:
$name = 'Henry, Smith';
$ID = '123456';
You can use regex to match only numbers and everything but numbers.
$number = preg_replace("/[^0-9]/", '', $str);
$name = preg_replace("/[0-9]/", '', $str);
Note, for the name, this will return Henry, Smith ID: from your question's example. This just takes the numbers out... it doesn't know "ID:" isn't part of a person's name.
Explanation of the caret (^):
Placed first inside the brackets, it negates the character class, so the class matches any character NOT listed. So [^0-9] matches everything but digits. In this example, every character that isn't a digit is replaced with an empty string (the second parameter). For $name, we do the opposite: we replace every digit with an empty string, leaving only the non-digit characters.
See here for more info on regex.
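If the input always follows the shape from the question, a single preg_match can capture both parts and drop the "ID:" label at the same time. This is only a sketch: the literal "ID:" separator is an assumption taken from the one example given.

```php
<?php
// Assumption: the input always looks like "<name> ID: <digits>".
$str = 'Henry, Smith ID: 123456';

$name = null;
$ID = null;

// Group 1: everything before the "ID:" label (lazy, so trailing spaces are left out).
// Group 2: the run of digits after the label.
if (preg_match('/^(.*?)\s*ID:\s*(\d+)\s*$/', $str, $m)) {
    $name = $m[1];
    $ID = $m[2];
}

var_dump($name, $ID);
```

If the label can vary, the pattern would need to change accordingly.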
Related
I want to split a string as per the parameters laid out in the title. I've tried a few different things including using preg_match with not much success so far and I feel like there may be a simpler solution that I haven't clocked on to.
I have a regex that matches the "price" mentioned in the title (see below).
/(?=.)\£(([1-9][0-9]{0,2}(,[0-9]{3})*)|[0-9]+)?(\.[0-9]{1,2})?/
And here are a few example scenarios and what my desired outcome would be:
Example 1:
input: "This string should not split as the only periods that appear are here £19.99 and also at the end."
output: n/a
Example 2:
input: "This string should split right here. As the period is not part of a price or at the end of the string."
output: "This string should split right here"
Example 3:
input: "There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price"
output: "There is a price in this string £19.99, but it should only split at this point"
I suggest using
preg_split('~\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?(*SKIP)(*F)|\.(?!\s*$)~u', $string)
See the regex demo.
The pattern matches your pattern, \£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})? and skips it with (*SKIP)(*F), else, it matches a non-final . with \.(?!\s*$) (even if there is trailing whitespace chars).
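A minimal sketch of how this could be applied to the three examples from the question. The pattern is the one above; the $examples and $firstParts names are only for illustration, and the file is assumed to be saved as UTF-8 because of the u flag.

```php
<?php
// Pattern copied from the answer: skip prices, split on any non-final period.
$pattern = '~\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?(*SKIP)(*F)|\.(?!\s*$)~u';

$examples = [
    'This string should not split as the only periods that appear are here £19.99 and also at the end.',
    'This string should split right here. As the period is not part of a price or at the end of the string.',
    'There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price',
];

$firstParts = [];
foreach ($examples as $s) {
    $parts = preg_split($pattern, $s);
    // A single element means no qualifying period was found (the "n/a" case).
    $firstParts[] = count($parts) > 1 ? $parts[0] : null;
}

var_dump($firstParts);
```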
If you really only need to split on the first occurrence of the qualifying dot you can use a matching approach:
preg_match('~^((?:\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?|[^.])+)\.(.*)~su', $string, $match)
See the regex demo. Here,
^ - matches a string start position
((?:\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?|[^.])+) - one or more occurrences of your currency pattern or any one char other than a . char
\. - a . char
(.*) - Group 2: the rest of the string.
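As a usage sketch for the matching approach, here it is run against Example 3 from the question (file assumed to be UTF-8). Note that a string whose only qualifying period is the final one will still match, with an empty Group 2, so that case may need a separate check.

```php
<?php
$string = 'There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price';

// Group 1: prices or non-dot characters up to the first qualifying dot.
// Group 2: everything after that dot.
$pattern = '~^((?:\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?|[^.])+)\.(.*)~su';

$before = null;
$after = null;
if (preg_match($pattern, $string, $match)) {
    $before = $match[1];
    $after = $match[2];
}

var_dump($before, $after);
```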
To split a text into sentences while avoiding pitfalls such as dots or thousands separators in numbers and some abbreviations (like etc.), the best tool is IntlBreakIterator, which is designed to deal with natural language:
$str = 'There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price';
$si = IntlBreakIterator::createSentenceInstance('en-US');
$si->setText($str);
$si->next();
echo substr($str, 0, $si->current());
IntlBreakIterator::createSentenceInstance returns an iterator that gives the indexes of the different sentences in the string.
It takes ?, ! and ... into account too. Besides the number and price pitfalls, it also works well with this kind of string:
$str = 'John Smith, Jr. was running naked through the garden crying "catch me! catch me!", but no one was chasing him. His psychatre looked at him from the window with a circumspect eye.';
More about rules used by IntlBreakIterator here.
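To collect every sentence rather than only the first, the boundaries can be walked in a loop. A rough sketch, assuming the intl extension is loaded; splitSentences is a hypothetical helper name, and the exact boundaries depend on the ICU rules for the chosen locale.

```php
<?php
// Hypothetical helper: returns the sentences of $text as an array of substrings.
// Assumption: the intl extension is available.
function splitSentences(string $text, string $locale = 'en-US'): array
{
    $it = IntlBreakIterator::createSentenceInstance($locale);
    $it->setText($text);

    $sentences = [];
    $start = $it->first();
    // next() returns the following boundary offset, or DONE when there are no more.
    while (($end = $it->next()) !== IntlBreakIterator::DONE) {
        $sentences[] = substr($text, $start, $end - $start);
        $start = $end;
    }
    return $sentences;
}

$str = 'There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price';
print_r(splitSentences($str));
```

Each piece keeps its trailing whitespace, so trim() the elements if that matters.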
You could simply use this regex:
\.
Since you only have a space after the first sentence (and not a price), this should work just as well, right?
I have a serial number string I need to break apart into 3 parts.
The serial numbers look like this:
FOOB123456AB
BAR789123BC
First part: A-Z letters of variable length
Middle part: 6 digit numerical string
Last part: 2 letters
How can I break this apart using PHP so I can work with each individual part?
A regular expression can help here. See preg_match().
Try:
$regex = "/^([a-z]*)([\d]{6})(.*)$/i";
$serial = "FOOB123456AB";
$result = preg_match($regex, $serial, $matches);
// result in $matches[1], $matches[2], $matches[3]
This assumes one serial number per string. If you don't have that, text is easy to break up with explode() or similar, and then iterate over the resulting array.
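If the format is exactly as described (letters, six digits, two letters), a stricter pattern also rejects malformed serials. A small sketch; parseSerial and the array keys are names made up for illustration, and uppercase-only letters are an assumption based on the samples.

```php
<?php
// Hypothetical helper: returns the three parts, or null if the serial is malformed.
function parseSerial(string $serial): ?array
{
    // Assumption: uppercase A-Z only, as in the sample serials.
    if (preg_match('/^([A-Z]+)(\d{6})([A-Z]{2})$/', $serial, $m) !== 1) {
        return null;
    }
    return ['prefix' => $m[1], 'number' => $m[2], 'suffix' => $m[3]];
}

foreach (['FOOB123456AB', 'BAR789123BC'] as $serial) {
    print_r(parseSerial($serial));
}
```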
I have a value like 73b6424b. I want to split it into two parts, 73b6 and 424b, swap them, and concatenate them to get 424b73b6. This is how I have done it so far:
$substr_device_value = '73b6424b';
$first_value = substr($substr_device_value,0,4);
$second_value = substr($substr_device_value,4,4);
$final_value = $second_value.$first_value;
Is there an easier way than what I have done? If so, please suggest an approach.
You may use
preg_replace('~^(.{4})(.{4})$~', '$2$1', $s)
See the regex demo
Details
^ - matches the string start position
(.{4}) - captures any 4 chars into Group 1 ($1)
(.{4}) - captures any 4 chars into Group 2 ($2)
$ - end of string.
The '$2$1' replacement pattern swaps the values.
NOTE: If you want to pre-validate the data before swapping, you may replace . pattern with a more specific one, say, \w to only match word chars, or [[:alnum:]] to only match alphanumeric chars, or [0-9a-z] if you plan to only match strings containing digits and lowercase ASCII letters.
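For illustration, here is the swap with a stricter character class, next to a non-regex alternative. The [0-9a-f] class assumes the value is lowercase hex, which the question does not actually state.

```php
<?php
$s = '73b6424b';

// Regex version: only swaps if the string is exactly 8 lowercase hex characters;
// otherwise preg_replace returns the input unchanged.
$swapped = preg_replace('~^([0-9a-f]{4})([0-9a-f]{4})$~', '$2$1', $s);

// Non-regex version: cut into 4-character chunks and reverse their order.
$swappedNoRegex = implode('', array_reverse(str_split($s, 4)));

var_dump($swapped, $swappedNoRegex);
```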
I am trying to use a regular expression to pick a phone number from a string, where the format of the phone number could be just about anything, or there may not be a phone number at all. For example:
$string = 'My phone number is +34 961 123456.';
$string = 'My phone number is +34 (961) 123456.';
$string = 'My phone number is 961-123456.';
$string = 'My phone number is +34.961.12.34.56.';
$string = 'Product A costs €100.00 and Product B costs €134.15.';
So far, I have got to
$number = preg_replace("/[^0-9\/\+\.\-\s]+/", "", $string);
$number = preg_replace("/[^0-9]+/", "", $number);
if (strlen($number)>8) {
/* It's a phone number, so do something with it */
}
This works for picking out all the different phone number formats that I have tried, but it also puts the prices together and assumes that they are a phone number too.
It seems that my problem is that a human can readily distinguish between a space between words and a space in the middle of a phone number, but how do I make the computer do that? Is there a way that I can replace spaces that are both preceded and followed by a number but leave other spaces intact? Is there some other way of sorting this out?
I'm afraid you aren't gonna like it. The regex I get is this:
(\+?[0-9]?[0-9]?[[:blank:],\.]?[0-9][0-9][0-9][[:blank:],\.]?[0-9][0-9][[:blank:],\.]?[0-9][0-9][[:blank:],\.]?[0-9][0-9])
Explanation:
( <-- opens a capturing group around the whole expression; probably not needed here
\+? <-- optional plus sign
[0-9]?[0-9]? <-- optional prefix code
[[:blank:],\.]? <-- optional space (or comma or dot) between the prefix code and the rest of the number
[0-9][0-9][0-9][[:blank:],\.]? <-- three-digit province code, followed by an optional separator
[0-9][0-9][[:blank:],\.]?[0-9][0-9][[:blank:],\.]?[0-9][0-9] <-- the number itself: six digits in three pairs, with optional separators
Because these examples are for Spanish telephone numbers, aren't they?
In that case, you've forgotten to give us examples of other formats, like "91 123 45 67", that might complicate the solution even more.
For these cases, I humbly think the best solution is to write a small function. The regular expression is too complex to be maintainable.
Looks like you want sequences of nine to twelve digits, with nothing between them except spaces, parentheses, periods or dashes; and possibly preceded by +. Try this:
preg_match_all("/\+?(?:\d[-. ()]*){9,12}/", $string, $results);
This isn't quite perfect, since trailing punctuation (like the period that follows all your examples) will be included in the matched string. Post-process the list of results to trim it:
$numbers = preg_replace("/[-. ]+$/", "", $results[0]);
Or you could standardize the collected phone numbers by removing all non-digits from the results, keeping just the digits and possibly an initial "+":
$numbers = preg_replace("/[-. ()]/", "", $results[0]);
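Putting the two steps together against the sample strings from the question might look like the sketch below. extractPhoneNumbers is a hypothetical helper name; note that preg_match_all stores the full matches in element 0 of its result array, so that is what gets normalized.

```php
<?php
// Hypothetical helper: finds phone-like digit runs and normalizes them.
function extractPhoneNumbers(string $text): array
{
    // 9 to 12 digits, separated only by spaces, dots, dashes or parentheses,
    // optionally preceded by "+".
    preg_match_all('/\+?(?:\d[-. ()]*){9,12}/', $text, $results);

    // Keep only digits and the leading "+" in each full match.
    return preg_replace('/[^\d+]/', '', $results[0]);
}

$samples = [
    'My phone number is +34 961 123456.',
    'My phone number is +34 (961) 123456.',
    'My phone number is 961-123456.',
    'My phone number is +34.961.12.34.56.',
    'Product A costs €100.00 and Product B costs €134.15.',
];

foreach ($samples as $sample) {
    print_r(extractPhoneNumbers($sample));
}
```

The price string yields an empty array because neither price contains nine digits in a row without intervening letters.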
I am currently developing a web application that fetches the Twitter stream, and I am trying to build my own natural language processing for it.
Since my data comes from Twitter (limited to 140 characters), many words are shortened or, in this case, written without spaces.
For example:
"Hi, my name is Bob. I m 19yo and 170cm tall"
Should be tokenized to:
- hi
- my
- name
- bob
- i
- 19
- yo
- 170
- cm
- tall
Notice that 19 and yo in 19yo have no space between them. I use it mostly for extracting numbers with their units.
Simply put, what I need is a way to 'explode' each token that contains a number into chunks of digits or letters, without any delimiter.
'123abc' will be ['123', 'abc']
'abc123' will be ['abc', '123']
'abc123xyz' will be ['abc', '123', 'xyz']
and so on.
What is the best way to achieve it in PHP?
I found something close to it, but it's C# and specifically for day/month splitting: How do I split a string in C# based on letters and numbers
You can use preg_split
$string = "Hi, my name is Bob. I m 19yo and 170cm tall";
$parts = preg_split("/(,?\s+)|((?<=[a-z])(?=\d))|((?<=\d)(?=[a-z]))/i", $string);
var_dump ($parts);
When matching against the digit-letter boundary, the regular expression match must be zero-width. The characters themselves must not be included in the match. For this the zero-width lookarounds are useful.
http://codepad.org/i4Y6r6VS
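To see the zero-width boundaries in isolation, the sketch below applies only the two lookaround alternatives to the single tokens from the question. splitAlnum is a made-up helper name, and the token is assumed to contain only ASCII letters and digits.

```php
<?php
// Hypothetical helper: splits a token wherever a letter meets a digit or vice versa.
function splitAlnum(string $token): array
{
    // Both alternatives are zero-width, so no characters are consumed by the split.
    return preg_split('/(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])/i', $token);
}

foreach (['123abc', 'abc123', 'abc123xyz'] as $token) {
    print_r(splitAlnum($token));
}
```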
How about this:
Extract the numbers from the string with a regex and store them in an array, then replace each number in the string with a special placeholder that holds its position. After parsing the string, which now contains only placeholders and ordinary characters, put the numbers from the array back into their reserved places.
Just an idea, but IMHO it might work for you.
EDIT:
Try running this short code; hopefully the output makes my point clear. (This code doesn't work on codepad, I don't know why.)
<?php
$str = "Hi, my name is Bob. I m 19yo and 170cm tall";
preg_match_all("#\d+#", $str, $matches);
$str = preg_replace("!\d+!", "#SPEC#", $str);
print_r($matches[0]);
print $str;
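One way the remaining steps of this idea could be filled in is sketched below. Padding the placeholder with spaces and splitting on whitespace, commas and periods are assumptions added here, not part of the answer above.

```php
<?php
$str = "Hi, my name is Bob. I m 19yo and 170cm tall";

// Step 1: remember the numbers in order of appearance.
preg_match_all('/\d+/', $str, $matches);
$numbers = $matches[0];

// Step 2: mask each number with a space-padded placeholder (assumption: the
// padding is what separates "19" from "yo" once the string is split).
$masked = preg_replace('/\d+/', ' #SPEC# ', $str);

// Step 3: split on whitespace, commas and periods, dropping empty pieces.
$tokens = preg_split('/[\s,.]+/', $masked, -1, PREG_SPLIT_NO_EMPTY);

// Step 4: put the numbers back into their reserved places.
$i = 0;
foreach ($tokens as $k => $token) {
    if ($token === '#SPEC#') {
        $tokens[$k] = $numbers[$i++];
    }
}

print_r($tokens);
```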