I want to extract values from a string. Here is a sample string:
"Sample string Name: Mosa Phone: 020202020 Email: email#domain.com the rest of the sample string"
The phone doesn't necessarily have to be a sequence of digits; it could look like (00-98550-22), (+025-588) or (92122/222).
The good news is that these fields are always consecutive, separated by a tab, white space, or a new line.
So I am wondering how to make it read each field until the next field label is found: find Name:, then continue until you find Phone:, and so on.
I am trying to achieve this using regex.
This is the code I have already written, but each field is evaluated on its own:
$namepattern = "/(Name\:\s[a-zA-Z]+\s[a-zA-Z]+)/";
$phonepattern = "/(Phone\:\s\d+)/";
$emailpattern = "/(Email:\s([_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})))/";
I'm using this regex to match name, phone and email in one:
/Name:\s(.+)\sPhone:\s(\d+)\sEmail:\s([a-zA-Z0-9_-]+#[a-zA-Z0-9]+.[a-zA-Z]+)/
Here's a demo: http://regex101.com/r/uO4tF7
This will match what you need (the phone group accepts any characters, and the dot before the top-level domain is escaped):
/Name:\s(.+)\s*Phone:\s(.*)\s*Email:\s([a-zA-Z0-9_.-]+#[a-zA-Z0-9]+\.[a-zA-Z]+)/
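For the "continue until the next field is found" idea from the question, here is a minimal sketch using lazy groups and named captures. The function name extractContact and the returned array keys are my own choices for illustration, and it assumes the three labels always appear in the order Name, Phone, Email. It does not validate the phone or the email; it only cuts the text at the labels.

```php
<?php
// Hypothetical helper: pull the three consecutive fields out of free text.
function extractContact(string $text): ?array
{
    // Each lazy group stops at the next label; \s+ covers tabs, spaces and newlines.
    $pattern = '/Name:\s*(?<name>.+?)\s+Phone:\s*(?<phone>.+?)\s+Email:\s*(?<email>\S+)/s';
    if (preg_match($pattern, $text, $m) !== 1) {
        return null; // labels not found in the expected order
    }
    return ['name' => $m['name'], 'phone' => $m['phone'], 'email' => $m['email']];
}

$sample = "Sample string Name: Mosa Phone: (00-98550-22) Email: email#domain.com the rest of the sample string";
print_r(extractContact($sample));
```

Because the phone group is just "anything up to the Email: label", formats such as (+025-588) or (92122/222) are captured as written.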
I want to split a string as per the parameters laid out in the title. I've tried a few different things including using preg_match with not much success so far and I feel like there may be a simpler solution that I haven't clocked on to.
I have a regex that matches the "price" mentioned in the title (see below).
/(?=.)\£(([1-9][0-9]{0,2}(,[0-9]{3})*)|[0-9]+)?(\.[0-9]{1,2})?/
And here are a few example scenarios and what my desired outcome would be:
Example 1:
input: "This string should not split as the only periods that appear are here £19.99 and also at the end."
output: n/a
Example 2:
input: "This string should split right here. As the period is not part of a price or at the end of the string."
output: "This string should split right here"
Example 3:
input: "There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price"
output: "There is a price in this string £19.99, but it should only split at this point"
I suggest using
preg_split('~\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?(*SKIP)(*F)|\.(?!\s*$)~u', $string)
See the regex demo.
The pattern first matches your price pattern, \£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?, and skips it with (*SKIP)(*F); otherwise, it matches a non-final . with \.(?!\s*$) (a dot still counts as final when only whitespace follows it).
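As a rough sketch of how this behaves on the three examples from the question, the split can be capped at two pieces with the limit argument of preg_split. The helper name firstSentence and the choice to return null when nothing qualifies are assumptions of mine, not part of the question. The £ is written unescaped here, which should be equivalent under the u flag.

```php
<?php
// Hypothetical helper: text before the first qualifying full stop, or null.
function firstSentence(string $text): ?string
{
    // Prices are matched first and discarded via (*SKIP)(*F);
    // what remains is any dot that is not the last thing in the string.
    $pattern = '~£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?(*SKIP)(*F)|\.(?!\s*$)~u';
    $parts = preg_split($pattern, $text, 2); // limit 2: split once at most
    if ($parts === false || count($parts) < 2) {
        return null; // no qualifying dot, nothing to split
    }
    return $parts[0];
}

var_dump(firstSentence('This string should split right here. As the period is not part of a price or at the end of the string.'));
```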
If you really only need to split on the first occurrence of the qualifying dot you can use a matching approach:
preg_match('~^((?:\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?|[^.])+)\.(.*)~su', $string, $match)
See the regex demo. Here,
^ - matches a string start position
((?:\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?|[^.])+) - one or more occurrences of your currency pattern or any one char other than a . char
\. - a . char
(.*) - Group 2: the rest of the string.
To split a text into sentences while avoiding pitfalls such as decimal points or thousand separators in numbers and some abbreviations (like etc.), the best tool is IntlBreakIterator (from the intl extension), which is designed to deal with natural language:
$str = 'There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price';
$si = IntlBreakIterator::createSentenceInstance('en-US');
$si->setText($str);
$si->next();
echo substr($str, 0, $si->current());
IntlBreakIterator::createSentenceInstance returns an iterator that gives the indexes of the different sentences in the string.
It takes ?, ! and ... into account too. In addition to the number and price pitfalls, it also works well with this kind of string:
$str = 'John Smith, Jr. was running naked through the garden crying "catch me! catch me!", but no one was chasing him. His psychatre looked at him from the window with a circumspect eye.';
More about rules used by IntlBreakIterator here.
You could simply use this regex:
\.
Since you only have a space after the first sentence (and not a price), this should work just as well, right?
I'm trying to get the first part of a UK postcode from a string that may have only the first part of the postcode or the full postcode in it. I'm struggling to make it work. I've got it working if the full postcode is entered by using a look-ahead, but I can't seem to make the look-ahead optional, so if only the first part of the postcode is entered it is matched.
My regex so far is ([A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW])(?=( ?[0-9][ABD-HJLNP-UW-Z]{2})))
I've got several postcodes that must match and these are the results using the above regex:
A10EA - Should match and does
A1 - Should match but doesn't
A10 0EA - Should match and does
A10 - Should match but doesn't
BH18 1AE - Should match and does
BH18AE - Should match and does
EC1M 6HJ - Should match and does
EC1M - Should match but doesn't
Z10 2EV - Shouldn't match and doesn't
QE3 6DA - Shouldn't match but matches E3 6DA
Can someone please help me solve this issue?
The RegEx I've been working from is the official one from the post office:
/^(GIR ?0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?[0-9][ABD-HJLNP-UW-Z]{2})$/i
Before anyone flags this as a duplicate of PHP Find first part of UK postcode when full or part can be entered, it's not. The answer for that question doesn't work, see my comment to the answer.
According to this wiki page, the postcode always ends in 'digit letter letter', which as a regex pattern is roughly \d\w\w$ (note that \w also accepts digits and underscores, so it is looser than "letter"). Now that we know how to spot the end, we just want to capture the rest.
A pattern like (\S*)\s*\d\w\w$ will work. It captures the first half and ensures that you do not get the last 'digit letter letter' part. It captures the first part by taking anything that is not white space, i.e. only letters and digits.
To fully explain this: the brackets () mark what we are capturing. \S means 'any one non-white-space character', with \S* being as many as we can get. So (\S*) captures everything up to a space character, but would capture everything if the user doesn't enter one. The rest of the regex then requires 'any white space, one digit, two letters, end of string', which forces the capture to give those characters back and ensures that AA999AA is split into AA99 and 9AA.
I've also just noticed, though, that your question states you might not actually have that second part. I think you could get around that by checking the string length. If you trim white space and the length is less than 5 characters, you must only have the first part, so there is no need for any regex.
Disclaimer: this will not work for Anguillan postcodes. To support those as well, I think (\S*)\s*(?:\d\w\w|-\d{4})$ would work.
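Putting the length check and the pattern together, a minimal sketch could look like the following. The function name outwardCode is made up for illustration, and I have tightened \w\w to [A-Z]{2} and anchored the pattern with ^, which are my own changes. It does not validate the postcode; it only strips a trailing 'digit letter letter' part when one is present.

```php
<?php
// Hypothetical helper: return the first part of a UK postcode.
function outwardCode(string $postcode): string
{
    $postcode = strtoupper(trim($postcode));
    // Fewer than 5 characters: only the first part was entered.
    if (strlen($postcode) < 5) {
        return $postcode;
    }
    // Greedy \S* backtracks until 'digit letter letter' fits at the end.
    if (preg_match('/^(\S*)\s*\d[A-Z]{2}$/', $postcode, $m) === 1) {
        return $m[1];
    }
    return $postcode; // no recognisable second part
}

echo outwardCode('EC1M 6HJ'), "\n";
```

Running a full validation pattern first, as in the question, would still be needed to reject inputs such as Z10 2EV.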
I've been looking at this the wrong way. I want to get the first part of the postcode and remove the second part if present, so why not validate the postcode first, then check for an end and strip it if necessary.
I'm already validating the postcode, this is code I already had:
$validate = Validation::factory(array('postcode' => $postcode));
$validate->rule('postcode', 'not_empty');
$validate->rule('postcode', 'regex', array(':value', '/^(GIR ?(0AA)?|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?([0-9][ABD-HJLNP-UW-Z]{2})?)$/i'));
if ( ! $validate->check())
{
$postcode = '';
}
So now I've added in this after it:
if ($postcode)
{
$short_postcode = $postcode;
// Check for an end section and then if present, remove it
if (preg_match('/ ?([0-9][ABD-HJLNP-UW-Z]{2})$/i', $postcode, $match, PREG_OFFSET_CAPTURE))
{
$short_postcode = substr($postcode, 0, $match[0][1]);
}
}
and this leaves me with just the first part of the postcode, which is what I wanted. This Eval.in shows it working for all the examples in my question.
I am using a regular expression to convert #user name to links.
For example if user enters #Alex Ferguson it should convert Alex Ferguson to hyperlink.
At the moment it converts only the first name to a hyperlink and excludes the last name. It only picks up the word closest to the #; if there is no space between the first name and the last name, it works fine.
Is there any way to convert both the first name and the last name to a hyperlink?
Here is my code:
function convert($msg){
$message = preg_replace(array('/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))/', '/(^|[^a-z0-9_])#([a-z0-9_]+)/i', '/(^|[^a-z0-9_])#([a-z0-9_]+)/i'), array('$1', '$1#$2', '$1#$2'), $msg);
return $message;
}
Thanks..
The general method for this would be:
$regex = '~(?i)#[a-z]+[ ][a-z]+~';
$replaced = preg_replace($regex,'$0',$string);
Notes
I'll leave it for you to fill in the blanks
One issue with names is the range of allowable characters. What about Julie O'Hara? M.C. Cocoa? etc.
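To illustrate one way of filling in those blanks, here is a sketch that allows apostrophes and hyphens inside each name part and builds the link in a callback. The /user/ URL scheme and the function name linkMentions are purely hypothetical, and names with dots such as M.C. Cocoa are still not covered.

```php
<?php
// Hypothetical helper: turn "#First Last" mentions into links.
function linkMentions(string $msg): string
{
    // One name part, optionally followed by a space and a second part;
    // each part may contain apostrophes or hyphens between letters.
    $pattern = "~#([a-z]+(?:['-][a-z]+)*(?: [a-z]+(?:['-][a-z]+)*)?)~i";
    return preg_replace_callback($pattern, function (array $m): string {
        $href = '/user/' . rawurlencode($m[1]); // assumed profile URL scheme
        return '<a href="' . $href . '">' . htmlspecialchars($m[0]) . '</a>';
    }, $msg);
}

echo linkMentions('Hi #Alex Ferguson!'), "\n";
```

Note the inherent ambiguity: in "#Alex said hi" the word after the first name is also swallowed, so checking the captured name against your list of known users is probably the safer route.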
I was hoping for a little help on this, as it's confusing me a little... I run a website that allows users to send messages back and forth, but on the inbox i need to hide both emails and phone numbers.
Example: This is how a sample email would look like.
Hi, my phone is +44 5555555 and email is jack#jack.com
I need it to be like this:
Hi, my phone is (phone hidden) and email is (email hidden)
Do you have any ideas ?... I really appreciate it!..
$x = 'Hi, my phone is +44 5555555 and email is jack#jack.com';
$x = preg_replace('/[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}/i','(email hidden)',$x); // hide email
$x = preg_replace('/(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+))?/','(phone hidden)',$x); // hide phone number
echo $x; // Hi, my phone is +44 (phone hidden) and email is (email hidden)
Kudos to fatcat for the phone number regex. Note that it targets North American numbers, so the +44 prefix is left in place.
Trying to do this with 100% accuracy when users can type all sorts of things in is impossible - you can't really definitively say if a substring is a phone number or just another number, or an email address or just something that could be a valid one.
However, if you want to try, you should probably use a regular expression. See http://php.net/manual/en/function.preg-replace.php
<pre>
/*
* first parameter is the input string
* second parameter is the replacement string, e.g. ****
* returns the masked string
*/
function email_phone_validation_replace_php($str='',$rep='*******') {
$str = preg_replace('/[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}/i',$rep,$str); // extract email
$str = preg_replace('/(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+))?/',$rep,$str); // extract phonenumber
return $str; // input string with emails and phone numbers replaced by $rep
}
$str = 'Hi, my name is amin khan banarsi <br> phone is +91 9770534045 and email is banarsiamin#gmail.com';
echo email_phone_validation_replace_php($str);
</pre>
If I understand correctly, your users can send messages to each other and you're worried that if they send a message with personal information in it that information might be too visible.
I guess that you're therefore trying to remove this information from the message's preview (but still have it available if you open the message?).
If this is the case then you can have a really sloppy regular expression removing anything that looks even a little bit like a number or email. It doesn't matter if you hide non-personal information because the non-censored version of the message is always available.
I would go with something like this (untested):
# Take any string that contains an # symbol and replace it with ...
# The # symbol must be surrounded by at least one character on both sides
$message = preg_replace('/[^ ]+#[^ ]+/','...',$message); # for emails
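# Take any run of digits, spaces and dashes that starts and ends with a digit, replace with ...
# (Requiring a digit at both ends stops the pattern from replacing plain spaces.)
# Can optionally have a + before it.
$message = preg_replace('/\+?[0-9][0-9\- ]*[0-9]/','...',$message); # for phone numbers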
This is going to match lots of things, more than just emails and phone numbers. It may also not match emails and phone numbers that I didn't think of, this is one of the problems with writing regular expressions for these kinds of things.
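A slightly stricter sketch of the same idea, passing both patterns to preg_replace as arrays so they are applied in order. The name maskPreview, the placeholder texts and the minimum length of five characters for a "phone number" are assumptions of mine; tune them to your data.

```php
<?php
// Hypothetical helper: mask anything email-like or phone-like in a preview.
function maskPreview(string $message): string
{
    $patterns = [
        '/\S+#\S+/',            // non-space text on both sides of the # separator
        '/\+?\d[\d\- ]{3,}\d/', // optional +, digit at both ends, digits/dashes/spaces between
    ];
    $replacements = ['(email hidden)', '(phone hidden)'];
    return preg_replace($patterns, $replacements, $message);
}

echo maskPreview('Hi, my phone is +44 5555555 and email is jack#jack.com'), "\n";
```

Short numbers such as "2 cats" are left alone because the phone pattern needs at least five characters, but anything longer (order numbers, dates written with dashes) will still be hidden.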
If you want to hide emails and phone numbers in your messages or chat, in PHP or any other language, you need to use regular expressions; you can read about regex on w3school.
I have an easy and complete solution for you.
<?php
$regex_email = '/[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})/';
$regex_phone = "/[0-9]{5,}|\d[ 0-9 ]{1,}\d|\sone|\stwo|\sthree|\sfour|\sfive|\ssix|\sseven|\seight|\snine|\sten/i";
$str = " Hello My Email soroutlove1996#gmail.com AND Phone No. is +919992799999 and +91 9992799999, or 9 9 9 2 7 9 9 9 9 9 and Nine Nine";
$str = preg_replace($regex_email,'(email hidden)',$str); // extract email
$str = preg_replace($regex_phone,'(phone hidden)',$str); // extract phone
echo $str;
Output: Hello My Email (email hidden) AND Phone No. is +(phone
hidden) and +(phone hidden), or (phone hidden) and(phone hidden)(phone
hidden)
I have a string like this:
Name: John Doe
Age: 23
Primary Language: English
Description: This is a multiline
description field that I want
to capture
Country: Canada
That's not the actual data, but you can see what I'm trying to do. I want to use regex to get an array of the "key" fields (Name, Age, Primary Language, Description, Country) and their values.
I'm using PHP.
My current attempt is this, but it doesn't work:
preg_match( '/^(.*?\:) (.*?)(\n.*?\:)/ism', $text, $matches );
Here's one solution: http://rubular.com/r/uDgXcIvhac.
\s*([^:]+?)\s*:\s*(.*(?:\s*(?!.*:).*)*)\s*
Note that I've used a negative lookahead assertion, (?!.*:). This is the only way you can check that the next line doesn't look like a new field, and at the same time continue where you left off. (This is why lookaheads and lookbehinds are known as zero-width assertions.)
EDIT: Removed bit about arbitrary-width lookaheads; I was mistaken. The above solution is fine.
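For PHP specifically, the same lookahead idea can be sketched with preg_match_all. This uses a simpler pattern of my own rather than the one above, and the function name parseFields is hypothetical. It assumes every field starts at the beginning of a line as "Key: value" and that continuation lines contain no colon.

```php
<?php
// Hypothetical helper: parse "Key: value" blocks with multi-line values.
function parseFields(string $text): array
{
    $text = str_replace("\r\n", "\n", $text); // normalise Windows line endings
    // The lazy value runs until the lookahead sees a new "Key:" line or the end.
    $pattern = '/^([^:\n]+):[ \t]*(.*?)(?=\n[^:\n]+:|\z)/ms';
    preg_match_all($pattern, $text, $m);
    return array_combine(array_map('trim', $m[1]), array_map('trim', $m[2]));
}

$text = "Name: John Doe\nAge: 23\nPrimary Language: English\n"
      . "Description: This is a multiline\ndescription field that I want\nto capture\n"
      . "Country: Canada";
print_r(parseFields($text));
```

The lookahead is zero-width, so the next "Key:" line is not consumed and becomes the start of the following match.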
Would PHP's strtok help you? You could use it with ":" as the delimiter/token and trim leading and trailing spaces to remove the unwanted new lines.
http://php.net/manual/en/function.strtok.php