Regular expression to match identical repeated digits in phone numbers - php

Sorry if the title wasn't descriptive enough. I have a bunch of phone numbers in a MySQL database. I don't know if there is a query that can do this or if it's better to use something like preg_match in PHP, but I need to search using patterns like these:
Ends with XXXX
or
Contains 4XXX
Each X stands for the same digit. So if I search for "Ends with XXXX", I'm looking for numbers like these:
671-0000
421-5555
789-1111
If I search for "Contains 4XXX", I'm looking for numbers like these:
345-4111
156-4777
For some reason I can't wrap my brain around this. It seems like it should be pretty easy. Can anyone help? I'd appreciate it!

The simplest expression here for the NXXX pattern is:
\d(\d)\1{2}
If the regular expression engine you're using supports backreferences such as \1, which refers to whatever digit was captured by the (\d) group, then this should work. For the literal 4XXX case, replace the leading \d with 4: 4(\d)\1{2}.
You could also adapt that for the XXXX pattern like:
(\d)\1{3}
where \1{3} matches three more copies of the first digit. If the run must be at the end of the number, anchor it: (\d)\1{3}$.
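A minimal PHP sketch of both searches, assuming the numbers have already been fetched from MySQL into an array. The sample list reuses the numbers from the question plus one hypothetical value (555-1234) that should match neither search. preg_grep() keeps the elements that match a pattern, and array_values() renumbers the result.

```php
<?php
// Sample numbers from the question, plus 555-1234, which matches neither search.
$numbers = ['671-0000', '421-5555', '789-1111', '345-4111', '156-4777', '555-1234'];

// "Ends with XXXX": a digit followed by three more copies of it, anchored at the end.
$endsWithXXXX = array_values(preg_grep('/(\d)\1{3}$/', $numbers));

// "Contains 4XXX": a literal 4, then one digit and two more copies of that digit.
$contains4XXX = array_values(preg_grep('/4(\d)\1{2}/', $numbers));
```

$endsWithXXXX ends up holding 671-0000, 421-5555 and 789-1111, and $contains4XXX holds 345-4111 and 156-4777.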

Related

Regular Expression in Serialized Data

I am looking to do a database search on serialized data. I am currently using Symfony2 as my framework, making pdo_mysql calls using Doctrine 2. What I would like to do is create a query that uses REGEXP to find data within a certain part of the array. The data I am trying to search within looks like this:
a:1:{s:8:"bedrooms";a:5:{i:0;i:1;i:1;i:2;i:2;i:3;i:3;i:4;i:4;s:2:"5+";}}
So if I am looking for a record that has 3 bedrooms, I would want it to find:
i:2;i:3
The query I have come up with so far is:
SELECT * FROM table WHERE field_name REGEXP '.*"bedrooms"; a:[0-9]+:{i:[0-9]+;i:3;}.*';
However, this doesn't work. Can someone help me fix it, please? I think the problem is the way the regular expression is written.
It's also worth noting that there are other arrays stored in the field, such as credit limits and other data.
Thank you in advance.
I believe you can do it with the help of a negated character class, [^{}], which matches any character except { and }:
.*"bedrooms";a:[0-9]+:[{][^{}]*i:[0-9]+;i:3[^{}]*[}]
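A sketch that runs the negated-class pattern through PHP's PCRE functions, to show why [^{}] helps: it cannot cross a closing brace, so a 3 stored in some other array is not matched. MySQL's REGEXP engine is not PCRE, so this only illustrates the pattern's logic; re-test it in your MySQL version. The second sample string (with a "limits" array) is hypothetical.

```php
<?php
// The negated-class pattern from the answer, wrapped in PCRE delimiters.
$pattern = '/.*"bedrooms";a:[0-9]+:[{][^{}]*i:[0-9]+;i:3[^{}]*[}]/';

// Sample value from the question: bedrooms 1..4 and "5+".
$withThree = 'a:1:{s:8:"bedrooms";a:5:{i:0;i:1;i:1;i:2;i:2;i:3;i:3;i:4;i:4;s:2:"5+";}}';

// Hypothetical value where 3 appears only in a different array.
$threeElsewhere = 'a:2:{s:8:"bedrooms";a:1:{i:0;i:2;}s:6:"limits";a:1:{i:0;i:3;}}';

$a = preg_match($pattern, $withThree);      // "i:2;i:3" sits inside the bedrooms braces
$b = preg_match($pattern, $threeElsewhere); // [^{}]* cannot run past the closing brace
```

Here $a is 1 and $b is 0.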
I see at least two mistakes, and improvements you can make:
first, drop the blank space after "bedrooms"; in the regex
you should escape the curly braces as \{ and \}, since otherwise the regex engine may treat them as repetition syntax rather than literal characters
if you are interested in a specific chunk of the string, you should capture it as a group and describe what kind of characters surround it, like
"bedrooms";a:[0-9]+:\{.*(i:[0-9];i:3).*\}
In this case it looks for i:*;i:3, where * is any single digit.

Basic Regular Expression for

For some reason I always get stuck making anything past extremely basic regular expressions.
I'm trying to make a regular expression that kind of looks like a URL. I only want basic checking.
I would like it to match the following patterns where X is "something".
X://X.X
X://X.X... etc.
X.X
X.X... etc
If the string contains one of these patterns, that is sufficient checking for me. This way a URL like www.example.com:8888 will still match. I have tried many different REGEX combinations with preg_match and cannot seem to get any to behave the way I want. I have consulted many other related REGEX questions on SO, but reading them has not helped me.
Any help? I will be happy to provide more information if you would like but I don't know what else you would need.
It takes practice, but here is one that I made, using a regex tester (http://www.regextester.com/) to check my pattern:
^.+(:\/\/|\.)([a-zA-Z0-9]+\.)+.+
My approach is to build the pattern slowly from the beginning, adding one piece at a time. This cheatsheet is extremely helpful for remembering what everything is: http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
Basically, the pattern starts at the beginning of the string and checks for one or more characters followed by either :// or ., then checks for one or more groups of letters and digits, each followed by a ., and ends with one or more characters of any kind.
The pattern could probably be improved with character classes so that it does not accept invalid characters, but this one was quick and dirty. You could replace the first and last . wildcards with classes listing the characters that would be valid.
UPDATE
Per the comments, here is an updated pattern:
^.+?(:\/\/|\.)?([a-zA-Z0-9]+?\.)+.+
/^(.+:\/\/)?[^.]+\.[^.\/]+([.\/][^.\/]+)*$/
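A quick PHP check of the anchored pattern above against a few inputs. The inputs are hypothetical, chosen to cover the X://X.X and X.X shapes from the question, the port example, and two values that should be rejected.

```php
<?php
// Optional "scheme://", then a run of non-dot characters, a ".", and further
// parts separated by "." or "/", anchored at both ends.
$pattern = '/^(.+:\/\/)?[^.]+\.[^.\/]+([.\/][^.\/]+)*$/';

// Hypothetical inputs covering the shapes described in the question.
$inputs = [
    'http://www.example.com', // X://X.X...
    'www.example.com:8888',   // X.X... with a port
    'example.com/path/page',  // X.X followed by a path
    'example',                // no dot at all
    'trailing.dot.',          // empty final part
];

$results = [];
foreach ($inputs as $input) {
    $results[$input] = preg_match($pattern, $input) === 1;
}
```

The first three inputs match; example and trailing.dot. do not.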

Single regular expression that extracts a number from two different url formats?

I am trying to create a single regular expression that I can use to extract the number from two different urls in a PHP function. The format of these urls are:
/t/2121/title/
and
/top2121.html
I am bad at regular expressions and have already tried the following and many variants of it:
#^/t/(\d+?)/|/top(\d+?)\.html/#i
This is not working, and I am still at a complete loss after reading many sites and tutorials on regular expressions. Is there a regular expression that would allow me to extract the number regardless of which URL format is entered?
Regex to extract only the digits while also checking that the URL matches the accepted formats:
#^\/t(?:\/(\d+)\/[a-z_-]+\/?|op(\d+)\.html)$#i
Edit: this captures in two groups.
Explained demo here: http://regex101.com/r/dO5dI4
Variant #2: captures in the same group
#^\/t(?|\/(\d+)\/[a-z_-]+\/?$|op(\d+)\.html$)#i
Explained demo here: http://regex101.com/r/cG9vC3
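A small PHP sketch using Variant #2. Because (?| ... ) is a branch-reset group, both alternatives store their digits in group 1, so the caller never has to check which group was filled. The helper name extractTopicId is hypothetical.

```php
<?php
// Variant #2: the (?|...) branch reset makes both alternatives use capture group 1.
$pattern = '#^\/t(?|\/(\d+)\/[a-z_-]+\/?$|op(\d+)\.html$)#i';

// Hypothetical helper; returns null when the URL matches neither format.
function extractTopicId(string $url, string $pattern): ?string
{
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

$first  = extractTopicId('/t/2121/title/', $pattern);
$second = extractTopicId('/top2121.html', $pattern);
$none   = extractTopicId('/about.html', $pattern);
```

Both URL formats yield '2121', and /about.html yields null.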
If you just want the first digits after t or top, regardless of the / in between, something like this might work: #t(?:op)?/?(\d+)#i
Edit: example: http://codepad.viper-7.com/0z3ee0
I was able to get this regexp to match both types of url formats:
#^/(?:(?:t/)|(?:top))(\d+)(?:(?:\.html)|(?:/))#i
If anyone has a more efficient way of performing the same regexp, I would love to hear it.
If you have either of these URLs, you could use this expression. The number will be captured in the second group:
#^/t(op|/)(\d+)(\.html|/.*)#i
Are there ever going to be numbers in the URL that you don't care about? If not, you can keep this simple by just capturing the numbers and ignoring the rest:
#(\d+)#

Regex with negative lookahead to ignore the word "class"

I'm going insane over this; it seems so simple, yet I can't figure out the right regex. I need a regex that will match blacklisted words, i.e. "ass".
For example, in this string:
<span class="bob">Blacklisted word was here</span>bass
I tried this regex:
((?!class)ass)
It should match the "ass" in the word "bass" but NOT the one in "class".
Instead, this regex flags "ass" in both occurrences. I've tried several negative lookaheads found on Google and none of them work.
NOTE: This is for a CMS, so that moderators can easily find potentially bad words. I know you cannot rely on a computer to do the filtering.
If you have lookbehind available (which, IIRC, older JavaScript engines do not, and JavaScript seems likely to be what you're using this for; then again, I just noticed the PHP tag, so you probably do have lookbehind), this is trivial:
(?<!cl)(ass)
Without lookbehind, you probably need to do something like this:
(?:(?!cl)..|^.?)(ass)
That matches ass preceded by any two characters other than cl, or ass that starts zero or one characters after the beginning of the line.
Note that this is probably not the best way to implement a blacklist, though. You probably want this:
\bass\b
This matches the standalone word ass, but not any word that contains ass (like association or bass or whatever else).
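A PHP sketch comparing the three patterns above on the string from the question. preg_match_all() returns the number of matches; the extra sentence 'what an ass' is a hypothetical input for the word-boundary case.

```php
<?php
$html = '<span class="bob">Blacklisted word was here</span>bass';

// With lookbehind: "ass" not immediately preceded by "cl".
$lookbehindHits = preg_match_all('/(?<!cl)(ass)/', $html);

// Without lookbehind: two characters that are not "cl", or at most one
// character from the start of the string, in front of "ass".
$noLookbehindHits = preg_match_all('/(?:(?!cl)..|^.?)(ass)/', $html);

// Word boundaries: only "ass" as a standalone word.
$wordHitsHtml  = preg_match_all('/\bass\b/', $html);
$wordHitsPlain = preg_match_all('/\bass\b/', 'what an ass');
```

Both lookbehind-style patterns find only the "ass" in "bass". The \b version finds nothing in the HTML string, because "bass" is not a standalone "ass", and finds the one occurrence in the plain sentence.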
It seems to me that you're actually trying to use two lists here: one for words that should be excluded (even when they are part of some other word), and another for words that should not be changed at all, even though they contain words from the first list as substrings.
The trick here is to know where to use the lookbehind:
/ass(?<!class)/
In other words, the good word negative lookbehind should follow the bad word pattern, not precede it. Then it would work correctly.
You can even get some of them in a row:
/ass(?<!class)(?<!pass)(?<!bass)/
This, though, will skip both passhole and pass. To make it even more bulletproof, we can add word-boundary checks:
/ass(?<!\bclass\b)(?<!\bpass\b)(?<!\bbass\b)/
UPDATE: Of course, it's more efficient to check only the distinguishing parts of the string, with (?<!cl)(?<!b) etc. But my point was that you can still use whole words from the whitelist in the regex.
Then again, perhaps it'd be wise to prepare the whitelists accordingly (so that shorter patterns have to be checked).
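A PHP sketch of the "lookbehind after the bad word" idea. The first pattern shows that class is skipped while bass is still flagged; the second shows that the \b-wrapped whitelist skips pass itself but still flags passhole. The word list is hypothetical test input.

```php
<?php
// Lookbehind placed AFTER "ass": reject the match if the text ending here reads "class".
$simple = '/ass(?<!class)/';
$html = '<span class="bob">Blacklisted word was here</span>bass';
$simpleHits = preg_match_all($simple, $html);

// Whole-word whitelist: \b on both sides of each whitelisted word.
$strict = '/ass(?<!\bclass\b)(?<!\bpass\b)(?<!\bbass\b)/';
$strictHits = [];
foreach (['class', 'pass', 'bass', 'passhole'] as $word) {
    $strictHits[$word] = preg_match($strict, $word);
}
```

$simpleHits is 1 (only "bass"), and in $strictHits only passhole gets a 1.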
Is this what you want? (?<!class)(\w+ass)

how to detect telephone numbers in a text (and replace them)?

I know it can be done for bad words (by checking against an array of preset words), but how can I detect telephone numbers in a long text?
I'm building a website in PHP for a client who needs to stop people from using the description field to post their mobile phone numbers (see Craigslist, etc.).
Besides, he's going to need some moderation, but I was wondering if there is a way to block at least the obvious formats like nnn-nnn-nnnn. I'm not asking to block other weird ways of writing them, like HeiGHT*/four*/nine etc.
Welcome to the world of regular expressions. You're basically going to want to use preg_replace to look for (some pattern) and replace with a string.
Here's something to start you off:
$text = preg_replace('/\+?[0-9][0-9()\-\s+]{4,20}[0-9]/', '[blocked]', $text);
This looks for:
an optional plus sign, followed by a digit, followed by between 4 and 20 digits, brackets, dashes, whitespace characters or plus signs, followed by a digit
and replaces the match with the string [blocked].
This catches all the obvious combinations I can think of:
012345 123123
+44 1234 123123
+44(0)123 123123
0123456789
Placename 123456 (although this one will leave 'Placename')
However, it will also strip out any run of 6 or more digits, which might not be desirable!
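The preg_replace() call above, wrapped in a function and run over a few sample strings. The function name blockPhoneNumbers and the sample texts are hypothetical; the last two samples show a short number being left alone and a 7-digit order reference being blocked as a false positive.

```php
<?php
// The answer's pattern: optional "+", a digit, 4-20 digits/brackets/dashes/
// whitespace/plus signs, then a closing digit.
function blockPhoneNumbers(string $text): string // hypothetical helper name
{
    return preg_replace('/\+?[0-9][0-9()\-\s+]{4,20}[0-9]/', '[blocked]', $text);
}

$samples = [
    'Call me on 555-123-4567 please',
    '+44(0)123 123123',
    'Placename 123456',
    'Built in 2019',
    'Order ref 1234567',
];
$out = array_map('blockPhoneNumbers', $samples);
```

The results are 'Call me on [blocked] please', '[blocked]', 'Placename [blocked]', 'Built in 2019' (too short to match) and 'Order ref [blocked]'.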
As you may know, you need regular expressions to do this.
I found this pattern that could be useful for your project:
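<?php
// Note: ^ and $ anchor the pattern to the whole string, so this only matches
// when $yourText consists of a single phone number and nothing else.
preg_match("/(^(([\+]\d{1,3})?[ \.-]?[\(]?\d{3}[\)]?)?[ \.-]?\d{3}[ \.-]?\d{4}$)/", $yourText, $matches);
// $matches will contain the array of matched strings
?>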
More information about this pattern can be found at http://gskinner.com/RegExr/?2rirv, where you can also test it online. It's a great tool for testing regular expressions.
preg_match($pattern, $subject) returns 1 (true) if the pattern is found in the subject, 0 (false) if it is not, and false on error.
A pattern to match the example you give might be '/\d{3}-\d{3}-\d{4}/'
However, whatever you choose for your pattern will suffer from both false positives and false negatives.
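A PHP sketch of the nnn-nnn-nnnn check (with a dash between every group of digits), showing both kinds of error on hypothetical sample texts: a dotted number is missed, and an order reference that happens to start with a dashed 3-3-4 digit run is flagged.

```php
<?php
// nnn-nnn-nnnn with a dash between every group of digits.
$pattern = '/\d{3}-\d{3}-\d{4}/';

$dashed   = preg_match($pattern, 'Call 555-123-4567 today');  // found
$dotted   = preg_match($pattern, 'Call 555.123.4567 today');  // missed: a false negative
$orderRef = preg_match($pattern, 'Order no. 123-456-78901');  // flagged: a false positive
$count    = preg_match_all($pattern, 'Home 555-123-4567, cell 555-987-6543');
```

$dashed and $orderRef are 1, $dotted is 0, and $count is 2.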
You might also consider looking for words like mob, cell or tel next to the number.
The full details of PHP pattern syntax can be found at http://www.php.net/manual/en/reference.pcre.pattern.syntax.php
Ian
P.S. It can't be done for bad words, as the people in Scunthorpe will tell you.
I think using too tight a regular expression would cause you to miss a great number of detections.
You should check for portions of 10 consecutive characters containing more than 5 digits.
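A rough PHP sketch of the 10-character window check. The function name and default parameters are hypothetical; it works on bytes (strlen/substr), so it assumes mostly ASCII text, and it only detects dense digit runs rather than replacing them.

```php
<?php
// Hypothetical helper: true if any $window consecutive bytes contain more than
// $threshold digits (10 and 5, as suggested above).
function hasDenseDigits(string $text, int $window = 10, int $threshold = 5): bool
{
    $len = strlen($text);
    if ($len === 0) {
        return false;
    }
    $size = min($window, $len); // short texts are checked as a single window
    for ($start = 0; $start + $size <= $len; $start++) {
        // Count the digits inside this window.
        $digits = preg_match_all('/\d/', substr($text, $start, $size));
        if ($digits > $threshold) {
            return true;
        }
    }
    return false;
}
```

For example, 'Call 555-1234 now' and 'ab12cd34ef56' are flagged, while 'Room 12, floor 3' is not.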
Because of the computational cost, you will probably want this analysis routine queued to run after each message is inserted.
Once the 6 or more digits have been isolated, replace them as you prefer, including any neighbouring digits.
In any case, it's better to preserve the original data, so you can keep training your detection algorithm until it works as well as possible.
Then you can also study your user data to create more complex heuristics, such as numbers written as letters (case-insensitive), mixed forms, dot-separated digits, etc.
It's not about writing the perfect regex; it's about approaching the problem statistically and dynamically.
And remember, after you take action, users will change their habits in response, so the stats will change and you will need to keep learning and updating your heuristics.
