PHP preg_match_all strings starting with C and length of 4 - php

I'm attempting to search a text document with a PHP script and return all strings that start with the character "C" and have a length of 4. Some of the results will end in "=" but most will end in an alphanumeric character.
I was able to successfully pull the ones that start with C and end in =.
<?php
$str = file_get_contents('./FILENAME.txt', true);
preg_match_all('/C(.{2,3})=/', $str, $matches);
print_r($matches[0]);
foreach ($matches[0] as $sub)
{
$file = './CapturedData.txt';
file_put_contents($file, print_r("\"".$sub . "\"\n", true), FILE_APPEND);
}
?>
but when I tried to adjust it to pull all strings starting with C and ending in any alphanumeric character and being the length of 3/4, it just returns the first 3/4 characters out of ANY length strings.
I know I'm missing something simple, but it's killing me. Anything I try keeps returning the first x characters of strings of any length, while I only want to return strings that are 3 or 4 characters long, start with "C" and end in =, a-z, A-Z or 0-9.
EDIT:
Let's say these are the strings in the document:
blahblahblahCaa=
Cae=
CGGG
dontmatchthisCAAA
CAAAjkjkjk
XXXXXXCXXXX
I only want to return the 2nd and 3rd lines.

You need to provide anchors to do an exact line match.
^C(?:\S{2,3})[a-zA-Z0-9=]$
$input = <<<EOT
blahblahblahCaa=
Cae=
CGGG
dontmatchthisCAAA
CAAAjkjkjk
XXXXXXCXXXX
EOT;
preg_match_all("~^C(?:\S{2,3})[a-zA-Z0-9=]$~m", $input, $match);
print_r($match);
Output:
Array
(
[0] => Array
(
[0] => Cae=
[1] => CGGG
)
)
Regular Expression:
^ the beginning of a line (because of the m flag)
C 'C'
(?: group, but do not capture:
\S{2,3} non-whitespace (all but \n, \r, \t, \f, and " ") (between 2 and 3 times)
) end of grouping
[a-zA-Z0-9=] any character of: 'a' to 'z', 'A' to 'Z', '0' to '9', '='
$ the end of a line (because of the m flag)
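Note that the pattern above accepts C, then 2 to 3 characters, then one final character, so it allows 4 or 5 characters in total. If the total length must be 3 or 4, the quantifier can be tightened. A minimal sketch, assuming one candidate string per line with \n line endings; the extra sample lines Cab and Cabcd are made up here to probe the length limit:

```php
<?php
// Strings from the question plus two made-up lines ("Cab", "Cabcd").
$input = "blahblahblahCaa=\nCae=\nCGGG\ndontmatchthisCAAA\nCAAAjkjkjk\nXXXXXXCXXXX\nCab\nCabcd";

// With the m flag, ^ and $ anchor to line boundaries.
// C + 1 or 2 non-whitespace characters + 1 final character = 3 or 4 in total.
preg_match_all('~^C\S{1,2}[a-zA-Z0-9=]$~m', $input, $match);

// Only the full-match list is of interest.
print_r($match[0]);
```

Here Cab (3 characters) is accepted and Cabcd (5 characters) is rejected, while the two lines from the question still match.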

The comment above alludes to it: your regex does not specify that the string has to end at the ending character, just that it has to contain it. So Caaaa matches, but so does Caaaabbb and so would Caaa=bbb. You don't say what your input format is, but if it's one word per line, you can match /^C(..|...)[a-zA-Z0-9=]$/m

From what I understand,
\bC\S{2,3}=?(?=\s)
Example : http://regex101.com/r/lR6kI3/3
C matches a C
\S{2,3} matches anything other than a space. {2,3} quantifies the regex 2 or 3 times
=? optional =
(?=\s) checks if the string is followed by space
Example usage
$re = "/\\bC\\S{2,3}=?(?=\\s)/m";
$str = "blahblahblahCaa= Cae= CGGG dontmatchthisCAAA CAAAjkjkjk XXXXXXCXXXX ";
preg_match_all($re, $str, $matches);
print_r($matches);
This gives the following output:
Array ( [0] => Array ( [0] => Cae= [1] => CGGG ) )

try this:
$str = file_get_contents('FILENAME.txt', true);
// $str = "blahblahblahCaa=\nCae=\nCGGG\ndontmatchthisCAAA\nCAAAjkjkjk\nXXXXXXCXXXX";
preg_match_all('/^C[a-zA-Z0-9=]{3}$/m', $str, $matches);
var_dump($matches[0]);
foreach ($matches[0] as $sub)
{
$file = 'CapturedData.txt';
// append a line break so the captured strings do not run together
file_put_contents($file, $sub . PHP_EOL, FILE_APPEND);
}

Related

Standardize/Sanitize variably-formatted phone numbers to be purely 10-digit strings

Before I store user-supplied phone numbers in my database, I need to standardize/sanitize the string to consist of exactly 10 digits.
I want to end up with 1112223333 from all of these potential input values:
(111)222-3333
111-222-3333
111.222.3333
+11112223333
11112223333
In the last two strings, there's a 1 as the country code.
I was able to make some progress with:
preg_replace('/\D/', '', mysqli_real_escape_string($conn, $_POST["phone"]));
Can anyone help me to fix up the strings that have more than 10 digits?
Use your preg_replace, which handles all but the last two strings. Then check the length of the result and remove the first digit if there are more than 10 digits.
$str = preg_replace('/\D/', '', mysqli_real_escape_string($conn, $_POST["phone"]));
if(strlen($str) > 10){
$str = substr($str, 1);
}
If you want to parse phone numbers, a very useful library is giggsey/libphonenumber-for-php. It is based on Google's libphonenumber and also has an online demo that shows how it works.
Do it in two passes:
$phone = [
'(111)222-3333',
'111-222-3333',
'111.222.3333',
'+11112223333',
'11112223333',
'+331234567890',
];
# remove non digit
$res = preg_replace('/\D+/', '', $phone);
# keep only 10 digit
$res = preg_replace('/^\d+(\d{10})$/', '$1', $res);
print_r($res);
Output:
Array
(
[0] => 1112223333
[1] => 1112223333
[2] => 1112223333
[3] => 1112223333
[4] => 1112223333
[5] => 1234567890
)
This task can/should be accomplished by making just one pass over the string to replace unwanted characters.
.* #greedily match zero or more of any character
(\d{3}) #capture group 1
\D* #greedily match zero or more non-digits
(\d{3}) #capture group 2
\D* #greedily match zero or more non-digits
(\d{4}) #capture group 3
$ #match end of string
Matching the position of the end of the string ensures that the final 10 digits from the string are captured and any extra digits at the front of the string are ignored.
Code: (Demo)
$strings = [
'(111)222-3333',
'111-222-3333',
'111.222.3333',
'+11112223333',
'11112223333'
];
foreach ($strings as $string) {
echo preg_replace(
'/.*(\d{3})\D*(\d{3})\D*(\d{4})$/',
'$1$2$3',
$string
) . "\n---\n";
}
Output:
1112223333
---
1112223333
---
1112223333
---
1112223333
---
1112223333
---
The same result can be achieved by changing the third capture group to be a lookahead and only using two backreferences in the replacement string. (Demo)
echo preg_replace(
'/.*(\d{3})\D*(\d{3})\D*(?=\d{4}$)/',
'$1$2',
$string
);
Finally, a much simpler pattern can be used to purge all non-digits, but this alone will not trim the string down to 10 characters. Calling substr() with a starting offset of -10 will ensure that the last 10 digits are preserved. (Demo)
echo substr(preg_replace('/\D+/', '', $string), -10);
As a side note, you should use a prepared statement to interact with your database instead of relying on escaping which may have vulnerabilities.
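The purge-then-trim approach above can be wrapped in a small function. This is only a sketch: normalizePhone is a hypothetical helper name, and returning null for input with fewer than 10 digits is an assumption, since the answers do not say how short input should be treated.

```php
<?php
// Hypothetical helper: reduce a phone number to its last 10 digits.
// Returns null when fewer than 10 digits are present (an assumption).
function normalizePhone(string $raw): ?string
{
    // Purge everything that is not a digit.
    $digits = preg_replace('/\D+/', '', $raw);
    if (strlen($digits) < 10) {
        return null;
    }
    // Keep the last 10 digits, dropping any leading country code.
    return substr($digits, -10);
}

foreach (['(111)222-3333', '+11112223333', '222-3333'] as $raw) {
    var_dump(normalizePhone($raw));
}
```

Keeping the last 10 digits rather than stripping a fixed prefix means country codes of any length are dropped, as in the +33 example above.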
Use str_replace with an array of the characters you want to remove.
$str = "(111)222-3333 111-222-3333 111.222.3333 +11112223333";
echo str_replace(["(", ")", "-", "+", "."], "", $str);
https://3v4l.org/80AWc

Multiple Hash Tags removal

function getHashTagsFromString($str){
$matches = array();
$hashTag = array();
if (preg_match_all('/#([^\s]+)/', $str, $matches)) {
for($i=0;$i<sizeof($matches[1]);$i++){
$hashTag[$i]=$matches[1][$i];
}
}
// always return an array, even when nothing matched
return $hashTag;
}
test string $str = "STR
this is a string
with a #tag and
another #hello #hello2 ##hello3 one
STR";
Using the above function I am getting results, but I am not able to remove both # characters from ##hello3. How can I remove them using a single regular expression?
Update your regular expression as follows:
/#+(\S+)/
Explanation:
/ - starting delimiter
#+ - match the literal # character one or more times
(\S+) - match (and capture) any non-space character (shorthand for [^\s])
/ - ending delimiter
The output will be as follows:
Array
(
[0] => tag
[1] => hello
[2] => hello2
[3] => hello3
)
EDIT: To match all the hash tags use:
preg_match_all('/#\S+/', $str, $match);
To remove, instead of preg_match_all you should use preg_replace for replacement.
$repl = preg_replace('/#\S+/', '', $str);
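Both steps, extracting the tags and removing them, can be tried together on the test string from the question. A minimal sketch, assuming the test string was meant as a heredoc; the variable names $tags and $stripped are illustrative only:

```php
<?php
// Test string from the question, written as a heredoc.
$str = <<<STR
this is a string
with a #tag and
another #hello #hello2 ##hello3 one
STR;

// #+ consumes one or more leading # characters; (\S+) captures the tag text.
preg_match_all('/#+(\S+)/', $str, $matches);
$tags = $matches[1];

// Remove every hash tag, including its leading # characters, from the text.
$stripped = preg_replace('/#\S+/', '', $str);

print_r($tags);
echo $stripped, "\n";
```

The removal leaves the surrounding spaces in place, so a follow-up whitespace cleanup may be wanted depending on the use case.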

Split string on non-alphanumeric characters and on positions between digits and non-digits

I'm trying to split a string by non-alphanumeric delimiting characters AND between alternations of digits and non-digits. The end result should be a flat array consisting of alphabetic strings and numeric strings.
I'm working in PHP, and would like to use REGEX.
Examples:
ES-3810/24MX should become ['ES', '3810', '24', 'MX']
CISCO1538M should become ['CISCO' , '1538', 'M']
The input file sequence can be indifferently DIGITS or ALPHA.
The separators can be non-ALPHA and non-DIGIT chars, as well as a change from a DIGIT sequence to an ALPHA sequence, and vice versa.
The function to match all occurrences of a regex is preg_match_all(), which fills a multidimensional array of results. The regex is very simple: any digit ([0-9]) one or more times (+) or (|) any letter ([[:upper:][:lower:]]) one or more times (+). Note that the POSIX classes [:upper:] and [:lower:] are used rather than [A-z], because the range A-z also includes the punctuation characters [ \ ] ^ _ and `.
The textarea and PHP tags are included for convenience, so you can drop this into your PHP file and see the results.
<textarea style="width:400px; height:400px;">
<?php
foreach( array(
"ES-3810/24MX",
"CISCO1538M",
"123ABC-ThatsHowEasy"
) as $string ){
// get all matches into an array
preg_match_all("/[0-9]+|[[:upper:][:lower:]]+/",$string,$matches);
// it is the 0th match that you are interested in...
print_r( $matches[0] );
}
?>
</textarea>
Which outputs in the textarea:
Array
(
[0] => ES
[1] => 3810
[2] => 24
[3] => MX
)
Array
(
[0] => CISCO
[1] => 1538
[2] => M
)
Array
(
[0] => 123
[1] => ABC
[2] => ThatsHowEasy
)
$str = "ES-3810/24MX35 123 TEST 34/TEST";
$str = preg_replace(array("#[^A-Z0-9]+#i","#\s+#","#([A-Z])([0-9])#i","#([0-9])([A-Z])#i"),array(" "," ","$1 $2","$1 $2"),$str);
echo $str;
$data = explode(" ",$str);
print_r($data);
I could not think of a cleaner way.
The most direct preg_ function to produce the desired flat output array is preg_split().
Because it doesn't matter what combination of alphanumeric characters are on either side of a sequence of non-alphanumeric characters, you can greedily split on non-alphanumeric substrings without "looking around".
After that preliminary obstacle is dealt with, then split on the zero-length positions between a digit and a non-digit OR between a non-digit and a digit.
/ #starting delimiter
[^a-z\d]+ #match one or more non-alphanumeric characters
| #OR
\d\K(?=\D) #match a number, then forget it, then lookahead for a non-number
| #OR
\D\K(?=\d) #match a non-number, then forget it, then lookahead for a number
/ #ending delimiter
i #case-insensitive flag
Code: (Demo)
var_export(
preg_split('/[^a-z\d]+|\d\K(?=\D)|\D\K(?=\d)/i', $string, 0, PREG_SPLIT_NO_EMPTY)
);
preg_match_all() isn't a silly technique, but it doesn't return the array, it returns the number of matches and generates a reference variable containing a two dimensional array of which the first element needs to be accessed. Admittedly, the pattern is shorter and easier to follow. (Demo)
var_export(
preg_match_all('/[a-z]+|\d+/i', $string, $m) ? $m[0] : []
);
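For reference, the preg_split() pattern can be run against both example inputs from the question. A minimal sketch; splitAlnum is a hypothetical helper name introduced only for this illustration:

```php
<?php
// Hypothetical helper: split on runs of non-alphanumeric characters,
// or at the zero-length positions between a digit and a non-digit.
function splitAlnum(string $string): array
{
    return preg_split(
        '/[^a-z\d]+|\d\K(?=\D)|\D\K(?=\d)/i',
        $string,
        -1,
        PREG_SPLIT_NO_EMPTY // drop empty pieces, e.g. between "3810" and "/"
    );
}

foreach (['ES-3810/24MX', 'CISCO1538M'] as $string) {
    print_r(splitAlnum($string));
}
```

PREG_SPLIT_NO_EMPTY matters here because a boundary split directly followed by a delimiter split would otherwise produce an empty element.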

How do i break string into words at the position of number

I have some string data with alphanumeric values, like us01name, phc01name and others, i.e. letters + number + letters.
I would like to get the first letters + number in the first string and the remainder in the second.
How can I do it in PHP?
You can use a regular expression:
// if statement checks there's at least one match
if(preg_match('/([A-Za-z]+[0-9]+)([A-Za-z]+)/', $string, $matches) > 0){
$firstbit = $matches[1];
$nextbit = $matches[2];
}
Just to break the regular expression down into parts so you know what each bit does:
( Begin group 1
[A-Za-z]+ As many letters as there are (upper or lower case)
[0-9]+ As many digits as there are
) End group 1
( Begin group 2
[A-Za-z]+ As many letters as there are (upper or lower case)
) End group 2
Try this code:
preg_match('~([^\d]+\d+)(.*)~', "us01name", $m);
var_dump($m[1]); // 1st string + number
var_dump($m[2]); // 2nd string
OUTPUT
string(4) "us01"
string(4) "name"
Even this more restrictive regex will also work for you:
preg_match('~([A-Z]+\d+)([A-Z]+)~i', "us01name", $m);
You could use preg_split on the digits with the pattern capture flag. It returns all pieces, so you'd have to put them back together. However, in my opinion it is more intuitive and flexible than a complete pattern regex. Plus, preg_split() is underused :)
Code:
$str = 'user01jason';
$pieces = preg_split('/(\d+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($pieces);
Output:
Array
(
[0] => user
[1] => 01
[2] => jason
)
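Putting the pieces back together, as mentioned above, could look like the following. This is a sketch under assumptions: splitAfterNumber is a hypothetical helper name, and returning null for input that is not letters + number + letters is a choice made here, not something the question specifies.

```php
<?php
// Hypothetical helper: split "letters + digits + rest" into two strings.
// Returns null when the input does not have that shape (an assumption).
function splitAfterNumber(string $str): ?array
{
    // Limit 2 stops after the first run of digits;
    // PREG_SPLIT_DELIM_CAPTURE keeps the captured digits as their own piece.
    $pieces = preg_split('/(\d+)/', $str, 2, PREG_SPLIT_DELIM_CAPTURE);
    if (count($pieces) !== 3 || $pieces[0] === '' || $pieces[2] === '') {
        return null;
    }
    // Rejoin the leading letters with the digits.
    return [$pieces[0] . $pieces[1], $pieces[2]];
}

print_r(splitAfterNumber('us01name'));
print_r(splitAfterNumber('phc01name'));
```

With a limit of 2, any later digits stay inside the second string instead of producing further pieces.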

Regex Help with manipulating string

I am seriously struggling to get my head around regex.
I have a string containing "iPhone: 52.973053,-0.021447".
I want to extract the two numbers after the colon into two separate strings, split at the comma.
Can anyone help me? Cheers
Try:
preg_match_all('/\w+:\s*(-?\d+\.\d+),(-?\d+\.\d+)/',
"iPhone: 52.973053,-0.021447 FOO: -1.0,-1.0",
$matches, PREG_SET_ORDER);
print_r($matches);
which produces:
Array
(
[0] => Array
(
[0] => iPhone: 52.973053,-0.021447
[1] => 52.973053
[2] => -0.021447
)
[1] => Array
(
[0] => FOO: -1.0,-1.0
[1] => -1.0
[2] => -1.0
)
)
Or just:
preg_match('/\w+:\s*(-?\d+\.\d+),(-?\d+\.\d+)/',
"iPhone: 52.973053,-0.021447",
$match);
print_r($match);
if the string only contains one coordinate.
A small explanation:
\w+ # match a word character: [a-zA-Z_0-9] and repeat it one or more times
: # match the character ':'
\s* # match a whitespace character: [ \t\n\x0B\f\r] and repeat it zero or more times
( # start capture group 1
-? # match the character '-' and match it once or none at all
\d+ # match a digit: [0-9] and repeat it one or more times
\. # match the character '.'
\d+ # match a digit: [0-9] and repeat it one or more times
) # end capture group 1
, # match the character ','
( # start capture group 2
-? # match the character '-' and match it once or none at all
\d+ # match a digit: [0-9] and repeat it one or more times
\. # match the character '.'
\d+ # match a digit: [0-9] and repeat it one or more times
) # end capture group 2
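To get the two numbers into separate variables, the same pattern can be written with named groups. A minimal sketch; the group names first and second are illustrative and not taken from the question:

```php
<?php
$string = "iPhone: 52.973053,-0.021447";

// Same pattern as explained above, with the two capture groups named.
$pattern = '/\w+:\s*(?<first>-?\d+\.\d+),(?<second>-?\d+\.\d+)/';

if (preg_match($pattern, $string, $m)) {
    // Each named group is available under its name (and its numeric index).
    $first  = $m['first'];
    $second = $m['second'];
    echo $first, "\n", $second, "\n";
}
```

Both values are still strings at this point; cast them with (float) if numbers are needed.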
A solution without using regular expressions, using explode() and stripos() :) :
$string = "iPhone: 52.973053,-0.021447";
$coordinates = explode(',', $string);
// $coordinates[0] = "iPhone: 52.973053"
// $coordinates[1] = "-0.021447"
$coordinates[0] = trim(substr($coordinates[0], stripos($coordinates[0], ':') +1));
Assuming that the string always contains a colon.
Or if the identifier before the colon only contains characters (not numbers) you can do also this:
$string = "iPhone: 52.973053,-0.021447";
$string = trim($string, "a..zA..Z: ");
//$string = "52.973053,-0.021447"
$coordinates = explode(',', $string);
Try:
$string = "iPhone: 52.973053,-0.021447";
preg_match_all( "/-?\d+\.\d+/", $string, $result );
print_r( $result );
I like @Felix's non-regex solution. I think his solution to the problem is clearer and more readable than using a regex.
Don't forget that you can use constants/variables to change the splitting by comma or colon if the original string format is changed.
Something like
define('COORDINATE_SEPARATOR',',');
define('DEVICE_AND_COORDINATES_SEPARATOR',':');
$str="iPhone: 52.973053,-0.021447";
$s = array_filter(preg_split("/[a-zA-Z:,]/",$str) );
print_r($s);
An even simpler solution is to use preg_split() with a much simpler regex, e.g.
$str = 'iPhone: 52.973053,-0.021447';
$parts = preg_split('/[ ,]/', $str);
print_r($parts);
which will give you
Array
(
[0] => iPhone:
[1] => 52.973053
[2] => -0.021447
)
