Regex help - how to extract a year from a string - PHP

I have a year listed in my string
$s = "Acquired by the University in 1988";
In practice, the year could be anywhere in this single-line string. How do I extract it using a regex? I tried \d and that didn't work; it just came up with an error.
Jason
I'm using preg_match in LAMP 5.2

You need a regex to match four digits, and these four digits must comprise a whole word (for example, a string of 10 digits contains four digits but is not a year). Thus, the regex needs to include word boundaries, like so:
if (preg_match('/\b\d{4}\b/', $s, $matches)) {
$year = $matches[0];
}
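To see what the word boundaries change, here is a minimal sketch that compares a bare \d{4} with the bounded pattern. The helper names and the second sample string are hypothetical, not from the question.

```php
<?php
// Hypothetical helper: first run of four digits anywhere, or null.
function first_four_digits($s)
{
    return preg_match('/\d{4}/', $s, $m) ? $m[0] : null;
}

// Hypothetical helper: first four-digit run that forms a whole word, or null.
function first_year_word($s)
{
    return preg_match('/\b\d{4}\b/', $s, $m) ? $m[0] : null;
}

$samples = array(
    "Acquired by the University in 1988",
    "Serial number 1234567890", // assumed example of a longer digit run
);

foreach ($samples as $s) {
    echo $s, " => loose: ", var_export(first_four_digits($s), true),
        ", bounded: ", var_export(first_year_word($s), true), "\n";
}
```

The bounded version rejects the ten-digit run entirely, while the loose version picks up its first four digits.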

Try this code:
<?php
$s = "Acquired by the University in 1988 year.";
$yr = preg_replace('/^[^\d]*(\d{4}).*$/', '\1', $s);
var_dump($yr);
?>
OUTPUT:
string(4) "1988"
However, this regex assumes that a four-digit number appears only once in the line.

Well, you could use \d{4}, but that will break if there's anything else in the string with four digits.
Edit:
The problem is that, other than the four numeric characters, there isn't any other identifying information (according to your requirements, the number can be anywhere in the string). Based on what you've written, this is probably the best you can do, short of range checking the returned value.
$str = "the year is 1988";
preg_match('/\d{4}/', $str, $matches);
var_dump($matches);
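The range check mentioned above could look something like this. It is only a sketch: the function name and the default bounds of 1000 and 2100 are assumptions, so pick limits that suit your data.

```php
<?php
// Hypothetical helper: returns the first four-digit run whose value lies
// inside an assumed plausible range, or null when none qualifies.
function find_plausible_year($str, $min = 1000, $max = 2100)
{
    if (!preg_match_all('/\d{4}/', $str, $matches)) {
        return null; // no four-digit run at all
    }
    foreach ($matches[0] as $candidate) {
        $year = (int) $candidate;
        if ($year >= $min && $year <= $max) {
            return $year;
        }
    }
    return null; // four-digit runs exist, but none is in range
}

var_dump(find_plausible_year("the year is 1988"));
```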

/(^|\s)(\d{4})(\s|$)/gm
Matches
Acquired by the University in 1988
The 1945 vintage was superb
1492 columbus sailed the ocean blue
Ignores
There were nearly 10000 people there!
Member ID 45678
Phone Number 951-555-2563
See it in action at http://refiddle.com/10k
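The g flag in the pattern above is regex-tester notation; PHP's preg functions do not accept it as a modifier. A minimal sketch of the same pattern in PHP, run against the sample lines listed above (the function name is hypothetical):

```php
<?php
// Hypothetical helper: the year is captured by the second group.
function year_between_spaces($line)
{
    return preg_match('/(^|\s)(\d{4})(\s|$)/m', $line, $m) ? $m[2] : null;
}

$lines = array(
    "Acquired by the University in 1988",
    "The 1945 vintage was superb",
    "1492 columbus sailed the ocean blue",
    "There were nearly 10000 people there!",
    "Member ID 45678",
    "Phone Number 951-555-2563",
);

foreach ($lines as $line) {
    echo var_export(year_between_spaces($line), true), " <= ", $line, "\n";
}
```

Use preg_match_all() instead if a line may contain more than one year.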

preg_match('/(\d{4})/', $string, $matches);

For a basic year match, assuming only one year
$year = false;
if(preg_match("/\d{4}/", $string, $match)) {
$year = $match[0];
}
If you need to handle the possibility of multiple years in the same string
if(preg_match_all("/\d{4}/", $string, $matches, PREG_SET_ORDER)) {
foreach($matches as $match) {
$year = $match[0];
}
}

/(?<!\d)\d{4}(?!\d)/ will match only 4-digit numbers that do not have digits before or after them.
(?<!\d) and (?!\d) are look-behind and look-ahead (respectively) assertions that ensure that a \d does not occur before or after the main part of the RE.
It may in practice be more sensible to use \b instead of the assertions; this will ensure that the beginning and end of the year occur at a "word boundary". So then "1337hx0r" would be appropriately ignored.
If you are only looking for years within the past century or so, you could use
/\b(19|20)\d{2}\b/
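One thing to watch with this pattern: the parentheses capture only the century prefix, so the full year is in the whole match, not in group 1. A short sketch with a made-up input string:

```php
<?php
// Hypothetical input, not from the question.
$text = "Founded in 1876, rebuilt in 1923 and reopened in 2004.";

preg_match_all('/\b(19|20)\d{2}\b/', $text, $matches);

print_r($matches[0]); // whole matches: the full years
print_r($matches[1]); // group 1: only the "19" / "20" prefixes
```

Writing the group as (?:19|20) avoids the extra capture if you don't need it.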

Also, if your string is something like this:
$date = "20044Q";
you can use the code below to extract the year.
preg_match('/(?:(?:19|20)[0-9]{2})/', $date, $matches);
echo $matches[0];

Related

Ignore character using regex (pcre) PHP

I am trying to capture the date from the file name file-2018-02-19-second.json.
I am using the regex .*file-(..........).*.json to capture the date. It captures 2018-02-19, but I want to ignore the "-" characters and capture only 20180219. How can I do it?
If your filenames have always the same format, you can convert your string to a DateTime instance using DateTime::createFromFormat:
$date = DateTime::createFromFormat('*-Y-m-d-*.*', 'file-2018-02-19-second.json');
echo $date->format('Ymd');
You can find the different format characters and their meanings in the php manual.
$fileName = 'file-2018-02-19-second.json';
preg_match('#([12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))#is', $fileName,
$output);
if (!empty($output)) {
$date = preg_replace('#-#is', '', $output[1]);
echo $date;
}
Hope this helps!
related link: https://www.regextester.com/96683
Option 1 - Match & Replace
See code in use here
<?php
$fn = 'file-2018-02-19-second.json';
// preg_match() returns a match count, so don't assign it back to $fn
preg_match('/.*file-\K\d{4}-\d{2}-\d{2}/', $fn, $o);
echo isset($o[0]) ? preg_replace('/\D+/', '', $o[0]) : 'No match found';
Option 2 - Group & Concatenate
See code in use here
<?php
$fn = 'file-2018-02-19-second.json';
// preg_match() returns a match count, so don't assign it back to $fn
preg_match('/.*file-(\d{4})-(\d{2})-(\d{2})/', $fn, $o);
echo isset($o[1], $o[2], $o[3]) ? $o[1].$o[2].$o[3] : 'No match found';
Explanation of Patterns
.*file-\K\d{4}-\d{2}-\d{2}
.* Match any character any number of times
file- Match this literally
\K Resets the starting point of the match. Any previously consumed characters are no longer included in the final match.
\d{4} Match any digit exactly 4 times
- Match this literally
\d{2} Match any digit exactly 2 times
- Match this literally
\d{2} Match any digit exactly 2 times
The second pattern \D+ simply matches any non-digit character one or more times for replacement.
The last pattern (from option 2) is really just the simplified version of the first pattern I described, but groups each number part into capture groups.
Result: 20180219
This question appears to be solely about data extract, not about data validation.
Code: (Demo)
$file = 'file-2018-02-19-second.json';
$date = preg_replace('~\D+~', '', $file);
echo $date;
Output:
20180219
If you need slightly more accuracy/validation than that (leveraging the location of file-), you can use \G to extract the date components before imploding them.
Code: (Demo) (Pattern Demo)
$file = 'somefile-2018-02-19-second.json';
echo preg_match_all('~\Gfile-\K\d+|-\K\d+~', $file, $out) ? implode($out[0]) : 'fail';
// same result as earlier method

Make two simple regex's into one

I am trying to make a regex that will look behind .txt and then behind the "-" and get the first digit .... in the example, it would be a 1.
$record_pattern = '/.txt.+/';
preg_match($record_pattern, $decklist, $record);
print_r($record);
.txt?n=chihoi%20%283-1%29
I want to write this as one expression but can only seem to do it as two. This is the first time working with regex's.
You can use this:
$record_pattern = '/\.txt.+-(\d)/';
Now, the first group contains what you want.
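For completeness, a minimal sketch of that pattern plugged into the code from the question, reading the digit from the first group:

```php
<?php
$decklist = ".txt?n=chihoi%20%283-1%29";
$record_pattern = '/\.txt.+-(\d)/';

// $record[0] is the whole match; $record[1] is the digit after the hyphen.
$digit = preg_match($record_pattern, $decklist, $record) ? $record[1] : null;

var_dump($digit);
```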
Your regex would be,
\.txt[^-]*-\K\d
You don't need any groups. It matches from .txt up to the literal -. Because of the \K in the regex, the previously matched characters are discarded; in this case, that is the .txt?n=chihoi%20%283- string. The match then consists only of the first digit right after the -.
DEMO
Your PHP code would be,
<?php
$mystring = ".txt?n=chihoi%20%283-1%29";
$regex = '~\.txt[^-]*-\K\d~';
if (preg_match($regex, $mystring, $m)) {
$yourmatch = $m[0];
echo $yourmatch;
}
?> //=> 1

Extract last section of string

I have a string like this:
[numbers]firstword[numbers]mytargetstring
I would like to know how to extract "mytargetstring", taking the following into account:
a.) Numbers are numerical digits for example, my complete string with numbers:
12firstword21mytargetstring
b.) Numbers can be any digits, for example above are two digits each, but it can be any number of digits like this:
123firstword21567mytargetstring
Regardless of the number of digits, I am only interested in extracting "mytargetstring".
By the way "firstword" is fixed and will not change with any combination.
I am not very good in Regex so I appreciate someone with strong background can suggest how to do this using PHP. Thank you so much.
This will do it (or should do)
$input = '12firstword21mytargetstring';
preg_match('/\d+\w+\d+(\w+)$/', $input, $matches);
echo $matches[1]; // mytargetstring
It breaks down as
\d+\w+\d+(\w+)$
\d+ - One or more numbers
\w+ - followed by 1 or more word characters
\d+ - followed by 1 or more numbers
(\w+)$ - followed by 1 or more word characters that end the string. The brackets mark this as a group you want to extract
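Since the question says "firstword" is fixed, a stricter variant could anchor on it literally. This is a sketch under that assumption; the function name is hypothetical, and it also assumes the target itself does not start with a digit.

```php
<?php
// Hypothetical helper: leading digits, the literal "firstword",
// a second digit run, then everything up to the end of the string.
function extract_after_firstword($input)
{
    return preg_match('/^\d+firstword\d+(.+)$/', $input, $m) ? $m[1] : null;
}

var_dump(extract_after_firstword('12firstword21mytargetstring'));
var_dump(extract_after_firstword('123firstword21567mytargetstring'));
```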
preg_match("/[0-9]+[a-z]+[0-9]+([a-z]+)/i", $your_string, $matches);
print_r($matches);
You can do it with preg_match and pattern syntax.
$string = '2firstword21mytargetstring';
// \d  -- any digit character
// \D  -- any non-digit character
// *   -- 0 or more
// $   -- end of string
if (preg_match('/\d(\D*)$/', $string, $match)) {
var_dump($match[1]);
}
Try it like,
$parts = preg_split('/\d+/', "12firstword21mytargetstring");
print_r($parts);
echo '<br/>';
// end() takes its argument by reference, so pass a variable, not a function result
echo 'Final string is: '.end($parts);
Tested on http://writecodeonline.com/php/
You don't need regex for that:
for ($i=strlen($string)-1; $i; $i--) {
if (is_numeric($string[$i])) break;
}
$extracted_string = substr($string, $i+1);
The above is probably the fastest implementation you can get, certainly faster than using regex, which you don't need for this simple case.
See the working demo
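Note that the loop above never inspects index 0, so it relies on the string starting with a digit, as it does in the question. A minimal sketch of a variant without that assumption, swapping the loop for strrev() and strcspn() (the function name is hypothetical):

```php
<?php
// Hypothetical helper: returns whatever follows the last digit.
// strcspn() on the reversed string counts the trailing characters
// that come before the first digit it meets.
function tail_after_last_digit($string)
{
    $tailLength = strcspn(strrev($string), '0123456789');
    return $tailLength === 0 ? '' : substr($string, -$tailLength);
}

var_dump(tail_after_last_digit('12firstword21mytargetstring'));
```

A string with no digits comes back unchanged, and a string ending in a digit yields an empty string.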
Here is a simple solution:
$keywords = preg_split("/[\d,]+/", "hypertext123language2434programming");
echo($keywords[2]);

Regular expression to match hyphenated words

How can I extract hyphenated strings from this string line?
ADW-CFS-WE CI SLA Def No SLANAME CI Max Outage Service
I just want to extract "ADW-CFS-WE" from it, but I have been unsuccessful for the past few hours. I'm stuck with the simple regex "(.*)", which selects the whole string shown above.
You can probably use:
preg_match("/\w+(-\w+)+/", ...)
The \w+ will match one or more word characters (letters, digits, and underscore), i.e. one word. The group ( ) then matches one or more additional hyphen-plus-word parts.
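A minimal sketch of that pattern applied to the line from the question. The whole hyphenated word is in $m[0]; the group only holds its last repetition.

```php
<?php
$line = "ADW-CFS-WE CI SLA Def No SLANAME CI Max Outage Service";

// $m[0] is the full match; $m[1] is just the last "-word" part.
$first = preg_match('/\w+(-\w+)+/', $line, $m) ? $m[0] : null;

var_dump($first);
```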
The trick with regular expressions is often specificity. Using .* will often match too much.
$input = "ADW-CFS-WE X-Y CI SLA Def No SLANAME CI Max Outage Service";
preg_match_all('/[A-Z]+-[A-Z-]+/', $input, $matches);
foreach ($matches[0] as $m) {
echo $m . "\n";
}
Note that this solution assumes that only uppercase A-Z can match. If that's not the case, insert the correct character class. For example, if you want to allow arbitrary letters (like a and Ä), replace [A-Z] with \p{L} and add the u modifier so the pattern and subject are treated as UTF-8.
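A sketch of the \p{L} variant, assuming the script and the input are UTF-8 encoded. The input string is made up for illustration.

```php
<?php
// Hypothetical input containing a non-ASCII letter; assumes UTF-8.
$input = "Ärzte-Team and x-ray are hyphenated, plain is not";

// The u modifier makes PCRE treat pattern and subject as UTF-8.
preg_match_all('/\p{L}+(?:-\p{L}+)+/u', $input, $matches);

print_r($matches[0]);
```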
Just catch every whitespace-free word ([^\s]) that contains at least one '-'.
The following expression will do it:
<?php
$z = "ADW-CFS-WE CI SLA Def No SLANAME CI Max Outage Service";
$r = preg_match('#([^\s]*-[^\s]*)#', $z, $matches);
var_dump($matches);
The following pattern assumes the data is at the beginning of the string, contains only capitalized letters and may contain a hyphen before each group of one or more of those letters:
<?php
$str = 'ADW-CFS-WE CI SLA Def No SLANAME CI Max Outage Service';
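// preg_match() returns 0 when nothing matches and false only on error,
// so compare against 1 rather than checking !== false
if (preg_match('/^(?:-?[A-Z]+)+/', $str, $matches) === 1)
var_dump($matches);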
Result:
array(1) {
[0]=>
string(10) "ADW-CFS-WE"
}

Split alphanumeric string between leading digits and trailing letters

I have a string like:
$Order_num = "0982asdlkj";
How can I split that into the 2 variables, with the number as one element and then another variable with the letter element?
The number element can be any length from 1 to 4 say and the letter element fills the rest to make every order_num 10 characters long in total.
I have found the php explode function...but don't know how to make it in my case because the number of numbers is between 1 and 4 and the letters are random after that, so no way to split at a particular letter.
You can use preg_split using lookahead and lookbehind:
print_r(preg_split('#(?<=\d)(?=[a-z])#i', "0982asdlkj"));
prints
Array
(
[0] => 0982
[1] => asdlkj
)
This only works if the letter part really only contains letters and no digits.
Update:
Just to clarify what is going on here:
The regular expressions looks at every position and if a digit is before that position ((?<=\d)) and a letter after it ((?=[a-z])), then it matches and the string gets split at this position. The whole thing is case-insensitive (i).
Use preg_match() with a regular expression of (\d+)([a-zA-Z]+). If you want to limit the number of digits to 1-4 and letters to 6-9, change it to (\d{1,4})([a-zA-Z]{6,9}).
preg_match("/(\\d+)([a-zA-Z]+)/", "0982asdlkj", $matches);
print("Integer component: " . $matches[1] . "\n");
print("Letter component: " . $matches[2] . "\n");
Outputs:
Integer component: 0982
Letter component: asdlkj
http://ideone.com/SKtKs
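The length-limited variant could be wrapped up roughly as follows. This is a sketch, not a definitive validator: the function name is hypothetical, and the ^ and $ anchors plus the 10-character check are additions based on the format described in the question.

```php
<?php
// Hypothetical helper: returns array(digits, letters) or null when the
// order number does not fit the described format.
function split_order_number($orderNum)
{
    if (strlen($orderNum) === 10
        && preg_match('/^(\d{1,4})([a-zA-Z]{6,9})$/', $orderNum, $m)) {
        return array($m[1], $m[2]);
    }
    return null;
}

var_dump(split_order_number("0982asdlkj"));
```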
You can also do it using preg_split by splitting your input at the point between the digits and the letters:
list($num,$alpha) = preg_split('/(?<=\d)(?=[a-z]+)/i',$Order_num);
You can use a regex for that.
preg_match('/(\d{1,4})([a-z]+)/i', $str, $matches);
array_shift($matches);
list($num, $alpha) = $matches;
Check this out
<?php
$Order_num = "0982asdlkj";
$split=preg_split("/[0-9]/",$Order_num); // split() was removed in PHP 7; preg_split() is the replacement
$alpha=$split[(sizeof($split))-1];
$number=explode($alpha, $Order_num);
echo "Alpha -".$alpha."<br>";
echo "Number-".$number[0];
?>
with regards
wazzy
My preferred approach would be sscanf() because it is concise, doesn't need regex, offers the ability to cast the numeric segment as integer type, and doesn't generate needless fullstring matches like preg_match(). %s does rely, though, on the fact that there will be no whitespaces in the letters segment of the string.
Demo
$Order_num = "0982asdlkj";
var_export (
sscanf($Order_num, '%d%s')
);
This can also be set up to declare individual variables.
sscanf($Order_num, '%d%s', $numbers, $letters);
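One consequence of the integer cast is worth seeing in a sketch: %d turns "0982" into 982, so the leading zero is gone. If the zero matters, a string-based capture keeps it. The preg_match() pattern below is an illustrative assumption, not part of the approach above.

```php
<?php
$Order_num = "0982asdlkj";

// %d casts the numeric segment to an integer, dropping the leading zero.
sscanf($Order_num, '%d%s', $numbers, $letters);
var_dump($numbers, $letters);

// Assumed alternative that keeps the digits as a string.
preg_match('/^(\d+)(\D+)$/', $Order_num, $m);
var_dump($m[1], $m[2]);
```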
If wanting to use a preg_ function, preg_split() is most appropriate, but I wouldn't use expensive lookarounds. Match the digits, then forget them (with \K). This will split the string without consuming any characters. Demo
var_export (
preg_split('/\d+\K/', $Order_num)
);
To assign variables, use "symmetric array destructuring".
[$numbers, $letters] = preg_split('/\d+\K/', $Order_num);
Beyond these single function approaches, there will be MANY two function approaches like:
$numbers = rtrim($Order_num, 'a..z');
$letters = ltrim($Order_num, '0..9');
But I wouldn't use them in a professional script because they lack elegance.
