Consider the following strings:
$strings = array(
"8.-10. stage",
"8. stage"
);
I would like to extract the first integer of each string, so it would return
8
8
I tried to filter out numbers with preg_replace but it returns all integers and I only want the first.
foreach($strings as $string)
{
echo preg_replace("/[^0-9]/", '',$string);
}
Any suggestions?
A convenient (although not record-breaking in performance) solution using regular expressions would be:
$string = "3rd time's a charm";
$filteredNumbers = array_filter(preg_split("/\D+/", $string));
$firstOccurence = reset($filteredNumbers);
echo $firstOccurence; // 3
Assuming that there is at least one number in the input, this is going to print the first one.
Non-digit characters will be completely ignored apart from the fact that they are considered to delimit numbers, which means that the first number can occur at any place inside the input (not necessarily at the beginning).
If you want to only consider a number that occurs at the beginning of the string, regex is not necessary:
echo substr($string, 0, strspn($string, "0123456789"));
preg_match('/\d+/',$id,$matches);
$id=$matches[0];
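Note that preg_match() returns 0 when the string contains no digits, and in that case $matches[0] is not set. A minimal sketch of a guarded version follows; the function name firstInteger is hypothetical and only used here for illustration.

```php
<?php
// Hypothetical helper: returns the first run of digits as a string,
// or null when the input contains no digit at all.
function firstInteger(string $input): ?string
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match('/\d+/', $input, $matches) === 1) {
        return $matches[0];
    }
    return null;
}

foreach (["8.-10. stage", "8. stage", "stage"] as $s) {
    var_dump(firstInteger($s));
}
```

Returning null rather than an empty string lets the caller tell "no number found" apart from any real match.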
If the integer is always at the start of the string:
(int) $string;
If not, then strpbrk is useful for extracting the first non-negative number:
(int) strpbrk($string, "0123456789");
Alternatives
These one-liners are based on preg_split, preg_replace and preg_match:
preg_split("/\D+/", " $string")[1];
(int) preg_replace("/^\D+/", "", $string);
preg_match('/\d+/', "$string 0", $m) ? $m[0] : null;
Two of these add extra character(s) to the string (one prepends a space, the other appends " 0") so that empty strings or strings without numbers do not cause problems.
Note that these alternative solutions are for extracting non-negative integers only.
Try this:
$strings = array(
"8.-10. stage",
"8. stage"
);
$res = array();
foreach($strings as $key=>$string){
preg_match('/^(?P<number>\d+)/',$string,$match);
$res[$key] = $match['number'];
}
echo "<pre>";
print_r($res);
foreach($strings as $string){
if(preg_match("/^(\d+)/",$string,$res)) {
echo $res[1].PHP_EOL;
}
}
If you get this notice in PHP 7+:
Notice: Only variables should be passed by reference in YOUR_DIRECTORY_FILE.php on line LINE_NUMBER
when using this code:
echo reset(array_filter(preg_split("/\D+/", $string)));
it is because reset() takes its argument by reference. Assign the array to a variable first:
$var = array_filter(preg_split("/\D+/", $string));
echo reset($var);
And enjoy! Best Regards Ovasapov
How to filter out all characters except for the first occurring whole integer:
It is possible that the target integer is not at the start of the string (even if the OP's question only provides samples that start with an integer, other researchers are likely to require more utility, like the pages that I closed today using this page). It is also possible that the input contains no integers, or has no leading or no trailing non-numeric characters.
The following regex has two branches:
The first targets all non-numeric characters from the start of the string. It stops immediately before the first encountered digit, if there is one at all.
The second matches/consumes the first encountered whole integer, then immediately forgets/releases it (using \K) before matching/consuming ANY encountered characters in the remainder of the string.
My snippet will make 0, 1, or 2 replacements depending on the quality of the string.
Code: (Demo)
$strings = [
'stage', // expect empty string
'8.-10. stage', // expect 8
'8. stage', // expect 8
'8.-10. stage 1st', // expect 8
'Test 8. stage 2020', // expect 8
'Test 8.-10. stage - 2020 test', // expect 8
'A1B2C3D4D5E6F7G8', // expect 1
'1000', // expect 1000
'Test 2020', // expect 2020
];
var_export(
preg_replace('/^\D+|\d+\K.*/', '', $strings)
);
Or: (Demo)
preg_replace('/^\D*(\d+).*/', '$1', $strings)
Output:
array (
0 => '',
1 => '8',
2 => '8',
3 => '8',
4 => '8',
5 => '8',
6 => '1',
7 => '1000',
8 => '2020',
)
I have a very long list in a text file, something like this:
001.Porus.2017.S01E01.The.Epic.Story.Of.A.Warrior.720.x264.mp4
002.Porus.2017.S01E01.Welcome.With.A.Fight.720.x264.mp4
003.Porus.2017.S01E01.Anusuya.Stays.in.Poravs.720.x264.mp4
004.Porus.2017.S01E01.Olympia.Prays.For.A.Child.720.x264.mp4
.................
I want to replace the E01 in S01E01 on each line with the number at the front of that line. The output I want:
001.Porus.2017.S01E001.The.Epic.Story.Of.A.Warrior.720.x264.mp4
002.Porus.2017.S01E002.Welcome.With.A.Fight.720.x264.mp4
003.Porus.2017.S01E003.Anusuya.Stays.in.Poravs.720.x264.mp4
004.Porus.2017.S01E004.Olympia.Prays.For.A.Child.720.x264.mp4
......................
By the way, I'm using the following code:
$list = file("list.txt", FILE_SKIP_EMPTY_LINES | FILE_IGNORE_NEW_LINES);
$string = "";
foreach($list as $index => $entry)
{
$string .= str_pad($index + 1, 3, '0', STR_PAD_LEFT) . "." . $entry . ", ";
}
$string = substr($string, 0 , -2);
$get = explode(",", $string);
$phr = implode("<br>", array_values(array_unique($get)));
print_r($phr);
<pre>
<?php
$arr = [
'001.Porus.2017.S01E01.The.Epic.Story.Of.A.Warrior.720.x264.mp4',
'002.Porus.2017.S01E01.Welcome.With.A.Fight.720.x264.mp4',
'003.Porus.2017.S01E01.Anusuya.Stays.in.Poravs.720.x264.mp4',
'004.Porus.2017.S01E01.Olympia.Prays.For.A.Child.720.x264.mp4'
];
$your_next_number = "some_number";
$modified_values = [];
foreach($arr as $each_value){
$modified_values[] = str_replace("S01E01","S01".$your_next_number,$each_value);
//$your_next_number = something;some change that you want to make to your next iterable number.
}
print_r($modified_values);
OUTPUT
Array
(
[0] => 001.Porus.2017.S01some_number.The.Epic.Story.Of.A.Warrior.720.x264.mp4
[1] => 002.Porus.2017.S01some_number.Welcome.With.A.Fight.720.x264.mp4
[2] => 003.Porus.2017.S01some_number.Anusuya.Stays.in.Poravs.720.x264.mp4
[3] => 004.Porus.2017.S01some_number.Olympia.Prays.For.A.Child.720.x264.mp4
)
UPDATE
You can replace the code inside the foreach with the code below.
Credits to #ArtisticPhoenix for providing this improvement and the explanation of it.
foreach($arr as $each_value){
$modified_values[] = preg_replace('/^(\d+)([^S]+)(S01E)(\d+)/', '\1\2\3\1', $each_value);
}
^(\d+) => This is group 1. Anchored at the start of the string (^), it captures one or more (+) digits (\d).
([^S]+) => This is group 2. It captures one or more (+) characters that are anything but a capital S ([^S]).
(S01E) => This is group 3. It captures S01E as it is.
(\d+) => This is group 4. It captures the digits that follow S01E.
Note that group numbers use 1-based indexing, since group 0 is the entire match.
Replacement Part:
The replacement is \1\2\3\1.
The syntax \ followed by an integer (a group number) is known as a backreference. In the replacement string, it inserts the text captured by that group.
So, the 1st, 2nd and 3rd captures are put back where they came from, and the 4th capture is replaced with the 1st: the initial digits replace the last digits in the match.
So, let's take 001.Porus.2017.S01E01.The.Epic.Story.Of.A.Warrior.720.x264.mp4 as an example:
(\d+)([^S]+)(S01E)(\d+) matches this way => (001)(.Porus.2017.)(S01E)(01)
Hence, the replacement turns it into (001)(.Porus.2017.)(S01E)(001) (notice the last change, caused by the \1 at the end of the replacement \1\2\3\1). The rest of the string remains the same.
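For reference, here is a short runnable sketch of the same replacement without the loop. It relies on preg_replace() accepting an array as subject, and it assumes (as the pattern itself does) that no capital S appears between the leading number and S01E. The variable names $lines and $renamed are only illustrative.

```php
<?php
$lines = [
    '001.Porus.2017.S01E01.The.Epic.Story.Of.A.Warrior.720.x264.mp4',
    '002.Porus.2017.S01E01.Welcome.With.A.Fight.720.x264.mp4',
];

// With an array subject, preg_replace() returns an array of results.
// \1 at the end re-inserts the leading number in place of the old episode digits.
$renamed = preg_replace('/^(\d+)([^S]+)(S01E)(\d+)/', '\1\2\3\1', $lines);

print_r($renamed);
```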
I'm having a hard time understanding when strtr would be preferable to str_replace or vice versa. It seems that it's possible to achieve the exact same results using either function, although the order in which substrings are replaced is reversed. For example:
echo strtr('test string', 'st', 'XY')."\n";
echo strtr('test string', array( 's' => 'X', 't' => 'Y', 'st' => 'Z' ))."\n";
echo str_replace(array('s', 't', 'st'), array('X', 'Y', 'Z'), 'test string')."\n";
echo str_replace(array('st', 't', 's'), array('Z', 'Y', 'X'), 'test string');
This outputs
YeXY XYring
YeZ Zring
YeXY XYring
YeZ Zring
Aside from syntax, is there any benefit to using one over the other? Any cases where one would not be sufficient to achieve a desired result?
First difference:
An interesting example of a different behaviour between strtr and str_replace is in the comments section of the PHP Manual:
<?php
$arrFrom = array("1","2","3","B");
$arrTo = array("A","B","C","D");
$word = "ZBB2";
echo str_replace($arrFrom, $arrTo, $word);
?>
I would expect as result: "ZDDB"
However, this return: "ZDDD"
(Because B = D according to our array)
To make this work, use "strtr" instead:
<?php
$arr = array("1" => "A","2" => "B","3" => "C","B" => "D");
$word = "ZBB2";
echo strtr($word,$arr);
?>
This returns: "ZDDB"
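This means that str_replace applies each search/replace pair in turn to the whole string, so a later pair can change text produced by an earlier one, while strtr makes a single pass and never re-translates text it has already replaced.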
Another difference:
Given the following code (taken from PHP String Replacement Speed Comparison):
<?php
$text = "PHP: Hypertext Preprocessor";
$text_strtr = strtr($text
, array("PHP" => "PHP: Hypertext Preprocessor"
, "PHP: Hypertext Preprocessor" => "PHP"));
$text_str_replace = str_replace(array("PHP", "PHP: Hypertext Preprocessor")
, array("PHP: Hypertext Preprocessor", "PHP")
, $text);
var_dump($text_strtr);
var_dump($text_str_replace);
?>
The resulting lines of text will be:
string(3) "PHP"
string(27) "PHP: Hypertext Preprocessor"
The main explanation:
This happens because:
strtr: it tries the longest keys first, so:
the subject text is itself the longest key of the replacement array, so it is matched as a whole and translated to "PHP".
because all the chars of the subject text have been replaced, and strtr never rescans replaced text, the process ends there.
str_replace: it works in the order the keys are defined, so:
it finds the key “PHP” in the subject text and replaces it with “PHP: Hypertext Preprocessor”, which gives:
“PHP: Hypertext Preprocessor: Hypertext Preprocessor”.
then it finds the next key, “PHP: Hypertext Preprocessor”, in the text resulting from the former step, and replaces it with "PHP", which gives:
“PHP: Hypertext Preprocessor”.
there are no more keys to look for, so the replacement ends there.
It seems that it's possible to achieve the exact same results using either function
That's not always true and depends on the search and replace data you provide. For an example where the two functions differ, see: Does PHP str_replace have a greater than 13 character limit?
strtr will not replace in parts of the string that already have been replaced - str_replace will replace inside replacements.
strtr will start with the longest key first when you call it with two parameters (the array form) - str_replace will apply the search elements in the order they are given, first to last.
str_replace can return the number of replacements done - strtr does not offer such a count value.
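These differences can be seen in a small sketch. The input 'ab' and the variable names are made up for illustration; the behaviour shown is the documented one (array elements processed first to last for str_replace, no rescanning for strtr).

```php
<?php
$subject = 'ab';

// str_replace applies the pairs one after another:
// a => b turns "ab" into "bb", then b => c turns "bb" into "cc".
// The optional fourth argument receives the total number of replacements.
$sequential = str_replace(['a', 'b'], ['b', 'c'], $subject, $count);

// strtr scans the subject once and never rescans replaced text.
$singlePass = strtr($subject, ['a' => 'b', 'b' => 'c']);

echo $sequential, "\n"; // cc
echo $count, "\n";      // 3
echo $singlePass, "\n"; // bc
```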
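I think strtr, when used with two arguments (the array form), makes prefix-dependent replacements easy to express. For example: if the string is 1, replace it with a, but if it is 10, replace it with b. Because strtr tries the longest key first, the order of the pairs does not matter; with str_replace you would have to list 10 before 1 to get the same result.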
$string = "1.10.0001";
echo strtr($string, array("1" => "a", "10" => "b"));
// a.b.000a
see : Php Manual Strtr.
Notice in manual
STRTR--
Description
string strtr ( string $str , string $from , string $to )
string strtr ( string $str , array $replace_pairs )
If given three arguments, this function returns a copy of str where ...
STR_REPLACE--
...
If search or replace are arrays, their elements are processed first to last.
...
With STRTR, one replacement does NOT affect the next, but with STR_REPLACE it does.
I have a string, something like this:
$str ="it is a test string.";
// for more clarification
i t i s a t e s t s t r i n g .
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Now I need to check all characters that are multiples of 4 (plus first character). like these:
1 => i
4 => i
8 => [space]
12 => t
16 => r
20 => .
Now, I need to compare them with Y (Y is a variable (symbol), for example Y = 'r' in here). So I want to replace Y with X (X is a variable (symbol) too, for example X = 'm' in here).
So, I want this output:
it is a test stming.
Here is my solution: I can do that using some PHP function:
strlen($str): to count the number of characters (named $sum)
$sum / 4: To find characters that are multiples of 4
substr($str, 4,1): to select specific character (named $char) {the problem is here}
if ($char == 'r') {}: to compare
str_replace('r','m',$char): to replace
And then combining all $char to each other.
But my solution has two problems:
substr() does not seem to count the [space] character (as I mentioned above)
combining the characters is a bit complicated (it needs some wasteful processing)
Well, is there any solution? I would like to do that using regex. Is it possible?
Could just use a simple regex with callback (add u flag if utf-8, s for . to match newline).
$str = preg_replace_callback(['/^./', '/.{3}\K./'], function ($m) {
return $m[0] == "r" ? "m" : $m[0];
}, $str); echo $str;
See this demo at tio.run > it is a test stming.
1st pattern: ^. any first character
2nd pattern: \K resets after .{3} any three characters, only want to check the fourth .
For use with anonymous function PHP >= 5.3 is required. Here is the workaround (demo).
Update: #Mariano demonstrated in his very nice answer that it is even possible with a single regex replacement. Thank you for the benchmark, which reveals rather bad performance for the preg_replace_callback solution. Here is a more efficient variant without a callback (but still with two patterns); the \G anchor keeps every match attempt aligned to a multiple of four characters.
$str = preg_replace(['/^r/', '/\G(?:...[^r])*...\Kr/'], 'm', $str);
I also included #revo's answer from 2017 in Mariano's benchmark and ran it on tio.run (100k loops). With newer PHP and PCRE2 the numbers seem to have changed slightly, "no regex" leads at tio.run.
In .NET or modern browser JS regex it also could be done like this by a variable length lookbehind.
If all characters in your string are in single byte, you can use something from PHP's official language reference:
$str ="it is a test string.";
$y="r";
$x="m";
$len=strlen($str);
if($str[0]==$y)
{
$str=substr_replace($str,$x,0,1);
}
if($len>=3)
{
for($i=3;$i<$len;$i+=4)
{
if($str[$i]==$y)
{
$str=substr_replace($str,$x,$i,1);
}
}
}
var_dump($str);
3v4l demo
Outputs it is a test stming.
Edit:
As #Don'tPanic points out, String is mutable using [] operator, so instead of using
$str=substr_replace($str,$x,$i,1);
you can just use
$str[$i]=$x;
This is an alternative using preg_replace()
$y = 'r';
$y = preg_quote($y, '/');
$x = 'M';
$x = preg_quote($x, '/');
$subject = 'rrrrrr rrrrr rrrrrr rrrr rrrr.';
$regex = "/\\G(?:^|(?(?<!^.).)..(?:.{4})*?)\\K$y/s";
$result = preg_replace($regex, $x, $subject);
echo $result;
// => MrrMrr MrrrM rrMrrr rrrM rrMr.
ideone demo
Regex:
\G(?:^|(?(?<!^.).)..(?:.{4})*?)\Kr
\G asserts the position at the end of the last match (or the start of the string)
(?:^|(?(?<!^.).)..(?:.{4})*?) matches:
^ start of string, to check at position 1
(?(?<!^.).) is an if clause that yields:
..(?:.{4})*? 2 chars + a multiple of 4 if it has just replaced at position 1
...(?:.{4})*? 3 chars + a multiple of 4 for successive matches
\K resets the text matched to avoid using backreferences
I must say though, regex is an overkill for this task. This code is counterintuitive and a typical regex that proves difficult to understand/debug/maintain.
EDIT. There was a later discussion about performance vs. code readability, so I did a benchmark to compare:
RegEx with a callback (#bobblebubble's answer).
RegEx with 2 replacements in an array (#bobblebubble's suggestion in comment).
No RegEx with substr_replace (#Passerby's answer).
Pure RegEx (this answer).
Result:
Code #1(with_callback): 0.548 secs/50k loops
Code #2(regex_array): 0.158 secs/50k loops
Code #3(no_regex): 0.120 secs/50k loops
Code #4(pure_regex): 0.118 secs/50k loops
Benchmark in ideone.com
Try this
$str ="it is a test string.";
$y="r";
$x="m";
$splite_array = str_split($str);
foreach ($splite_array as $key => $val)
{
if(($key == 0 || ($key + 1) % 4 == 0) && $val == $y)
{
$splite_array[$key] = $x;
}
}
$yout_new_string = implode($splite_array);
This piece of code could help you on your way:
// Define variables
$string = "it is a test string.";
$y = 'r';
$x = 'm';
// Convert string to array
$chars = str_split($string);
// Loop through all characters
foreach ($chars as $key => $char) {
// Array keys start at 0, so we add 1
$keyCount = $key+1;
// Check whether the position is 1 or is divisible by 4
// (dividing by 4 leaves no remainder)
if (($keyCount == 1 || $keyCount % 4 == 0) && $char == $y) {
$chars[$key] = $x;
}
}
// Convert back to string
$string = implode($chars);
Here is one other way to do this using string access and modification by character. (Consequently, it is only useful for single-byte encoded strings.)
// First character handled outside the loop because its index doesn't match the pattern
if ($str[0] == $y) $str[0] = $x;
// access every fourth character
for ($i=3; isset($str[$i]) ; $i+=4) {
// change it if it needs to be changed
if ($str[$i] == $y) $str[$i] = $x;
}
This modifies the original string rather than creating a new string, so if that shouldn't happen, it should be used on a copy.
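One way to get a copy for free is to wrap the loop in a function, since PHP passes strings by value. This is only a sketch: the function name replaceAtFirstAndEveryFourth is hypothetical, and it assumes $y and $x are single-byte characters.

```php
<?php
// Hypothetical helper. $str is passed by value, so the caller's string
// is left untouched and the modified copy is returned.
function replaceAtFirstAndEveryFourth(string $str, string $y, string $x): string
{
    if ($str === '') {
        return $str;
    }
    // Position 1 (index 0) is handled separately.
    if ($str[0] === $y) {
        $str[0] = $x;
    }
    // Positions 4, 8, 12, ... are indexes 3, 7, 11, ...
    for ($i = 3; isset($str[$i]); $i += 4) {
        if ($str[$i] === $y) {
            $str[$i] = $x;
        }
    }
    return $str;
}

$original = "it is a test string.";
echo replaceAtFirstAndEveryFourth($original, 'r', 'm'), "\n"; // it is a test stming.
echo $original, "\n";                                         // it is a test string.
```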
Late to the party; putting aside the \G anchor, I'd go with the (*SKIP)(*F) method:
$str = "it is a test string.";
echo preg_replace(['~\Ar~', '~.{3}\K(?>r|.(*SKIP)(?!))~'], 'm', $str);
Short and clean.
PHP live demo
Sorry for the title, I don't know how to explain it better.
I must get 354607 from the following string:
...jLHoiAAD1037354607Ij0Ij1Ij2...
The "354607" is dynamic, but it has the "1037" in any case before, and is in any case exactly 6 characters long.
The problem is, the string is about 50.000 up to 1.000.000 characters long. So I want a resource-friendly solution.
I tried it with:
preg_match_all("/1037(.*?){0,5}/", $new, $search1037);
and:
preg_match_all("/1037(.*?{0,5})/", $new, $search1037);
but, I don't know how to use regular expressions correctly.
I hope someone could help me!
Thank's a lot!
Use \d{6}, which represents six digits:
preg_match_all("/1037(\d{6})/", $new, $search1037);
returns an array with
array(
0 => array(
0 => 1037354607
),
1 => array(
0 => 354607
)
)
Check this demo
Since you're concerned with finding a resource-friendly solution, you may be better off not using preg_match. Regular expressions tend to require more overhead in general, as discussed in this SO question.
Instead, you could use strstr():
$string = strstr($string,'1037');
This returns the part of $string that starts at the first occurrence of '1037', including everything that follows it (or false if '1037' is not found). Then, use substr():
$string = substr($string,4,6);
This returns the 6 characters starting at offset 4, that is, right after the 4-character prefix 1037, which occupies offsets 0 to 3.
For fun, in one line:
$string = substr(strstr($string,'1037'),4,6);
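Since strstr() copies the whole remainder of the string before substr() trims it, a variant based on strpos() may be a little lighter on very long inputs. The following is a sketch under that assumption, not a benchmarked result; the function name extractAfterMarker and its parameters are hypothetical.

```php
<?php
// Hypothetical helper: returns the $length characters that follow the first
// occurrence of $marker, or null if the marker is missing or the string ends early.
function extractAfterMarker(string $haystack, string $marker = '1037', int $length = 6): ?string
{
    $pos = strpos($haystack, $marker);
    if ($pos === false) {
        return null; // marker not found
    }
    // The cast covers older PHP versions where substr() may return false.
    $value = (string) substr($haystack, $pos + strlen($marker), $length);
    return strlen($value) === $length ? $value : null;
}

echo extractAfterMarker('...jLHoiAAD1037354607Ij0Ij1Ij2...'); // 354607
```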
I need to format a phone number as one long string of numbers (US Phone Number format)
// I know there are tons more
$phones = array(
'1(800) 555-1212',
'1.800.555.1212',
'800.555.1212',
'1 800 555 1212',
'1.800 CALL NOW' // 1 800 225-5669
);
foreach($phones as $phone) {
echo "new format: ".preg_replace("/[^0-9]/", "", $phone)."<br />\n";
}
Now this should return something like this:
8005551212 (with or without the 1)
but how do I map/convert the number with CALL NOW to:
18002255669
To save some typing...
$phoneNumber = strtr($phoneLetters, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "22233344455566677778889999");
You could use strtr().
$number = strtr($number, array('A'=> '2', 'B' => '2', ... 'Z' => '9'));
Or actually, I think:
$number = strtr($number, "AB...Z", "22...9");
For the first step, you need to do a different regex replace (your version would now lose all the letters):
$result = preg_replace('/[^A-Z0-9]+/i', '', $phone);
Then, you need to take the string and replace each letter with its corresponding digit (see konforce's answer). That's not really a job for a regex.
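Putting the two steps together could look like the sketch below. The function name normalizePhone is hypothetical; the letter-to-digit table is the standard keypad mapping already used with strtr() above.

```php
<?php
// Hypothetical helper combining both steps: strip everything except
// letters and digits, then translate letters to keypad digits.
function normalizePhone(string $phone): string
{
    // Step 1: keep only letters and digits (case-insensitive).
    $clean = preg_replace('/[^A-Z0-9]+/i', '', $phone);
    // Step 2: uppercase, then map each letter to its keypad digit.
    return strtr(
        strtoupper($clean),
        'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
        '22233344455566677778889999'
    );
}

echo normalizePhone('1.800 CALL NOW'), "\n";  // 18002255669
echo normalizePhone('1(800) 555-1212'), "\n"; // 18005551212
```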