PHP Split String after specific occurrences - php

I have the following string that I'm trying to split into different variables based on specific occurrences:
Brodel8DARK HORSE COMICS
I'd like my end result to be
$user = Brodel
$index = 8
$publisher = DARK HORSE COMICS
I've tried playing around with some regular expressions, but I'm a novice.
These conditions will always be true:
The user name will change (different number of characters, etc.)
The index will always be an integer but can grow to 3+ digits
The Publisher will always be in all caps
Thanks for any help

As long as the publisher doesn't start with a number, then this regex should work
/^([A-Za-z]+)(\d+)([A-Z\s]+)$/
It matches one or more letters, followed by one or more digits, and finally one or more capital letters or whitespace characters.
<?php
$string = 'Brodel8DARK HORSE COMICS';
if(preg_match('/^([A-Za-z]+)(\d+)([A-Z\s]+)$/', $string, $matches) === 1){
var_dump($matches);
}
This outputs:
array(4) {
[0]=>
string(24) "Brodel8DARK HORSE COMICS"
[1]=>
string(6) "Brodel"
[2]=>
string(1) "8"
[3]=>
string(17) "DARK HORSE COMICS"
}
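To get from the $matches array to the three variables the question asks for, something like the following might work. It is only a sketch: the helper name splitEntry() and the named groups user, index and publisher are made up for illustration, and it assumes the input always follows the letters, digits, capitals layout described above.

```php
<?php
// Hypothetical helper: splits a string such as "Brodel8DARK HORSE COMICS".
function splitEntry(string $string): ?array
{
    // Named groups: letters, then digits, then capitals and whitespace.
    $pattern = '/^(?<user>[A-Za-z]+)(?<index>\d+)(?<publisher>[A-Z\s]+)$/';
    if (preg_match($pattern, $string, $m) !== 1) {
        return null; // input does not follow the expected layout
    }
    return [
        'user'      => $m['user'],
        'index'     => (int) $m['index'], // cast the digit run to an integer
        'publisher' => $m['publisher'],
    ];
}

$parts     = splitEntry('Brodel8DARK HORSE COMICS');
$user      = $parts['user'];
$index     = $parts['index'];
$publisher = $parts['publisher'];

echo $user, ' | ', $index, ' | ', $publisher, "\n";
```

Returning null on a failed match keeps the caller from reading undefined array keys when a line does not fit the pattern.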

Try this:
<?php
$string = 'Brodel8DARK HORSE COMICS';
preg_match("/^([^\d]+)(\d+)([A-Z\s]+)$/", $string, $match);
//print_r($match);
echo $publisher = $match[3];//DARK HORSE COMICS
?>

Related

Finding hashtags in Text

Yes, there are lots of hashtag regex available here but none is suiting my needs. And no one is actually able to solve the problem.
The Regex should consider the following hashtags as valid:
#validhashtag
#valid_hashtag
#validhashtag_with_space_before_or_after
#valid_hashtag_chars_öÖäÄüÜß
...and not valid should be:
ipsum#notvalid //Not valid: Connected to Word
http://google.com/#results //Not valid: Same as above
#not-valid
#not!valid
Allowed Characters should be:
a-Z,0-9,öÖäÄüÜß,_
Max length should be 50 characters.
The main problem is the case where the hashtag is "connected" to another part of the text. I don't know how to solve that problem.
This is what I attempted to do
/([\p{Pc}\p{N}\p{L}\p{Mn}]{1,50})/u
That one works pretty well but doesn't consider the "word#hashtag" - Problem.
I think your original expression is pretty great, we'd just modify that with:
^\s*#([\p{Pc}\p{N}\p{L}\p{Mn}]{1,50})$
Demo
Test
$re = '/^\s*#([\p{Pc}\p{N}\p{L}\p{Mn}]{1,50})$/um';
$str = '#validhashtag
#valid_hashtag
#validhashtag_with_space_before_or_after
#valid_hashtag_chars_öÖäÄüÜß
ipsum#notvalid //Not valid: Connected to Word
http://google.com/#results //Not valid: Same as above
#not-valid
#not!valid';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
Output
array(4) {
[0]=>
array(2) {
[0]=>
string(13) "#validhashtag"
[1]=>
string(12) "validhashtag"
}
[1]=>
array(2) {
[0]=>
string(14) "#valid_hashtag"
[1]=>
string(13) "valid_hashtag"
}
[2]=>
array(2) {
[0]=>
string(41) " #validhashtag_with_space_before_or_after"
[1]=>
string(39) "validhashtag_with_space_before_or_after"
}
[3]=>
array(2) {
[0]=>
string(35) "#valid_hashtag_chars_öÖäÄüÜß"
[1]=>
string(34) "valid_hashtag_chars_öÖäÄüÜß"
}
}
You may use either of the two below:
/(?<!\S)#\w+(?!\S)/u
/(?<!\S)#[\w\p{M}\p{Pc}]+(?!\S)/u
See the regex demo. If you want to restrict the word part length, keep your {1,50} quantifier - /(?<!\S)#\w{1,50}(?!\S)/u.
Also note: \w, even with the u modifier, does not match the same chars that are considered "word" characters in .NET, Java, or Python re regex. You may decide to include other classes to fill the gap and use [\w\p{M}\p{Pc}]+ instead of just \w, where \p{M} matches any diacritics and \p{Pc} matches any connector punctuation.
Details
(?<!\S) - a whitespace or start of string required right before
# - a # sign
\w+ - 1+ word chars (NOTE: if you want to restrict the length to between 1 and 50, replace + with {1,50}; also note that the u modifier lets the PCRE engine match any Unicode letters and digits with the \w shorthand)
[\w\p{M}\p{Pc}] - matches 1+ word chars + all diacritics (\p{M}) and all connector punctuation (\p{Pc}, considered as word in .NET regex)
(?!\S) - a whitespace or end of string required right after.
PHP demo:
$s = "#validhashtag
#valid_hashtag
#validhashtag_with_space_before_or_after
#valid_hashtag_chars_öÖäÄüÜß
...and not valid shoulw be:
ipsum#notvalid //Not valid: Connected to Word
http://google.com/#results //Not valid: Same as above
#not-valid
#not!valid";
if (preg_match_all('~(?<!\S)#\w+(?!\S)~u', $s, $matches)) {
print_r($matches[0]);
}
Output:
Array
(
[0] => #validhashtag
[1] => #valid_hashtag
[2] => #validhashtag_with_space_before_or_after
[3] => #valid_hashtag_chars_öÖäÄüÜß
)
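If the tag should be limited to exactly the characters listed in the question (a-z, A-Z, 0-9, the German umlauts, ß and underscore) with a maximum length of 50, the same whitespace lookarounds can be combined with an explicit character class. This is a sketch under those assumptions; the function name findHashtags() and the sample text are invented for illustration, and the source file must be saved as UTF-8 for the u modifier to work.

```php
<?php
// Hypothetical helper: returns all standalone hashtags found in $text.
function findHashtags(string $text): array
{
    // (?<!\S) : start of string or whitespace before the #
    // {1,50}  : between 1 and 50 allowed characters
    // (?!\S)  : end of string or whitespace after the tag
    $pattern = '/(?<!\S)#[A-Za-z0-9_öÖäÄüÜß]{1,50}(?!\S)/u';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$text = "#validhashtag ipsum#notvalid http://google.com/#results #not-valid #grüße_2";
print_r(findHashtags($text));
```

Because the trailing lookahead requires whitespace or the end of the string, a tag with 51 or more allowed characters is rejected as a whole instead of being truncated to 50.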

Php preg_match issue not working

I am trying to find a php preg_match that can match:
"2-20 to 2-25"
from this text:
user levels 2-20 to 2-25 not ready
I tried
preg_match("/([0-9]+) to ([0-9]+)/", $vars[1] , $matchesto);
but the result is:
"20 to 2"
Any help appreciated.
Your pattern is almost correct; just include the dashes and adjust the capture group:
([-0-9]+ to [-0-9]+)
Example:
https://regex101.com/r/eD6lQ2/1
That's because [0-9]+ matches one or more digits but won't match a hyphen (-).
Try this:
$pattern = '~([0-9]+-[0-9]+) to ([0-9]+-[0-9]+)~';
preg_match($pattern, $vars[1] , $matchesto);
You can use "\d" to match the digits:
<?php
$str = 'user levels 2-20 to 2-25 not ready';
$matches = array();
preg_match('/(\d+-\d+) to (\d+-\d+)/', $str, $matches);
var_dump($matches);
Output:
array(3) {
[0]=>
string(12) "2-20 to 2-25"
[1]=>
string(4) "2-20"
[2]=>
string(4) "2-25"
}
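To see why the original attempt returned "20 to 2", it may help to run both patterns side by side. The sketch below uses the sample sentence from the question; the variable names $before, $after, $from and $to are chosen only for this example.

```php
<?php
$str = 'user levels 2-20 to 2-25 not ready';

// Original pattern: each [0-9]+ stops at the hyphen, so only the digits
// directly around " to " are matched.
preg_match('/([0-9]+) to ([0-9]+)/', $str, $before);
echo $before[0], "\n"; // 20 to 2

// Corrected pattern: the hyphen is part of each group.
preg_match('/(\d+-\d+) to (\d+-\d+)/', $str, $after);
echo $after[0], "\n"; // 2-20 to 2-25

// Optional: turn each "a-b" pair into two integers.
$from = array_map('intval', explode('-', $after[1]));
$to   = array_map('intval', explode('-', $after[2]));
print_r($from);
print_r($to);
```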

PHP preg_split, split by same characters

I'm trying to split a string with preg_split. Here's an example of the string:
111235622411
I want the output to be like this:
$arr[0] = "111";
$arr[1] = "2";
$arr[2] = "3";
$arr[3] = "5";
$arr[4] = "6";
$arr[5] = "22";
$arr[6] = "4";
$arr[7] = "11";
So if there's the same characters one after the other, I want them in the same "chunk". I just can't come up with the regular expression I should use. I'm sorry if some of the terms are wrong, because it has been some time since I coded PHP before.
I would use preg_match_all():
$string = '111235622411';
preg_match_all('/(.)\1*/', $string, $matches);
var_dump($matches[0]);
\1 references the previously captured group (.) (any single character). This feature is called a back reference. The regex repeats the previously matched character; the greedy * matches as many equal characters as possible, which is what the question asked for.
Output:
array(8) {
[0]=>
string(3) "111"
[1]=>
string(1) "2"
[2]=>
string(1) "3"
[3]=>
string(1) "5"
[4]=>
string(1) "6"
[5]=>
string(2) "22"
[6]=>
string(1) "4"
[7]=>
string(2) "11"
}
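Since the question asked about preg_split specifically, a split-based variant is also possible. The sketch below is an alternative to the preg_match_all() approach, not a replacement for it: it splits at every zero-width position where the previous character differs from the next one.

```php
<?php
$string = '111235622411';

// (?<=(.)) captures the character just before the current position,
// (?!\1)   succeeds only if the next character is different (or absent).
// The match itself is empty, so no characters are consumed by the split.
// PREG_SPLIT_NO_EMPTY drops the empty piece produced at the end of the string.
$chunks = preg_split('/(?<=(.))(?!\1)/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($chunks);
```

The preg_match_all() version is arguably easier to read; the split version mainly shows that the same back reference idea works with lookarounds.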
You can use this regex:
(.)(?=\1)\1+|\d
And instead of splitting it, take the matches.
$matches = null;
$returnValue = preg_match_all('/(.)(?=\\1)\\1+|\\d/', '111235622411', $matches);
$matches[0] will then contain what you want. As @hek2mgl has suggested, you can also use the simpler /(\d)\1*/
DEMO
The following is a simple solution based on preg_match_all.
The regex in this case is:
(\d)\1*
Explanation of the regex:
(\d): 1st capturing group. \d matches a digit [0-9].
\1: matches the same text as most recently matched by the 1st capturing group.
*: quantifier, matches between zero and unlimited times.
The php code would be:
$re = "/(\\d)\\1*/";
$str = "111235622411";
preg_match_all($re, $str, $matches);
print_r($matches[0]);
You can access the first match, "111", as $matches[0][0], the second, "2", as $matches[0][1], and so on. Check the Demo to see a working example.
Hope it's useful!

Matching any amount of words regular expression

I'm trying to capture a line with n-number of words that follow a title sequence in PHP, but I cannot capture anything more than the first word. Here are the contents of the file that I am trying to match:
Name: test
Caption: test test test test
And here is the regular expression code and results...
preg_match_all('/([A-z]+:)\s*(\w+)[\r|\r\n|\n]*/', $contents, $array);
Results:
array(3) {
[0]=> array(2) {
[0]=> string(11) "Name: test "
[1]=> string(14) "Caption: test "
}
[1]=> array(2) {
[0]=> string(5) "Name:"
[1]=> string(8) "Caption:"
}
[2]=> array(2) {
[0]=> string(4) "test"
[1]=> string(4) "test"
}
}
Any help would be greatly appreciated.
Assuming that your input data always looks like your example (title segment, colon, words; all on a single line), this should do it:
preg_match_all('/([A-Za-z]+:)\s*(.*)/', $contents, $array);
This would result in $array[1] matching something like Name:, and then $array[2] would match the rest of the line (you may have to use trim() to strip any leading and/or trailing white space from $array[2]).
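As a rough illustration of the trim() step, the captured titles and values could be combined into one associative array. This sketch makes a few assumptions that are not in the original answer: it adds the m modifier so that ^ and $ work per line, uses [ \t]* instead of \s* so an empty value cannot swallow the line break, and drops the colon from the first group. The variable name $fields is invented.

```php
<?php
$contents = "Name: test\nCaption: test test test test\n";

// m: ^ and $ match at the start and end of every line.
// (.*) takes the rest of the line, because . does not match a newline.
preg_match_all('/^([A-Za-z]+):[ \t]*(.*)$/m', $contents, $array);

// Pair each title with its trimmed value.
$fields = array_combine($array[1], array_map('trim', $array[2]));

print_r($fields);
```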
If you only want to capture "words" in the second part, I believe you could change the second capture group to something like:
preg_match_all('/([A-Za-z]+:)\s*([\w\s]+)/', $contents, $array);
Note also that you shouldn't use the [A-z] construct, since there are non-alphabetical characters in the ASCII table between the upper case letters and the lower case letters. See the ASCII Table for a character map.
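A quick way to see the [A-z] problem is to test a string containing an underscore, which sits between "Z" and "a" in the ASCII table. The variable names $loose and $strict below are only for this example.

```php
<?php
// [A-z] covers ASCII 65 to 122, which also includes [ \ ] ^ _ and the backtick.
$loose  = preg_match('/^[A-z]+$/', 'foo_bar');    // underscore is accepted
$strict = preg_match('/^[A-Za-z]+$/', 'foo_bar'); // underscore is rejected

var_dump($loose);  // int(1)
var_dump($strict); // int(0)
```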

Need Regexp help PHP

I have, for example, strings such as "7-th Road", "7th number some other words" or "Some word 8-th word".
I need to get the first occurrence of a number and all following characters up to the first space.
So for the examples above I need the values "7-th", "7th", "8-th".
Then, from matches like "7-th", I need to extract only the numbers for other operations.
Thanks in advance!
Regex should be /(\d+)([^\d]+)\s/ and the numbers would resolve to $1 and the ending characters to $2
Sample Code:
$string = '7-th Road';
preg_match_all('/(\d+)([^\d]+)\s/', $string, $result, PREG_PATTERN_ORDER);
var_dump($result[1]);
array(1) {
[0]=> string(1) "7"
}
var_dump($result[2]);
array(1) {
[0]=> string(3) "-th"
}
Are you asking for something like this?
#(\d+)-?(?:st|nd|rd|th)#
Example
If you just want the numbers from the text, use:
preg_match_all('/(\d+)(?:-?th)?/', '7-th", "7th", "8-th', $matches);
But if you would like to remove 'th' or a similar suffix, do a replacement:
preg_replace('/(\d+)-?th/', '$1', 'some string')
Not sure about the last one...
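For the two steps the question describes (take the first number plus everything up to the next space, then pull out just the number), a single preg_match() call may be enough. This is a sketch; the helper name firstNumberToken() and its return shape are made up for illustration.

```php
<?php
// Hypothetical helper: returns the first number-led token and its number.
function firstNumberToken(string $text): ?array
{
    // (\d+) captures the first run of digits;
    // \S*   extends the match up to the next whitespace or end of string.
    if (preg_match('/(\d+)\S*/', $text, $m) !== 1) {
        return null; // no digits in the input
    }
    return ['token' => $m[0], 'number' => (int) $m[1]];
}

foreach (['7-th Road', '7th number some other words', 'Some word 8-th word'] as $s) {
    $r = firstNumberToken($s);
    echo $r['token'], ' => ', $r['number'], "\n";
}
```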
