PHP preg_match_all failing - php

I'm trying to extract IDs from a possibly huge text. What did I miss?
preg_match_all('/(ID\s\d+)/', "ID 20380843, ID ​20675712", $matches);
print_r( $matches[0] );
It only returns:
Array
(
[0] => ID 20380843
)
Instead of:
Array
(
[0] => ID 20380843
[1] => ID 20675712
)

Did you copy that string from your code? Because there is something sneaky happening.
When I copied the code into my editor, it showed the string as:
"ID 20380843, ID ?20675712"
As you can see, there is a question mark in front of the second number, which is why your expression fails on it :)

Your problem isn't preg_match_all, it's your source file. There's an invisible Unicode character in front of the second ID number. If you copy/paste the string into this Unicode Converter, you'll see U+200B show up in various forms in the lower boxes:
Unicode U+hex notation
preg_match_all('/(ID\s\d+)/', "ID 20380843, ID U+200B^20675712", $matches);
(emphasis mine)
This is the Unicode Zero-Width Space (U+200B), which \s does not match in PHP's PCRE functions.
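One way to cope with this, as a minimal sketch: strip the zero-width space before matching. The variable names below are only illustrative, and the "\u{200B}" escape assumes PHP 7 or later; it is used here so the invisible character is visible in the source.

```php
<?php
// The second ID has an invisible U+200B in front of the digits.
$text = "ID 20380843, ID \u{200B}20675712";

// Remove every zero-width space; the /u modifier makes PCRE treat
// the pattern and the subject as UTF-8.
$clean = preg_replace('/\x{200B}/u', '', $text);

preg_match_all('/ID\s\d+/', $clean, $matches);
print_r($matches[0]);
```

If other invisible characters can show up in the text, the character class would have to be extended to cover them.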

Use print_r($matches) instead of print_r($matches[0]);
Try:
preg_match_all('/(ID\s\d+)/', "ID 20380843, ID ​20675712", $matches);
print_r( $matches );

Related

Ending regex with a specific character on first appearance

I have several source files that I'm applying preg_match_all to.
This is what I tried:
$lazy = file_get_contents("Some_Source_code.txt");
if(!preg_match("#method_(.*)\(int var0, int var1, int var2\)#", $lazy, $function_name))
die("nothing here");
preg_match_all("#method_".$function_name[1]."\(.*\){1}#", $lazy, $matches);
print_r($matches);
but the output comes like this:
Array
(
[0] => Array
(
[0] => method_2393(int var0, int var1, int var2)
[1] => method_2393(0, 0, 0)).equals(this.field_1351.getText().toString()))
)
)
What I want is $matches[0][1], but without the trailing text.
How can I make the match stop at the first closing parenthesis ')', as it does for the first result?
I could process the line after extracting it, but how can I do it with the regex itself?
I searched the answers to similar problems, but they were too specific.
Modify the regex to:
#method_".$function_name[1]."\([^)]*\){1}#
Where you went wrong:
#method_".$function_name[1]."\(.*\){1}#
Here you used \(.*\), where .* matches anything, including the ).
What changed:
In \([^)]*\), the [^)]* matches anything other than ), so the match ends at the first occurrence of ).
You can also use lazy matching with .*? instead of .*, which is greedy and consumes as many characters as it can.
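A small sketch comparing the two fixes on made-up input. The $source string and the hard-coded $name are assumptions for illustration (in the question the name comes from $function_name[1]); preg_quote is added only so the name is treated literally.

```php
<?php
// Hypothetical source text containing a declaration and a call.
$source = "int method_2393(int var0, int var1, int var2) {}\n"
        . "if (String.valueOf(method_2393(0, 0, 0)).equals(this.field_1351.getText().toString())) {}";
$name = '2393';

// Negated class: stop at the first ")".
$negated = '#method_' . preg_quote($name, '#') . '\([^)]*\)#';
preg_match_all($negated, $source, $byClass);

// Lazy quantifier: take as few characters as possible before ")".
$lazy = '#method_' . preg_quote($name, '#') . '\(.*?\)#';
preg_match_all($lazy, $source, $byLazy);

print_r($byClass[0]);
print_r($byLazy[0]);
```

One difference to keep in mind: [^)]* can run across line breaks, while .*? cannot unless the s modifier is set.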

How to check single byte katakana in a string

I'm working on a website that uses double-byte Japanese characters, and I need to check whether the user has entered single-byte katakana. The site is built in PHP.
This is the pattern I used for the check:
'/[\x{3040}-\x{309F}]/u'
I'm not 100% sure that the test string I use here is a legal $string, so I'll remove this answer (or try to update it) if it works out differently. The string below is typed in manually (with the backslashes escaped) rather than taken from raw input:
$string = "\\xe3\\x80\\x85"; // RAW input might still be '\xe3\x80\x85' here
$result = preg_match_all("/\\\\xe3\\\\x8[0-3]\\\\x[8-9a-b][0-9a-f]/u", $string, $matches);
echo $string;
echo '<pre>';
print_r($matches);
echo '</pre>';
This prints out;
\xe3\x80\x85
Array
(
[0] => Array
(
[0] => \xe3\x80\x85
)
)
Thus; 々
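For comparison, a minimal sketch that assumes "single byte katakana" means half-width katakana, which Unicode places at U+FF61 to U+FF9F (half-width punctuation, letters and sound marks). The range in the question, U+3040 to U+309F, is the Hiragana block. The function name is hypothetical, and the input is assumed to be valid UTF-8.

```php
<?php
// Returns true if $text contains at least one half-width katakana
// character (including half-width punctuation and sound marks).
function containsHalfwidthKatakana(string $text): bool
{
    return preg_match('/[\x{FF61}-\x{FF9F}]/u', $text) === 1;
}

// U+FF76 U+FF80 are half-width KA and TA; U+30AB U+30BF are full-width.
var_dump(containsHalfwidthKatakana("\u{FF76}\u{FF80}"));
var_dump(containsHalfwidthKatakana("\u{30AB}\u{30BF}"));
```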

Regex - How to match one pattern at a time

I have a function that parses some content to find a homemade link tag and convert it to a normal link tag.
Possible input:
<p>blabalblahhh <moolinkx pageid="121">text to click</moolinkx> blablabah</p>
Output :
<p>blabalblahhh text to click blablabah</p>
Here is my code:
$regex = '/\<moolinkx pageid="(.{1,})"\>(.{1,})\<\/moolinkx\>/';
preg_match_all( $regex, $string, $matches );
It works perfectly well if there is only one tag in the string. But as soon as there is a second one, it doesn't work.
Input:
<p>blabalblahhh <moolinkx pageid="121">text to click</moolinkx> blablabah.</p>
<p>Another <moolinkx pageid="128">text to clickclick</moolinkx> again blablablah.</p>
This is what I get when I print_r($matches):
Array
(
[0] => Array
(
[0] => <moolinkx pageid="121">text to click</moolinkx> blablabah.</p><p>Another <moolinkx pageid="128">text to clickclick</moolinkx>
)
[1] => Array
(
[0] => 121">text to click</moolinkx> blablabah.</p><p>Another <moolinkx pageid="128
)
[2] => Array
(
[0] => text to clickclick
)
)
I'm not at ease with regex, so it must be something very trivial... but I can't pinpoint what it is :(
Thank you very much in advance!
NB: This is my first post here, though I've been using this terrific Q&A for ages!
Use negated character classes:
$regex = '/<moolinkx pageid="([^"]+)">([^<]+)<\/moolinkx>/';
Explained demo here: http://regex101.com/r/sI3wK5
You are using a greedy quantifier, which treats everything between the first opening tag and the last closing tag as the content between the tags. Change your regex to:
$regex = '/\<moolinkx pageid="(.+?)"\>(.+?)\<\/moolinkx\>/';
preg_match_all( $regex, $string, $matches );
Notice the .{1,} has changed to .+?. The + means one or more instances, and the ? tells the regex to select the fewest characters it can to fulfil the expression.
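A short sketch of the negated-class pattern run against the two-link input from the question. PREG_SET_ORDER is used here only to get one array per tag, and the variable names are illustrative; the array destructuring in foreach assumes PHP 7.1 or later.

```php
<?php
$string = '<p>blabalblahhh <moolinkx pageid="121">text to click</moolinkx> blablabah.</p>'
        . '<p>Another <moolinkx pageid="128">text to clickclick</moolinkx> again blablablah.</p>';

// [^"]+ cannot run past the closing quote, [^<]+ cannot run past the next tag.
$regex = '/<moolinkx pageid="([^"]+)">([^<]+)<\/moolinkx>/';
preg_match_all($regex, $string, $matches, PREG_SET_ORDER);

foreach ($matches as [$whole, $pageId, $label]) {
    echo $pageId, ' => ', $label, PHP_EOL;
}
```

Each entry of $matches now holds the whole tag, the page id and the link text for one link.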

PHP regex hashtag and special chars

ok, not sure if stupid or just monday.
It's actually quite simple. I have a textbox, in which I enter Text. A word gets marked with a hash (#), which then gets saved to the DB as the hashtag for that sentence.
Now, my function looks like this:
public function getHashtag($text)
{
    print_r($text);
    preg_match_all('/(#\w+)/', $text, $hashTag);
    print_r($hashTag);
    die();
    if (isset($hashTag[0][0])) {
        $hashTag = $hashTag[0][0];
        return $hashTag;
    } else {
        return '';
    }
}
The print_r calls are just debug output.
All I want to achieve is to get the word with the hash. It works great, EXCEPT if someone enters a word in French that contains àèé or other accented characters.
The output then just stops at the first special char.
#dfsdfaàèé asda sda sd asd aArray ( [0] => Array ( [0] => #dfsdfa ) [1] => Array ( [0] => #dfsdfa ) )
any ideas? :D
Just use this expression /(#[^\s[:punct:]]+)/.
Reads as "A # plus at least one character that is not white-space or punctuation."
The [:punct:] is one of the POSIX character classes.
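A possible alternative, not the pattern from the answer above: if the input is valid UTF-8, the /u modifier together with the Unicode properties \p{L} (letters) and \p{N} (numbers) matches accented letters explicitly. This is a sketch under that assumption; the sample text uses "\u{...}" escapes (PHP 7+) so the encoding is unambiguous.

```php
<?php
// "#dfsdfa" followed by a-grave, e-grave, e-acute, then more words.
$text = "#dfsdfa\u{E0}\u{E8}\u{E9} asda sda sd asd";

// A hash followed by one or more letters, digits or underscores,
// where "letter" and "digit" are Unicode-aware.
preg_match_all('/#[\p{L}\p{N}_]+/u', $text, $hashTag);

print_r($hashTag[0]);
```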

Get the separating punctuation in a string with PHP

I have a product that is sold to multiple customers. Each customer has its own unique product code derived from my original product code, e.g.
My code: 1245-65
Customer 1: 1245/65
Customer 2: 1245.65
My question: Is there any way to analyse such a string and find what is separating its integers? My goal is to have a settings page where a demo customer code would be entered then all product codes would be derived from that example code. I'm sure PHP can handle this!
EXTRA INFO:
Sorry, I haven't given enough information. There might be a situation where the separator is a run of letters, e.g. 1245ABC65. I hate updating a question like this when so many people have given valid answers :( my fault.
You can use a regular expression to find the separator.
$str = '1245/65';
preg_match("/\d+(.)\d+/", $str, $separator);
$separator = $separator[1];
You may want to look for non-numeric characters using preg_match_all:
preg_match_all('/[^0-9]/', '1245-95', $matches);
print_r($matches);
//Array ( [0] => Array ( [0] => - ) ) in the example
For the updated question, you have to write:
$str = '1245ABC65';
preg_match("/\d+([^0-9]+)\d+/", $str, $separator);
echo $separator = $separator[1];
or
preg_match_all('/[^0-9]+/', '1245ABC95', $matches);
print_r($matches);
//Array ( [0] => Array ( [0] => ABC ) ) in the example
Use preg_split with a regular expression to find the characters other than digits.
$separador = preg_split('/\d/', '1234/65', -1, PREG_SPLIT_NO_EMPTY);
$separador = $separador[0];
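Putting the pieces together for the settings-page idea, a rough sketch: detect the separator from a demo customer code, then apply it to your own codes. Both function names are hypothetical, and the codes are assumed to have the shape digits, separator, digits.

```php
<?php
// Returns the non-digit run between the two digit groups, or null
// if the demo code does not have the assumed shape.
function detectSeparator(string $demoCode): ?string
{
    return preg_match('/^\d+(\D+)\d+$/', $demoCode, $m) === 1 ? $m[1] : null;
}

// Splits your own code on its non-digit separator and rejoins the
// digit groups with the customer's separator.
function toCustomerCode(string $ownCode, string $separator): string
{
    return implode($separator, preg_split('/\D+/', $ownCode));
}

$separator = detectSeparator('1245ABC65');
echo toCustomerCode('1245-65', $separator), PHP_EOL;
```

Rejoining with implode rather than preg_replace avoids having to escape a separator that happens to contain $ or \.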
