Stop regex splitting a matched url with preg_split - php

Given the following code:
$regex = '/(http\:\/\/|https\:\/\/)([a-z0-9-\.\/\?\=\+_]*)/i';
$text = preg_split($regex, $note, -1, PREG_SPLIT_DELIM_CAPTURE);
It's returning an array such as:
array (size=4)
0 => string '...' (length=X)
1 => string 'https://' (length=8)
2 => string 'duckduckgo.com/?q=how+much+wood+could+a+wood-chuck+chuck+if+a+wood-chuck+could+chuck+wood' (length=89)
3 => string '...' (length=X)
I would prefer it if the returned array had size=3, with one single URL. Is this possible?

Sure, that can be done: just remove the extra capturing groups from your regex. Try the following code:
$regex = '#(https?://[a-z0-9./?=+_-]*)#i';
$text = preg_split($regex, $note, -1, PREG_SPLIT_DELIM_CAPTURE);
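The resulting array will now have 3 elements instead of 4.
Besides removing the extra grouping, I have also simplified your regex: most special characters don't need to be escaped inside a character class, and with # as the delimiter, / doesn't need escaping either. Keep / in the class, though, or the match will stop at the first slash of the URL's path.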
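A quick way to see the effect: with a single capturing group wrapped around the whole URL, PREG_SPLIT_DELIM_CAPTURE returns the URL as one element. The sample value of $note below is made up for illustration; the character class includes / so the path and query string stay part of the URL.

```php
<?php
// Hypothetical sample text standing in for $note from the question.
$note = 'See https://duckduckgo.com/?q=how+much+wood for details';

// One capturing group around the entire URL, so the captured delimiter
// comes back as a single array element.
$regex = '#(https?://[a-z0-9./?=+_-]*)#i';
$text  = preg_split($regex, $note, -1, PREG_SPLIT_DELIM_CAPTURE);

var_export($text); // text before, the full URL, text after
```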

Related

Split a string at comma character but ignore if said character is nested inside parentheses

I'm currently working as a PHP dev and have an assignment involving some old legacy PHP code that filters certain car details before adding them to the DB.
What I'm stuck on is how to avoid splitting the models inside the parentheses.
Example:
"v70, 790, v50 (v40, v44), v22"
Expected output:
[ "v70", "790", "v50 (v40, v44)", "v22" ]
That is, any , inside the parentheses should be ignored by the split.
Any help and pointers are greatly appreciated!
You can use the preg_split() function for this. It splits a string on a regex pattern, so you can split on the commas that separate values while ignoring any commas that sit between parentheses.
This code works for your example:
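<?php
$string = 'v70, 790, v50 (v40, v44), v22';
// Split on ", " unless a ")" follows before any "(" (i.e. the comma is inside parentheses)
$pattern = '/,(?![^(]*\)) /';
$splitString = preg_split($pattern, $string);
var_dump($splitString);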
Output of $splitString looks like:
array (size=4)
0 => string 'v70' (length=3)
1 => string '790' (length=3)
2 => string 'v50 (v40, v44)' (length=14)
3 => string 'v22' (length=3)
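The pattern above expects exactly one space after each separating comma, and since the lookahead only checks for a ")" before the next "(", it can break on nested parentheses. If the spacing in your data varies, a possible variant (my assumption, not part of the answer) lets \s* absorb any whitespace around the comma:

```php
<?php
// Variant of the answer's pattern: tolerates zero or more spaces around
// the separating comma. Still assumes parentheses are not nested.
$string  = 'v70,790, v50 (v40, v44),   v22';
$pattern = '/\s*,\s*(?![^(]*\))/';

$parts = preg_split($pattern, $string);
var_export($parts);
```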

Get values from formatted, delimited string with quoted labels and values

I have an input string like this:
"Day":June 8-10-2012,"Location":US,"City":Newyork
I need to match 3 value substrings:
June 8-10-2012
US
Newyork
I don't need the labels.
If this is JSON, you should definitely use PHP's JSON functions (such as json_decode()) instead, as they are better suited to this.
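However, you can use the following regex (the g flag is regex101 notation; PHP has no g modifier, so use preg_match_all() to collect every match):
/:([a-zA-Z0-9\s-]*)/g
<?php
// preg_match() stops after the first match; preg_match_all() finds all of them
preg_match_all('/:([a-zA-Z0-9\s-]*)/', '"Day":June 8-10-2012,"Location":US,"City":Newyork', $matches);
print_r($matches[1]); // capture group 1 holds the three values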
The regex demo is here:
https://regex101.com/r/BbwVQ5/1
Here are a couple of simple ways:
Code:
$string = '"Day":June 8-10-2012,"Location":US,"City":Newyork';
var_export(preg_match_all('/:\K[^,]+/', $string, $out) ? $out[0] : 'fail');
echo "\n\n";
var_export(preg_split('/,?"[^"]+":/', $string, 0, PREG_SPLIT_NO_EMPTY));
Output:
array (
0 => 'June 8-10-2012',
1 => 'US',
2 => 'Newyork',
)
array (
0 => 'June 8-10-2012',
1 => 'US',
2 => 'Newyork',
)
Pattern #1: \K restarts the match after the :, so a positive lookbehind can be avoided (saving "steps" / improving pattern efficiency). By matching all following characters that are not a comma, a capture group can also be avoided (again saving "steps").
Pattern #2: ,? makes the comma optional, so the first double-quoted "key" (which has no comma in front of it) is matched (split on) as well. Each delimiter match covers the full "key" substring and ends on the following : colon; PREG_SPLIT_NO_EMPTY then discards the empty string produced before the first key.
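To make the \K point concrete, here is Pattern #1 next to the lookbehind version it replaces; both return the same three values for this input. This is only a sketch comparing results, it does not measure the step savings mentioned above.

```php
<?php
$string = '"Day":June 8-10-2012,"Location":US,"City":Newyork';

// \K version from the answer: match ":", then reset the reported match start.
preg_match_all('/:\K[^,]+/', $string, $withK);

// Equivalent positive-lookbehind version that \K lets you avoid.
preg_match_all('/(?<=:)[^,]+/', $string, $withLookbehind);

var_export($withK[0] === $withLookbehind[0]); // true
```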

php preg_split ignore comma in specific string

I need some help. I want to ignore a comma in a specific string. It is a comma-separated (CSV) file, but one of the names contains a comma, and I need to ignore that one.
What I have so far:
<?php
$pattern = '/([\\W,\\s]+Inc.])|[,]/';
$subject = 'hypertext language, programming, Amazon, Inc., 100';
$limit = -1;
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;
$result = preg_split ($pattern, $subject, $limit, $flags);
?>
The result is:
$result (php code):
<?php
array (
0 => 'hypertext language',
1 => ' programming',
2 => ' Amazon',
3 => ' Inc.',
4 => ' 100',
);
?>
And I want the result to be:
$result (php code):
<?php
array (
0 => 'hypertext language',
1 => ' programming',
2 => ' Amazon, Inc.',
3 => ' 100',
);
?>
Thanks for your help :)
Note that [\W,\s] is the same as \W, since \W already matches any char that is not a letter, digit or underscore (commas and whitespace included). However, it seems you just want to split on a , that is not followed by optional whitespace and then Inc..
You may use a negative lookahead to achieve this:
/,(?!\s*Inc\.)/
The (?!\s*Inc\.) lookahead fails any , match that is followed by zero or more whitespace characters (\s*) and then the literal text Inc..
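For reference, here is the lookahead pattern dropped into the question's own preg_split() call. This is a sketch; PREG_SPLIT_DELIM_CAPTURE is left out because the new pattern has no capturing group, so there is nothing to capture.

```php
<?php
$subject = 'hypertext language, programming, Amazon, Inc., 100';

// Split on every comma except one followed by optional whitespace and "Inc."
$result = preg_split('/,(?!\s*Inc\.)/', $subject, -1, PREG_SPLIT_NO_EMPTY);
var_export($result);
```

The leading spaces match the desired output in the question; wrap the call in array_map('trim', ...) if you want them removed.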
Going by your tutorial, if I pull the Amazon information as CSV, I get the following format, which you can then parse with one of PHP's native functions. You don't need explode or a regex to handle this data; use the right tool for the job:
<?php
$csv =<<<CSV
"amzn","Amazon.com, Inc.",765.56,"11/2/2016","4:00pm","-19.85 - -2.53%",10985
CSV;
$array = str_getcsv($csv);
var_dump($array);
Output:
array (size=7)
0 => string 'amzn' (length=4)
1 => string 'Amazon.com, Inc.' (length=16)
2 => string '765.56' (length=6)
3 => string '11/2/2016' (length=9)
4 => string '4:00pm' (length=6)
5 => string '-19.85 - -2.53%' (length=15)
6 => string '10985' (length=5)

how to find "http" in string from array?

In PHP I have an array like this:
array
0 => string 'open' (length=4)
1 => string 'http://www.google.com' (length=21)
2 => string 'blank' (length=5)
but it could also be like:
array
0 => string 'blank' (length=5)
1 => string 'open' (length=4)
2 => string 'http://www.google.com' (length=21)
Now it is easy to find "blank" with in_array("blank", $array), but how can I check whether one of the strings starts with "http"?
I've tried:
array_search('http', $array); // not working
array_search('http://www.google.com', $array); // is working
Everything after http can vary.
Do I need a regex, or how else can I check whether "http" appears in one of the array's strings?
Thanks for any advice.
"Welcome to PHP, there's a function for that."
Try preg_grep
preg_grep("/^http\b/i",$array);
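Regex explained:
/^http\b/i
^ : start of string
http : literal match of "http"
\b : word boundary (a position, not a character)
i : case-insensitive match
Note that because of \b, https://... URLs will not match ("p" and "s" are both word characters, so there is no boundary between them); use /^https?\b/i or simply /^http/i if you want those too.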
Try using the preg_grep() function, which returns an array of the entries that match the pattern.
$array = array("open", "http://www.google.com", "blank");
$search = preg_grep('/http/', $array);
print_r($search);
Solution without regex:
$input = array('open', 'http://www.google.com', 'blank');
$output = array_filter($input, function($item){
return strpos($item, 'http') === 0;
});
Output:
array (size=1)
1 => string 'http://www.google.com' (length=21)
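On PHP 8.0 or later, str_starts_with() expresses the same check more directly than strpos() === 0. A sketch assuming PHP 8.0+ (like the strpos() version, the comparison is case-sensitive and array keys are preserved):

```php
<?php
// Requires PHP 8.0+, which added str_starts_with() and works with arrow functions (7.4+).
$input = ['open', 'http://www.google.com', 'blank'];

// Keep only the entries that begin with "http".
$output = array_filter($input, fn(string $item): bool => str_starts_with($item, 'http'));
var_export($output);
```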
You can use preg_grep
$match = preg_grep("/http/", $array);
if (!empty($match)) echo "http exists in the array of strings.";
or you can use foreach and preg_match:
foreach ($array as $check) {
    if (preg_match("/http/", $check)) {
        echo "http exists in the array of strings.";
        break; // stop after the first hit so the message prints only once
    }
}

Remove first two words from a string

I have a string:
$string = "R 124 This is my message";
At times, the string may change, such as:
$string = "R 1345255 This is another message";
Using PHP, what's the best way to remove the first two "words" (e.g., the initial "R" and then the subsequent numbers)?
Thanks for the help!
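$parts = explode(' ', $string, 3); // at most 3 pieces: "R", the number, and the rest
$string = $parts[2];
This is likely faster than a regex, since it is a plain string split with no pattern to compile or match.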
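The explode() approach can be wrapped in a small helper. The function name dropFirstTwoWords is hypothetical, and the ?? '' fallback (an addition, not part of the answer) avoids an undefined-index warning when the string has fewer than three words.

```php
<?php
// Hypothetical helper built on the explode() answer.
function dropFirstTwoWords(string $s): string
{
    // Limit 3: the third piece keeps everything after the second space intact.
    $parts = explode(' ', $s, 3);
    return $parts[2] ?? ''; // '' if there is nothing after the first two words
}

var_export(dropFirstTwoWords('R 124 This is my message'));
echo "\n";
var_export(dropFirstTwoWords('R 1345255 This is another message'));
```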
One way would be to split the string into "words", using explode or preg_split (depending on how complex the word separators are: are they always a single space?).
For instance:
$string = "R 124 This is my message";
$words = explode(' ', $string);
var_dump($words);
You'd get an array like this one:
array
0 => string 'R' (length=1)
1 => string '124' (length=3)
2 => string 'This' (length=4)
3 => string 'is' (length=2)
4 => string 'my' (length=2)
5 => string 'message' (length=7)
Then, with array_slice, you keep only the words you want (everything except the first two):
$to_keep = array_slice($words, 2);
var_dump($to_keep);
Which gives:
array
0 => string 'This' (length=4)
1 => string 'is' (length=2)
2 => string 'my' (length=2)
3 => string 'message' (length=7)
And finally, you put the pieces back together:
$final_string = implode(' ', $to_keep);
var_dump($final_string);
Which gives...
string 'This is my message' (length=18)
And, if necessary, it allows you to do a couple of manipulations on the words before joining them back together :-)
Actually, that is the reason you might choose this solution, which is a bit longer than using only explode and/or preg_split ^^
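If the separators are not always a single space (the case where the answer suggests preg_split over explode), splitting on a run of whitespace with a limit of 3 keeps the remainder in one piece. A sketch, assuming words may be separated by multiple spaces or tabs:

```php
<?php
// Assumption: separators may be runs of spaces/tabs, where explode(' ')
// would produce empty "words".
$string = "R   124\tThis is my message";

// Split on any whitespace run; limit 3 leaves the rest of the message untouched.
$parts = preg_split('/\s+/', $string, 3);
var_export($parts[2] ?? '');
```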
Try:
$result = preg_replace('/^R \\d+ /', '', $string, 1);
or (if you want your spaces to be written in a more visible style)
$result = preg_replace('/^R\\x20\\d+\\x20/', '', $string, 1);
$string = preg_replace("/^\\w+\\s\\d+\\s(.*)/", '$1', $string);
$string = preg_replace('/^R\s+\d+\s*/', '', $string);
