php preg_split ignore comma in specific string - php

I need some help. What I want is to make ignore a comma in specific string. It is a comma seperated file csv, but the name have a comma, and I need to ignore that.
What I got is
<?php
$pattern = '/([\\W,\\s]+Inc.])|[,]/';
$subject = 'hypertext language, programming, Amazon, Inc., 100';
$limit = -1;
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;
$result = preg_split ($pattern, $subject, $limit, $flags);
?>
Result is
$result (php code):
<?php
array (
0 => 'hypertext language',
1 => ' programming',
2 => ' Amazon',
3 => ' Inc.',
4 => ' 100',
);
?>
And I want the result to be
$result (php code):
<?php
array (
0 => 'hypertext language',
1 => ' programming',
2 => ' Amazon, Inc.',
3 => ' 100',
);
?>
Thanks for your help :)

Note that [\W,\s] = \W since \W matches any char that is not a letter, digit or underscore. However, it seems you just want to split on a , that is not followed with space(s)*+Inc..
You may use a negative lookahead to achieve this:
/,(?!\s*Inc\.)/
^^^^^^^^^^^^
See the regex demo
The (?!\s*Inc\.) will fail any , match if there are 0+ whitespaces (\s*) followed with a sequence of literal characters Inc. after them.

From your tutorial, if I pull the Amazon information as a CSV, I get the following format. Which you can then parse with one of Php's native functions. This shows you don't need to use explode or regex to handle this data. Use the right tool for the job:
<?php
$csv =<<<CSV
"amzn","Amazon.com, Inc.",765.56,"11/2/2016","4:00pm","-19.85 - -2.53%",10985
CSV;
$array = str_getcsv($csv);
var_dump($array);
Output:
array (size=7)
0 => string 'amzn' (length=4)
1 => string 'Amazon.com, Inc.' (length=16)
2 => string '765.56' (length=6)
3 => string '11/2/2016' (length=9)
4 => string '4:00pm' (length=6)
5 => string '-19.85 - -2.53%' (length=15)
6 => string '10985' (length=5)

Related

php preg_replace separate dots

I have a script which it gives me keywords from string. Code is:
<?php
$text = "This is some text. This is some text. Vending Machines are great.Баста - ЧК (Чистый Кайф)";
$string = preg_replace('/[^\p{L}\p{N}\s]/u', '', $text);
$string = preg_replace('/\s+/', ' ', $string);
$string = preg_replace('/\s+/', ' ', $string);
$string = mb_strtolower($string, 'UTF-8');
$keywords = explode(' ', $string);
var_dump($keywords);
?>
That's works great but I have a problem. This code returns me:
array (size=15)
0 => string 'this' (length=4)
1 => string 'is' (length=2)
2 => string 'some' (length=4)
3 => string 'text' (length=4)
4 => string 'this' (length=4)
5 => string 'is' (length=2)
6 => string 'some' (length=4)
7 => string 'text' (length=4)
8 => string 'vending' (length=7)
9 => string 'machines' (length=8)
10 => string 'are' (length=3)
11 => string 'greatбаста' (length=15)
12 => string 'чк' (length=4)
13 => string 'чистый' (length=12)
14 => string 'кайф' (length=8)
Why the 11th array is greatбаста. I want to separate great and баста words.
I need something which replaces . to dot and space (. ) if dot have something near.
Examples:
This is a good day.It is sunny => This is a good day. It is sunny (replaced . to dot and space (. ))
This is a good day. It is sunny => This is a good day. It is sunny nothing replaced. Because the dot have space after
The first replacement should be performed with a space, and the last input should be trimmed.
Use
$text = "This is some text. This is some text. Vending Machines are great.Баста - ЧК (Чистый Кайф)";
$string = preg_replace('/[^\p{L}\p{N}\s]/u', ' ', $text); // <= Replace with space
$string = preg_replace('/\s+/', ' ', $string);
$string = mb_strtolower($string, 'UTF-8');
$keywords = explode(' ', trim($string)); // <= Use trim to remove leading/trailing spaces
var_dump($keywords);
See the IDEONE demo
I also guess you do not need a duplicate $string = preg_replace('/\s+/', ' ', $string); line.
You only need 2 regexes.
Find: [^\p{L}\p{N}\s.]+
Replace: nothing
Find: [\s.]+
Replace: a space
Then do an explode.
Sort of direct and to the point !!

how to find "http" in string from array?

In PHP I have an array like this:
array
0 => string 'open' (length=4)
1 => string 'http://www.google.com' (length=21)
2 => string 'blank' (length=5)
but it could also be like:
array
0 => string 'blank' (length=5)
1 => string 'open' (length=4)
2 => string 'http://www.google.com' (length=21)
now it is easy to find "blank" with in_array("blank", $array) but how can I see if one string is starting with "http"?
I've tried with
array_search('http', $array); // not working
array_search('http://www.google.com', $array); // is working
now everything after `http? could vary (how to write vary, varie? could be different is what I mean!)
Now do I need a regex or how can I check if http exists in array string?
Thanks for advices
"Welcome to PHP, there's a function for that."
Try preg_grep
preg_grep("/^http\b/i",$array);
Regex explained:
/^http\b/i
^\ / ^ `- Case insensitive match
| \/ `--- Boundary character
| `------ Literal match of http
`--------- Start of string
Try using the preg_grep function which returns an array of entries that match the pattern.
$array = array("open", "http://www.google.com", "blank");
$search = preg_grep('/http/', $array);
print_r($search);
Solution without regex:
$input = array('open', 'http://www.google.com', 'blank');
$output = array_filter($input, function($item){
return strpos($item, 'http') === 0;
});
Output:
array (size=1)
1 => string 'http://www.google.com' (length=21)
You can use preg_grep
$match = preg_grep("/http/",$array);
if(!empty($match)) echo "http exist in the array of string.";
or you can use foreach and preg_match
foreach($array as $check) {
if (preg_match("/http/", $check))
echo "http exist in the array of string.";
}

Stop regex splitting a matched url with preg_split

Given the following code:
$regex = '/(http\:\/\/|https\:\/\/)([a-z0-9-\.\/\?\=\+_]*)/i';
$text = preg_split($regex, $note, -1, PREG_SPLIT_DELIM_CAPTURE);
its returning an array such as:
array (size=4)
0 => string '...' (length=X)
1 => string 'https://' (length=8)
2 => string 'duckduckgo.com/?q=how+much+wood+could+a+wood-chuck+chuck+if+a+wood-chuck+could+chuck+wood' (length=89)
3 => string '...' (length=X)
I would prefer it if the returned array had size=3, with one single URL. Is this possible?
Sure that can be done, just remove those extra matching groups from your regex. Try following code:
$regex = '#(https?://[a-z0-9.?=+_-]*)#i';
$text = preg_split($regex, $note, -1, PREG_SPLIT_DELIM_CAPTURE);
Now resulting array will have 3 elements in the array instead of 4.
Besides removing extra grouping I have also simplified your regex also since most of the special characters don't need to be escaped inside character class.

Remove first two words from a string

I have a string:
$string = "R 124 This is my message";
At times, the string may change, such as:
$string = "R 1345255 This is another message";
Using PHP, what's the best way to remove the first two "words" (e.g., the initial "R" and then the subsequent numbers)?
Thanks for the help!
$string = explode (' ', $string, 3);
$string = $string[2];
Must be much faster than regexes.
One way would be to explode the string in "words", using explode or preg_split (depending on the complexity of the words separators : are they always one space ? )
For instance :
$string = "R 124 This is my message";
$words = explode(' ', $string);
var_dump($words);
You'd get an array like this one :
array
0 => string 'R' (length=1)
1 => string '124' (length=3)
2 => string 'This' (length=4)
3 => string 'is' (length=2)
4 => string 'my' (length=2)
5 => string 'message' (length=7)
Then, with array_slice, you keep only the words you want (not the first two ones) :
$to_keep = array_slice($words, 2);
var_dump($to_keep);
Which gives :
array
0 => string 'This' (length=4)
1 => string 'is' (length=2)
2 => string 'my' (length=2)
3 => string 'message' (length=7)
And, finally, you put the pieces together :
$final_string = implode(' ', $to_keep);
var_dump($final_string);
Which gives...
string 'This is my message' (length=18)
And, if necessary, it allows you to do couple of manipulations on the words before joining them back together :-)
Actually, this is the reason why you might choose that solution, which is a bit longer that using only explode and/or preg_split ^^
try
$result = preg_replace('/^R \\d+ /', '', $string, 1);
or (if you want your spaces to be written in a more visible style)
$result = preg_replace('/^R\\x20\\d+\\x20/', '', $string, 1);
$string = preg_replace("/^\\w+\\s\\d+\\s(.*)/", '$1', $string);
$string = preg_replace('/^R\s+\d+\s*/', '', $string);

php regex to read select form

I have a source file with a select form with some options, like this:
<option value="TTO">1031</option><option value="187">187</option><option value="TWO">2SK8</option><option value="411">411</option><option value="AEL">Abec 11</option><option value="ABE">Abec11</option><option value="ACE">Ace</option><option value="ADD">Addikt</option><option value="AFF">Affiliate</option><option value="ALI">Alien Workshop</option><option value="ALG">Alligator</option><option value="ALM">Almost</option>
I would like to read this file using php and regex, but I don't really know how. Anybody an idea? It would be nice to have an array with the 3 digits code as a key, and the longer string as a value. (so, for example, $arr['TWO'] == '2SK8')
<?php
$options= '
<option value="TTO">1031</option><option value="187">187</option><option value="TWO">2SK8</option><option value="411">411</option><option value="AEL">Abec 11</option><option value="ABE">Abec11</option><option value="ACE">Ace</option><option value="ADD">Addikt</option><option value="AFF">Affiliate</option><option value="ALI">Alien Workshop</option><option value="ALG">Alligator</option><option value="ALM">Almost</option>
';
preg_match_all( '#(<option value="([^"]+)">([^<]+)<\/option>)#', $options, $arr);
$result = array();
foreach ($arr[0] as $i => $value)
{
$result[$arr[2][$i]] = $arr[3][$i];
}
print_r($result);
?>
output:
Array
(
[TTO] => 1031
[187] => 187
[TWO] => 2SK8
[411] => 411
[AEL] => Abec 11
[ABE] => Abec11
[ACE] => Ace
[ADD] => Addikt
[AFF] => Affiliate
[ALI] => Alien Workshop
[ALG] => Alligator
[ALM] => Almost
)
What about something like this :
$html = <<<HTML
<option value="TTO">1031</option><option value="187">187</option>
<option value="TWO">2SK8</option><option value="411">411</option>
<option value="AEL">Abec 11</option><option value="ABE">Abec11</option>
<option value="ACE">Ace</option><option value="ADD">Addikt</option>
<option value="AFF">Affiliate</option><option value="ALI">Alien Workshop</option>
<option value="ALG">Alligator</option><option value="ALM">Almost</option>
HTML;
$matches = array();
if (preg_match_all('#<option\s+value="([^"]+)">([^<]+)</option>#', $html, $matches)) {
$list = array();
$num_matches = count($matches[0]);
for ($i=0 ; $i<$num_matches ; $i++) {
$list[$matches[1][$i]] = $matches[2][$i];
}
var_dump($list);
}
The output ($list) would be :
array
'TTO' => string '1031' (length=4)
187 => string '187' (length=3)
'TWO' => string '2SK8' (length=4)
411 => string '411' (length=3)
'AEL' => string 'Abec 11' (length=7)
'ABE' => string 'Abec11' (length=6)
'ACE' => string 'Ace' (length=3)
'ADD' => string 'Addikt' (length=6)
'AFF' => string 'Affiliate' (length=9)
'ALI' => string 'Alien Workshop' (length=14)
'ALG' => string 'Alligator' (length=9)
'ALM' => string 'Almost' (length=6)
A few explainations :
I'm using preg_match_all to match as many times as possible
([^"]+) means "everything that is not a double-quote (as that one would mark the end of the value), at least one time, and as many times as possible (+)
([^<]+) means about the same thing, but with < instead of " as end marker
preg_match_all will get me an array containing in $matches[1] the list of all stuff that matched the first set of (), and in $matches[2] what matched the second set of ()
so I need to iterate over the results to re-construct the list that inetrestes you :-)
Hope this helps -- and that you understood what it does and how, so you can help yourself, the next time ;-)
As a sidenote : using regex to "parse" HTML is generally not such a good idea... If you have a full HTML page, you might want to take a look at DOMDocument::loadHTML.
If you don't and the format of the options is not well-defined... Well, maybe it might prove useful to add some stuff to the regex, as a precaution... (Like accepting spaces here and there, accepting other attributes, ...)
Try this out. Just load the file's contents into $raw_html and use this regex to collect the matches. The 3-digit code from the $ith option is $out[i][1], and the longer string is $out[i][2]. You can convert that to an associative array as needed.
$regex = '|<option value="(.{3})">([^<]+)</option>|';
preg_match_all($regex, $raw_html, $out, PREG_SET_ORDER);
print_r($out);

Categories