php preg_replace separate dots - php

I have a script that extracts keywords from a string. The code is:
<?php
$text = "This is some text. This is some text. Vending Machines are great.Баста - ЧК (Чистый Кайф)";
$string = preg_replace('/[^\p{L}\p{N}\s]/u', '', $text);
$string = preg_replace('/\s+/', ' ', $string);
$string = preg_replace('/\s+/', ' ', $string);
$string = mb_strtolower($string, 'UTF-8');
$keywords = explode(' ', $string);
var_dump($keywords);
?>
That works great, but I have a problem. This code returns:
array (size=15)
0 => string 'this' (length=4)
1 => string 'is' (length=2)
2 => string 'some' (length=4)
3 => string 'text' (length=4)
4 => string 'this' (length=4)
5 => string 'is' (length=2)
6 => string 'some' (length=4)
7 => string 'text' (length=4)
8 => string 'vending' (length=7)
9 => string 'machines' (length=8)
10 => string 'are' (length=3)
11 => string 'greatбаста' (length=15)
12 => string 'чк' (length=4)
13 => string 'чистый' (length=12)
14 => string 'кайф' (length=8)
Why is element 11 of the array greatбаста? I want to separate great and баста into two words.
I need something that replaces a dot with a dot followed by a space (". ") when the dot is directly followed by another character.
Examples:
This is a good day.It is sunny => This is a good day. It is sunny (the . was replaced with ". ")
This is a good day. It is sunny => This is a good day. It is sunny (nothing replaced, because the dot already has a space after it)
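Below is a minimal sketch of the dot rule described above. It assumes that "something near" means any non-whitespace character directly after the dot. The function name space_after_dots is hypothetical. The lookahead (?=\S) checks the next character without consuming it, so only the dot itself is replaced.

```php
<?php
// Hypothetical helper: insert a space after any dot that is directly
// followed by a non-whitespace character.
function space_after_dots($text)
{
    // (?=\S) requires a non-space character after the dot but does not
    // consume it, so a dot already followed by a space is left alone.
    return preg_replace('/\.(?=\S)/u', '. ', $text);
}

echo space_after_dots('This is a good day.It is sunny'), "\n";
echo space_after_dots('This is a good day. It is sunny'), "\n";
```

This rule also changes decimals and URLs (3.14 becomes 3. 14). Using (?=\p{L}) instead would only add a space before a letter.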

In the first preg_replace call, replace the matched characters with a space instead of removing them, and trim the final string before exploding it.
Use:
$text = "This is some text. This is some text. Vending Machines are great.Баста - ЧК (Чистый Кайф)";
$string = preg_replace('/[^\p{L}\p{N}\s]/u', ' ', $text); // <= Replace with space
$string = preg_replace('/\s+/', ' ', $string);
$string = mb_strtolower($string, 'UTF-8');
$keywords = explode(' ', trim($string)); // <= Use trim to remove leading/trailing spaces
var_dump($keywords);
Also, you do not need the duplicated $string = preg_replace('/\s+/', ' ', $string); line.
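Here is an alternative sketch, which is not the answer's own method. It splits directly on runs of characters that are neither letters nor digits using preg_split, which avoids the separate replace, collapse and trim steps. The function name extract_keywords is hypothetical, and mbstring is assumed to be available, as in the question.

```php
<?php
// Hypothetical helper: split text into lowercase keywords.
function extract_keywords($text)
{
    // Any run of characters that is not a Unicode letter or digit acts as a
    // separator, so "great.Баста" yields two words. PREG_SPLIT_NO_EMPTY drops
    // the empty pieces produced at the start or end of the string.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    // Lowercase each word with the multibyte-aware function.
    return array_map(function ($word) {
        return mb_strtolower($word, 'UTF-8');
    }, $words);
}

$text = "This is some text. This is some text. Vending Machines are great.Баста - ЧК (Чистый Кайф)";
var_dump(extract_keywords($text));
```

Like the answer above, this splits don't into don and t. The question's original empty-string replacement would instead produce dont.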

You only need 2 regexes.
Find: [^\p{L}\p{N}\s.]+
Replace: nothing
Find: [\s.]+
Replace: a space
Then do an explode.
Sort of direct and to the point!
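Below is a sketch of those two find/replace steps in PHP. A few parts are additions, not part of the answer:

- the /u flag, which is needed for \p{...} on UTF-8 input;
- the lowercasing;
- the trim() before explode.

The function name keywords_two_regexes is hypothetical.

```php
<?php
// Hypothetical helper implementing the two find/replace steps above.
function keywords_two_regexes($text)
{
    // Step 1: delete everything except letters, digits, whitespace and dots.
    $s = preg_replace('/[^\p{L}\p{N}\s.]+/u', '', $text);
    // Step 2: replace each run of whitespace and/or dots with a single space.
    $s = preg_replace('/[\s.]+/u', ' ', $s);
    // Lowercase, trim any leading/trailing space, then explode.
    return explode(' ', trim(mb_strtolower($s, 'UTF-8')));
}

var_dump(keywords_two_regexes("This is some text. This is some text. Vending Machines are great.Баста - ЧК (Чистый Кайф)"));
```

Step 1 deletes other punctuation outright. As a result, a comma with no space after it (a,b) still glues the words together (ab); only dots become separators.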

Related

php preg_split ignore comma in specific string

I need some help. I want to ignore a comma inside a specific string. The data is a comma-separated (CSV) file, but one of the names contains a comma, and I need to ignore that comma when splitting.
What I have so far is:
<?php
$pattern = '/([\\W,\\s]+Inc.])|[,]/';
$subject = 'hypertext language, programming, Amazon, Inc., 100';
$limit = -1;
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;
$result = preg_split ($pattern, $subject, $limit, $flags);
?>
The result is:
$result (php code):
<?php
array (
0 => 'hypertext language',
1 => ' programming',
2 => ' Amazon',
3 => ' Inc.',
4 => ' 100',
);
?>
And I want the result to be
$result (php code):
<?php
array (
0 => 'hypertext language',
1 => ' programming',
2 => ' Amazon, Inc.',
3 => ' 100',
);
?>
Thanks for your help :)
Note that [\W,\s] is equivalent to \W, since \W already matches any character that is not a letter, digit, or underscore (including commas and whitespace). However, it seems you just want to split on a comma that is not followed by optional whitespace and then Inc.
You may use a negative lookahead to achieve this:
/,(?!\s*Inc\.)/
The (?!\s*Inc\.) negative lookahead fails any comma match that is followed by zero or more whitespace characters (\s*) and then the literal text Inc.
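Plugged into the question's preg_split call, the lookahead pattern should give the desired result. PREG_SPLIT_DELIM_CAPTURE is dropped in this sketch because the pattern has no capturing group.

```php
<?php
$subject = 'hypertext language, programming, Amazon, Inc., 100';
// Split on every comma except one followed by optional whitespace and "Inc."
$result = preg_split('/,(?!\s*Inc\.)/', $subject, -1, PREG_SPLIT_NO_EMPTY);
var_dump($result);
```

This yields 'hypertext language', ' programming', ' Amazon, Inc.' and ' 100'. Wrap the call in array_map('trim', ...) if the leading spaces are unwanted.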
Following your tutorial, if I pull the Amazon information as CSV, I get the format below, which you can then parse with one of PHP's native functions. You don't need explode or a regex to handle this data; use the right tool for the job:
<?php
$csv =<<<CSV
"amzn","Amazon.com, Inc.",765.56,"11/2/2016","4:00pm","-19.85 - -2.53%",10985
CSV;
$array = str_getcsv($csv);
var_dump($array);
Output:
array (size=7)
0 => string 'amzn' (length=4)
1 => string 'Amazon.com, Inc.' (length=16)
2 => string '765.56' (length=6)
3 => string '11/2/2016' (length=9)
4 => string '4:00pm' (length=6)
5 => string '-19.85 - -2.53%' (length=15)
6 => string '10985' (length=5)
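There is one caveat, sketched below. str_getcsv only keeps "Amazon, Inc." together because that field is enclosed in quotes. The question's sample string is unquoted, so str_getcsv alone cannot tell the name's comma from a field separator. For that input, a lookahead split like the one in the previous answer is still needed. The delimiter, enclosure and escape arguments are passed explicitly here.

```php
<?php
// A line where the field containing a comma is quoted, as in a real CSV file.
$quoted = 'hypertext language,programming,"Amazon, Inc.",100';
// The unquoted line from the question.
$unquoted = 'hypertext language, programming, Amazon, Inc., 100';

$fromQuoted = str_getcsv($quoted, ',', '"', '\\');
$fromUnquoted = str_getcsv($unquoted, ',', '"', '\\');

var_dump($fromQuoted);          // 4 fields, "Amazon, Inc." kept together
var_dump(count($fromUnquoted)); // 5: without quotes the comma in the name splits it
```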

Getting numbers from string line? PHP

I have a function:
It returns the numbers found in a string.
function get_numerics ($str) {
preg_match_all('/\d+/', $str, $matches);
return $matches[0];
}
I need to get those numbers into an array in my PHP file.
How can I do that?
$counter = $user_count[$sk]; //$user_count[$sk] gives me a string line
//$user_count[$sk] is "15,16,18,19,18,17", and I need those numbers separated into an array
$skarray[] = get_numerics($counter); //Something is wrong?
explode could work, but the $user_count[$sk] string could also be "15, 16, 19, 14,16", i.e. it may or may not contain spaces.
You don't need a regex for this; explode() combined with str_replace() will do it:
$user_count = "15 ,16,18 ,19,18, 17";
$numbers = explode(',', str_replace(' ', '', $user_count));
var_dump($numbers);
Output:
array (size=6)
0 => string '15' (length=2)
1 => string '16' (length=2)
2 => string '18' (length=2)
3 => string '19' (length=2)
4 => string '18' (length=2)
5 => string '17' (length=2)
If you have a string that looks like:
$str = "15,16,17,18,19";
and you want to split it into an array, you can use explode:
$arr = explode(",", $str);
see http://www.php.net/manual/en/function.explode.php
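Alternatively, the question's own get_numerics() already copes with optional spaces, since \d+ matches only the runs of digits. The sketch below assumes the goal is one flat array per string. It assigns the result directly, because $skarray[] = ... would append the whole returned array as a single nested element. The sample value stands in for $user_count[$sk].

```php
<?php
// The question's function: returns every run of digits in the string.
function get_numerics($str)
{
    preg_match_all('/\d+/', $str, $matches);
    return $matches[0];
}

$counter = "15, 16, 19, 14,16"; // sample value standing in for $user_count[$sk]
$skarray = get_numerics($counter); // assign directly; appending with [] would nest the array
// Optional: convert the numeric strings to integers.
$ints = array_map('intval', $skarray);
var_dump($skarray, $ints);
```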

Convert recursive tags into array with regular expression

I have the following text
hello <?tag?> world <?tag2?> xx <?/tag2?> hello <?/tag?> world
And I need it converted into
array(
'hello ',
array(
' world ',
array(
' xx '
),
' hello '
),
' world'
);
Tag names are alphanumeric. Each tag is closed either with the matching tag or with the short closing tag <?/?>. Tags with the same name may repeat, but are never nested inside each other.
My question is: which would be the most CPU-efficient way to go?
use recursive preg_replace with callback
use preg_match_all with PREG_OFFSET_CAPTURE
use preg_split to flatten all tags (PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) into a linear array, then walk through it and group the tags.
If you can also provide the expression, I would be really happy.
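The third option (preg_split with PREG_SPLIT_DELIM_CAPTURE, then a walk that groups the flat list) can be sketched with a stack. This is a sketch under stated assumptions, not a benchmarked answer to the CPU question:

- it assumes tags look like <?name?>, <?/name?> or <?/?>;
- it does not check that a closing name matches its opening tag;
- it does not handle the <?$name?> self-closing form used in the answer below.

The function name parse_tags is hypothetical.

```php
<?php
// Hypothetical parser: turn a flat list of text and tags into nested arrays.
function parse_tags($str)
{
    // The capture group makes preg_split keep each tag as its own element.
    $parts = preg_split('/(<\?\/?\w*\?>)/', $str, -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    $stack = array(array()); // $stack[0] collects the top-level result
    foreach ($parts as $part) {
        if (preg_match('/^<\?(\/?)\w*\?>$/', $part, $m)) {
            if ($m[1] === '') {
                $stack[] = array(); // opening tag: start a new nested list
            } else {
                if (count($stack) < 2) {
                    throw new InvalidArgumentException('Unbalanced closing tag');
                }
                $child = array_pop($stack); // closing tag: finish the nested list
                $stack[count($stack) - 1][] = $child; // attach it to its parent
            }
        } else {
            $stack[count($stack) - 1][] = $part; // plain text stays at the current level
        }
    }
    if (count($stack) !== 1) {
        throw new InvalidArgumentException('Unclosed tag');
    }
    return $stack[0];
}

print_r(parse_tags('hello <?tag?> world <?tag2?> xx <?/tag2?> hello <?/tag?> world'));
```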
This turned out to be less straightforward than expected, but hopefully it will be helpful to others. The biggest complication was returning a non-string value from the preg_replace_callback callback.
Thanks all who tried to help!
class Parser {
public $ret=array();
public $parsed_template=array();
public $tags=array();
function loadTemplateFromString($str){
$this->parsed_template=$this->tags=array();
if(!$str){
return;
}
/* First expand self-closing tags <?$tag?> -> <?tag?><?/tag?> */
$str=preg_replace('/<\?\$([\w]+)\?>/','<?\1?><?/\1?>',$str);
/* Next fix short ending tag <?tag?> <?/?> -> <?tag?> <?/tag?> */
$x=preg_replace_callback('/.*?<\?\/\?>/',function($x){
return preg_replace('/(.*<\?([^\/][\w]+)\?>)(.*?)(<\?\/?\?>)/',
'\1\3<?/\2?>',$x[0]);
},$str);
/* Finally recursively build tag structure */
$this->recursiveReplace($x);
}
function recursiveReplace($x){
if(is_array($x)){
// Called recursively
$tmp2=$this->ret;$this->ret=array();
}else{
$x=array(4=>$x);
$tmp2=null;
}
$y=preg_replace_callback('/(.*?)(<\?([^\/$][\w]+)\?>)(.*?)(<\?\/(\3)?\?>)(.*?)/',
array($this,'recursiveReplace'),$x[4]);
$this->ret[]=$y;
if($tmp2===null)return;
$tmp=$this->ret;
$this->ret=$tmp2;
$this->ret[]=$x[1];
$this->ret[]=$tmp;
return '';
}
}
$p=new Parser();
$p->loadTemplateFromString('bla <?name?> name <?/name?> bla bla <?$surname?> bla '.
'<?middle?> mm <?/?> blah <?outer?> you <?inner?> are <?/?> inside <?/outer?>'.
' bobobo');
var_dump($p->ret);
This outputs:
array
  0 => string 'bla ' (length=4)
  1 =>
    array
      0 => string ' name ' (length=6)
  2 => string ' bla bla ' (length=9)
  3 =>
    array
      0 => string '' (length=0)
  4 => string ' bla ' (length=5)
  5 =>
    array
      0 => string ' mm ' (length=4)
  6 => string ' blah ' (length=6)
  7 =>
    array
      0 => string ' you ' (length=5)
      1 =>
        array
          0 => string ' are ' (length=5)
      2 => string ' inside ' (length=8)
  8 => string ' bobobo' (length=7)
How about converting <?tagN?> to <elemN> and then parsing it as XML?
Once you have a raw structure like the result you described, you can verify it against your element structure (that is, ensure that items are properly nested inside each other, etc.).
Just add in a document element and you are set with this stylesheet:
Edit: Since it came up that these tags are mixed with HTML, I decided to change my strategy. Please check out the following code first, before the description:
$data = '<b>H</b>ello <?tag?> <b>W</b>orld <?/tag?>';
$conv1 = array(
// original => entity
'<?tag' => '%START-BEGIN%',
'<?/tag' => '%START-END%',
'?>' => '%END-END%'
);
$conv2 = array(
// entity => xml
'%START-BEGIN%' => '<element',
'%START-END%' => '</element',
'%END-END%' => '>'
);
$data = str_replace(array_keys($conv1), array_values($conv1), $data); // tags -> placeholders
$data = htmlentities($data, ENT_QUOTES); // encode HTML characters
$data = str_replace(array_keys($conv2), array_values($conv2), $data); // placeholders -> XML tags
$xml = '<?xml version="1.0" encoding="UTF-8"?>'.$data;
// You must apply the following function to each output text
// html_entity_decode($data,ENT_QUOTES);

PHP - Normalize user input array

If I have an array like this:
array
0 => string '62 52, 53' (length=9)
1 => string '54' (length=2)
It's from user input, and you never know how/what they enter ;)
What I want in the end is this:
array
0 => string '62' (length=2)
1 => string '52' (length=2)
2 => string '53' (length=2)
3 => string '54' (length=2)
Here's how I do it:
$string = implode(',', $array);
$string = str_replace(', ', ',', $string);
$string = str_replace(' ', ',', $string);
$array = explode(',', $string);
Seems really clunky. Is there a more elegant way? One that maybe has better performance?
On each string:
preg_match_all("/[ ,]*(\d+)[ ,]*/", $list, $matches);
Then read $matches[1] for the numbers.
Not sure about performance, but you can use a regex to grab only the numbers after you join everything into a string.
$string = implode(' ', $array);
preg_match_all('/\d+/', $string, $matches);
print_r($matches[0]);
You may want to use preg_split and array_merge (PHP 4, PHP 5)
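Below is a sketch of that suggestion, assuming entries are separated by spaces and/or commas. It passes all the pieces to array_merge with argument unpacking (...). That syntax requires PHP 5.6 or later, so this particular form is not PHP 4 compatible.

```php
<?php
$input = array('62 52, 53', '54'); // sample user input from the question

$parts = array();
foreach ($input as $item) {
    // Split each entry on any run of whitespace and/or commas, dropping empty pieces.
    $parts[] = preg_split('/[\s,]+/', $item, -1, PREG_SPLIT_NO_EMPTY);
}
// Flatten the per-entry arrays into one list.
$normalized = array_merge(array(), ...$parts);
print_r($normalized);
```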

Remove first two words from a string

I have a string:
$string = "R 124 This is my message";
At times, the string may change, such as:
$string = "R 1345255 This is another message";
Using PHP, what's the best way to remove the first two "words" (e.g., the initial "R" and then the subsequent numbers)?
Thanks for the help!
$string = explode (' ', $string, 3);
$string = $string[2];
This is likely faster than using a regex.
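The same idea can be written as a small function. This sketch assumes that a string with fewer than three words should yield an empty string instead of an undefined-offset warning. The function name remove_first_two_words is hypothetical.

```php
<?php
// Hypothetical helper: drop the first two space-separated words.
function remove_first_two_words($string)
{
    // A limit of 3 returns the first two words plus everything else as one piece.
    $parts = explode(' ', $string, 3);
    // With fewer than three pieces, nothing remains after the first two words.
    return isset($parts[2]) ? $parts[2] : '';
}

echo remove_first_two_words('R 124 This is my message'), "\n";
```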
One way would be to explode the string into "words" using explode or preg_split (depending on how complex the word separators are: are they always a single space?).
For instance:
$string = "R 124 This is my message";
$words = explode(' ', $string);
var_dump($words);
You'd get an array like this one:
array
0 => string 'R' (length=1)
1 => string '124' (length=3)
2 => string 'This' (length=4)
3 => string 'is' (length=2)
4 => string 'my' (length=2)
5 => string 'message' (length=7)
Then, with array_slice, you keep only the words you want (all but the first two):
$to_keep = array_slice($words, 2);
var_dump($to_keep);
Which gives:
array
0 => string 'This' (length=4)
1 => string 'is' (length=2)
2 => string 'my' (length=2)
3 => string 'message' (length=7)
And, finally, you put the pieces back together:
$final_string = implode(' ', $to_keep);
var_dump($final_string);
Which gives...
string 'This is my message' (length=18)
And, if necessary, this approach lets you do a couple of manipulations on the words before joining them back together :-)
Actually, that is the reason you might choose this solution, which is a bit longer than using only explode and/or preg_split ^^
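This answer notes that the separators may not always be a single space. For that case, preg_split with a limit of 3 does the same job. The sketch assumes any run of whitespace separates words. The limit keeps the rest of the message, including its internal spacing, as the third piece. The function name remove_first_two_words_ws is hypothetical.

```php
<?php
// Hypothetical helper: drop the first two words, whatever whitespace separates them.
function remove_first_two_words_ws($string)
{
    // Split on runs of whitespace, at most 3 pieces, after trimming the ends.
    $parts = preg_split('/\s+/', trim($string), 3);
    return isset($parts[2]) ? $parts[2] : '';
}

echo remove_first_two_words_ws("R   124\tThis is my message"), "\n";
```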
Try:
$result = preg_replace('/^R \\d+ /', '', $string, 1);
or (if you want your spaces to be written in a more visible style)
$result = preg_replace('/^R\\x20\\d+\\x20/', '', $string, 1);
$string = preg_replace("/^\\w+\\s\\d+\\s(.*)/", '$1', $string);
$string = preg_replace('/^R\s+\d+\s*/', '', $string);
