Creating a Regular Expression pattern to match space delimited string - php

I have a file with a lot of rows (over 32k). The rows look like:
34 Item
5423 11Item
44 Item
The leading digits are IDs. I want to build an associative array: array("34" => "Item", "5423" => "11Item", "44" => "Item")
IDs can be 1 to 5 digits long (1 - 65366)
The item name can start with a digit
There is at least one space (possibly more than one) between the ID and the item name
So the delimiter is one or more spaces. I'm using PHP.

You can use this:
$data = <<<'LOD'
34 Item
5423 11Item
44 Item
546
65535 toto le héros
65536 belzebuth
glups glips
LOD;
$result = array();
$line = strtok($data, "\r\n");
while($line!==false) {
$tmp = preg_split('~\s+~', $line, 2, PREG_SPLIT_NO_EMPTY);
if (count($tmp)==2 && $tmp[0]==(string)(int)$tmp[0] && $tmp[0]<65536)
$result[$tmp[0]] = $tmp[1];
$line = strtok("\r\n");
}
print_r($result);

Here's a method which doesn't check the validity of the data but might work. It splits every line on whitespace and puts the results in the $res associative array.
For information, preg_split() lets you split a string with a regex.
$res = array();
foreach($lines as $line) {
$data = preg_split('/\s+/', $line);
$res[$data[0]] = $data[1];
}
If you really want to check your conditions, you can add an if statement with the ID limit:
$res = array();
foreach($lines as $line) {
$data = preg_split('/\s+/', $line);
$idx = intval($data[0]);
if($idx > 0 && $idx < 65366) // skip lines where the ID seems invalid
$res[$data[0]] = $data[1];
}

Use preg_match with named capturing groups:
preg_match('/^(?<id>\d+)\s+(?<name>[\w ]+)$/', $row, $matches);
$matches['id'] will contain the ID and $matches['name'] will contain the name.
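$arr = array();
while (/* get each row */) {
// Skip rows that do not have the expected "ID name" shape
if (!preg_match('/^(?<id>\d+)\s+(?<name>[\w ]+)$/', $row, $matches)) {
continue;
}
$id = $matches['id'];
$name = $matches['name'];
// IDs start at 1, so the lower bound must be inclusive
if ($id >= 1 && $id < 65366) {
$arr[$id] = $name;
}
}
print_r($arr);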
Example output:
Array
(
[34] => Item
[5423] => 11Item
[44] => Item
[3470] => BLABLA TEF2200
)

Use preg_split (http://uk3.php.net/preg_split), e.g.:
preg_split("/ +/", $line);
It will return an array of strings.
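As a rough sketch of how that call could be turned into the associative array from the question: passing a limit of 2 to preg_split() keeps any spaces inside the item name intact. The helper name splitIdAndName() and the sample rows are assumptions made for illustration, not part of the original answers.

```php
<?php
// Hypothetical helper (name chosen for this example only).
function splitIdAndName(string $line): ?array
{
    // A limit of 2 means only the first run of spaces acts as the separator.
    $parts = preg_split('/ +/', trim($line), 2);
    // A line without a separator yields a single piece; treat it as invalid.
    return count($parts) === 2 ? $parts : null;
}

// Sample rows assumed for illustration.
$rows = ['34 Item', '5423    11Item', '546'];
$result = [];
foreach ($rows as $row) {
    $parts = splitIdAndName($row);
    if ($parts !== null) {
        $result[$parts[0]] = $parts[1];
    }
}
print_r($result);
```

Note that PHP converts numeric string keys such as "34" to integer keys, so the resulting array is keyed by 34 and 5423.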

Related

Parse formatted strings containing 3 delimiters to create multiple flat arrays

I have strings in the following format:
$strings[1] = 'cat:others;id:4,9,13';
$strings[2] = 'id:4,9,13;cat:electric-products';
$strings[3] = 'id:4,9,13;cat:foods;';
$strings[4] = 'cat:drinks,foods;';
where cat means category and id is identity number of a product.
I want to split these strings and convert into arrays $cats = array('others'); and $ids = array('4','9','13');
I know that it can be done by foreach and explode function through multiple steps. I think I am somewhere near, but the following code does not work.
Also, I wonder if it can be done by preg_match or preg_split in fewer steps. Or any other simpler method.
foreach ($strings as $key=>$string) {
$temps = explode(';', $string);
foreach($temps as $temp) {
$tempnest = explode(':', $temp);
$array[$tempnest[0]] .= explode(',', $tempnest[1]);
}
}
My desired result should be:
$cats = ['others', 'electric-products', 'foods', 'drinks'];
and
$ids = ['4','9','13'];
One option could be doing a string compare for the first item after explode for cat and id to set the values to the right array.
$strings = ["cat:others;id:4,9,13", "id:4,9,13;cat:electric-products", "id:4,9,13;cat:foods", "cat:drinks,foods"];
foreach ($strings as $key=>$string) {
$temps = explode(';', $string);
$cats = [];
$ids = [];
foreach ($temps as $temp) {
$tempnest = explode(':', $temp);
if ($tempnest[0] === "cat") {
$cats = explode(',', $tempnest[1]);
}
if ($tempnest[0] === "id") {
$ids = explode(',', $tempnest[1]);
}
}
print_r($cats);
print_r($ids);
}
Output for the first item would for example look like
Array
(
[0] => others
)
Array
(
[0] => 4
[1] => 9
[2] => 13
)
If you want to aggregate all the values in 2 arrays, you can array_merge the results, and at the end get the unique values using array_unique.
$strings = ["cat:others;id:4,9,13", "id:4,9,13;cat:electric-products", "id:4,9,13;cat:foods", "cat:drinks,foods"];
$cats = [];
$ids = [];
foreach ($strings as $key=>$string) {
$temps = explode(';', $string);
foreach ($temps as $temp) {
$tempnest = explode(':', $temp);
if ($tempnest[0] === "cat") {
$cats = array_merge(explode(',', $tempnest[1]), $cats);
}
if ($tempnest[0] === "id") {
$ids = array_merge(explode(',', $tempnest[1]), $ids);
}
}
}
print_r(array_unique($cats));
print_r(array_unique($ids));
Output
Array
(
[0] => drinks
[1] => foods
[3] => electric-products
[4] => others
)
Array
(
[0] => 4
[1] => 9
[2] => 13
)
I don't generally recommend using variable variables, but you are looking for a sleek snippet which uses regex to avoid multiple explode() calls.
Here is a script that will use no explode() calls and no nested foreach() loops.
You can see how the \G ("continue" metacharacter) allows continuous matches relative to the "bucket" label (id or cat) by calling var_export($matches);.
If this were my own code, I'd probably not create separate variables, but a single array containing id and cat --- this would alleviate the need for variable variables.
By using the encountered value as the key for the element to be added to the bucket, you are assured to have no duplicate values in any bucket -- just call array_values() if you want to re-index the bucket elements.
Code: (Demo) (Regex101)
$count = preg_match_all(
'/(?:^|;)(id|cat):|\G(?!^),?([^,;]+)/',
implode(';', $strings),
$matches,
PREG_UNMATCHED_AS_NULL
);
$cat = [];
$id = [];
for ($i = 0; $i < $count; ++$i) {
if ($matches[1][$i] !== null) {
$arrayName = $matches[1][$i];
} else {
${$arrayName}[$matches[2][$i]] = $matches[2][$i];
}
}
var_export(array_values($id));
echo "\n---\n";
var_export(array_values($cat));
All that said, I probably wouldn't rely on regex because it isn't very readable to the novice regex developer. The required logic is much simpler and easier to maintain with nested loops and explosions. Here is my adjustment of your code.
Code: (Demo)
$result = ['id' => [], 'cat' => []];
foreach ($strings as $string) {
foreach (explode(';', $string) as $segment) {
[$key, $values] = explode(':', $segment, 2);
array_push($result[$key], ...explode(',', $values));
}
}
var_export(array_unique($result['id']));
echo "\n---\n";
var_export(array_unique($result['cat']));
P.s. your posted coding attempt was using a combined operator .= (assignment & concatenation) instead of the more appropriate combined operator += (assignment & array union).
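One caveat that may be worth checking before switching to +=: with numerically indexed arrays, array union keeps the left-hand entry for every key that already exists, so it does not simply append. The small comparison below uses made-up sample values to show how it differs from array_merge().

```php
<?php
// Sample values assumed for illustration.
$bucket = ['4', '9'];
$more = ['13', '21', '34'];

// Array union keeps the left operand's entries for keys that already exist,
// so keys 0 and 1 of $more are ignored and only key 2 is added.
$union = $bucket + $more;

// array_merge() renumbers integer keys and appends every value.
$merged = array_merge($bucket, $more);

var_export($union);
echo "\n---\n";
var_export($merged);
```

This is why the adjusted code above appends with array_push() and the spread operator rather than with +=.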

how to filter array value that contains numbers only

I have a file loaded as an array:
'line number' => 'its content';
Now I need a way to remove lines that contain only numbers (no letters or special characters).
I did it by checking the length of the line, but this method is not appropriate because it might delete some of the content.
/*count file lines*/
$linecount = 0;
$handle = fopen($file, "r");
while(!feof($handle)){
$line = fgets($handle);
$linecount++;
}
fclose($handle);
for($i=0; $i<=($linecount*2)-4;$i++) {
$length = strlen((string)$text[0][$i]);
if ($length >5)
{
echo ($text[0][$i] .'</br>');
}
}
is_numeric didn't work, I think because there is a space before the numbers.
I modified the second line. I didn't understand why you used $length > 5; it should be $length > 0:
strlen(trim((string)$text[0][$i]));
for($i=0; $i<=($linecount*2)-4;$i++) {
$length = strlen(trim((string)$text[0][$i]));
if ($length >0)
{
echo ($text[0][$i] .'</br>');
}
}
Please try this
$keys=array_filter(array_keys($array1), "is_numeric");
$out =array_diff_key($array1,array_flip($keys));
print_r($out);
try this:
$pattern = '/^[0-9 ]+$/';
if ( !preg_match ($pattern, $text) )
{
echo 'allowed';
}
From here
As there was no example data given I have based this answer upon an assumed data input given the description - it may well bear no resemblance to the actual data.
The input file I used has the following lines of data - a mix of characters and numbers
abc23
123
89
gh46m
12 34 56
$file='c:/temp/src.txt';
$lines=array_filter( file( $file ), function( $item ){
$pttn='#^[0-9\s]{1,}$#';
preg_match( $pttn, $item ,$matches );
return count( $matches ) > 0 ? true : false;
});
echo '<pre>',print_r($lines,true),'</pre>';
This outputs the following:
Array
(
[1] => 123
[2] => 89
[4] => 12 34 56
)
If a space counts as a special character, then simply modify the regex pattern by removing \s, and it should match only numbers.
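The snippet above keeps the numeric-only lines. If the goal is instead to remove them, as the question asks, the same pattern can be inverted. This is a minimal sketch that uses an in-memory array (assumed sample data mirroring the lines above) in place of file().

```php
<?php
// Sample data assumed for illustration; file() would return similar lines.
$lines = ["abc23\n", "123\n", "89\n", "gh46m\n", "12 34 56\n"];

// Keep a line only when it is NOT made up solely of digits and whitespace.
$kept = array_filter($lines, function ($line) {
    return preg_match('/^[0-9\s]+$/', $line) !== 1;
});

// array_filter() preserves the original keys, i.e. the line indexes.
print_r(array_map('rtrim', $kept));
```

With this pattern a blank line also counts as "digits and whitespace only" and is dropped.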

php remove duplicate words in an array

Sorry, English is not my native language, so the question title may not be very good. I want to do something like this.
$str = array("Lincoln Crown","Crown Court","go holiday","house fire","John Hinton","Hinton Jailed");
Here is an array. "Lincoln Crown" contains "Lincoln" and "Crown", so any later element containing either of these two words should be removed; "Crown Court" (which contains "Crown") is therefore removed.
In another case, "John Hinton" contains "John" and "Hinton", so "Hinton Jailed" (which contains "Hinton") is removed. The final output should look like this:
$output = array("Lincoln Crown","go holiday","house fire","John Hinton");
My PHP skills are not good, and this is not as simple as using array_unique() or array_diff(), so I am asking for help. Thanks.
I think this might work :P
function cool_function($strs){
// Black list
$toExclude = array();
foreach($strs as $s){
// If it's not on blacklist, then search for it
if(!in_array($s, $toExclude)){
// Explode into blocks
foreach(explode(" ",$s) as $block){
// Search the block on array
$found = preg_grep("/" . preg_quote($block) . "/", $strs);
foreach($found as $k => $f){
if($f != $s){
// Place each found item that's different from current item into blacklist
$toExclude[$k] = $f;
}
}
}
}
}
// Unset all keys that was found
foreach($toExclude as $k => $v){
unset($strs[$k]);
}
// Return the result
return $strs;
}
$strs = array("Lincoln Crown","Crown Court","go holiday","house fire","John Hinton","Hinton Jailed");
print_r(cool_function($strs));
Dump:
Array
(
[0] => Lincoln Crown
[2] => go holiday
[3] => house fire
[4] => John Hinton
)
Seems like you would need a loop and then build a list of words in the array.
Like:
<?php
// Store existing array's words; elements will compare their words to this array
// if an element's words are already in this array, the element is deleted
// else the element has its words added to this array
$arrayWords = array();
// Loop through your existing array of elements
foreach ($existingArray as $key => $phrase) {
// Get element's individual words
$words = explode(" ", $phrase);
// Assume the element will not be deleted
$keepWords = true;
// Loop through the element's words
foreach ($words as $word) {
// If one of the words is already in arrayWords (another element uses the word)
if (in_array($word, $arrayWords)) {
// Delete the element
unset($existingArray[$key]);
// Indicate we are not keeping any of the element's words
$keepWords = false;
// Stop the foreach loop
break;
}
}
// Only add the element's words to arrayWords if the entire element stays
if ($keepWords) {
$arrayWords = array_merge($arrayWords, $words);
}
}
?>
As I would do in your case:
$words = array();
foreach($str as $key =>$entry)
{
$entryWords = explode(' ', $entry);
$isDuplicated = false;
foreach($entryWords as $word)
if(in_array($word, $words))
$isDuplicated = true;
if(!$isDuplicated)
$words = array_merge($words, $entryWords);
else
unset($str[$key]);
}
var_dump($str);
Output:
array (size=4)
0 => string 'Lincoln Crown' (length=13)
2 => string 'go holiday' (length=10)
3 => string 'house fire' (length=10)
4 => string 'John Hinton' (length=11)
I can imagine quite a few techniques that can provide your desired output, but the logic that you require is poorly defined in your question. I am assuming that whole word matching is required -- so word boundaries should be used in any regex patterns. Case sensitivity isn't mentioned. I am unsure if only fully unique elements (multi-word strings) should have their words entered into the black list. I'll offer a few snippets, but choosing the appropriate technique will depend on exact logical requirements.
Demo
$output = [];
$blacklist = [];
foreach ($input as $string) {
if (!$blacklist || !preg_match('/\b(?:' . implode('|', $blacklist) . ')\b/', $string)) {
$output[] = $string;
}
foreach(explode(' ', $string) as $word) {
$blacklist[$word] = preg_quote($word);
}
}
var_export($output);
Demo
$output = [];
$blacklist = [];
foreach ($input as $string) {
$words = explode(' ', $string);
foreach ($words as $word) {
if (in_array($word, $blacklist)) {
continue 2;
}
}
array_push($blacklist, ...$words);
$output[] = $string;
}
var_export($output);
And my favorite because it performs fewest iterations in the parent loop, is more compact, and doesn't require the declaration/maintenance of a blacklist array.
Demo
$output = [];
while ($input) {
$output[] = $words = array_shift($input);
$input = preg_grep('~\b(?:\Q' . str_replace(' ', '\E|\Q', $words) . '\E)\b~', $input, PREG_GREP_INVERT);
}
var_export($output);
You can explode each string in the original array and then compare word by word using a loop (comparing each word from one element with each word from another, and if they match, remove the whole element).
array_unique() example
<?php
$input = array("a" => "green", "red", "b" => "green", "blue", "red");
$result = array_unique($input);
print_r($result);
?>
output:
Array
(
[a] => green
[0] => red
[1] => blue
)
Source

Find all the occurrence points of a letter within a string

I have the following code:
<?php
$word = "aeagle";
$letter = "e";
$array = strposall($aegle, $letter);
print_r($array);
function strposall($haystack, $needle) {
$occurrence_points = array();
$pos = strpos($haystack, $needle);
if ($pos !== false) {
array_push($occurrence_points, $pos);
}
while ($pos = strpos($haystack, $needle, $pos + 1)) {
array_push($occurrence_points, $pos);
}
return $occurrence_points;
}
?>
As in the example, if I have aegle as my word and I'm searching for e within it, the function should return an array with the values 1 and 4 in it.
What's wrong with my code?
Why not try this instead:
$word = "aeagle";
$letter = "e";
$occurrence_points = array_keys(array_intersect(str_split($word), array($letter)));
var_dump($occurrence_points);
I think you're passing the wrong parameter; it should be $word instead of $aegle.
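For reference, a corrected version of the question's function might look like the sketch below. It passes $word to the function and folds the two strpos() calls into one loop; the $offset variable is introduced here for illustration, and a non-empty $needle is assumed.

```php
<?php
function strposall(string $haystack, string $needle): array
{
    $occurrence_points = [];
    $offset = 0;
    // Compare against false explicitly so a match at position 0 is not lost.
    while (($pos = strpos($haystack, $needle, $offset)) !== false) {
        $occurrence_points[] = $pos;
        $offset = $pos + 1; // continue searching after this match
    }
    return $occurrence_points;
}

$word = "aeagle";
$letter = "e";
print_r(strposall($word, $letter)); // pass $word, not the undefined $aegle
```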
Little bit more literal than the other answer:
function charpos($str, $char) {
$i = 0;
$pos = 0;
$matches = array();
if (strpos($str, $char) === false) {
return false;
}
while (!!$str) {
$pos = strpos($str, $char);
if ($pos === false) {
$str = '';
} else {
$i = $i + $pos;
$str = substr($str, $pos + 1);
array_push($matches, $i++);
}
}
return $matches;
}
https://ignite.io/code/511ff26eec221e0741000000
Using:
$str = 'abc is the place to be heard';
$positions = charpos($str, 'a');
print_r($positions);
while ($positions) {
$i = array_shift($positions);
echo "$i: $str[$i]\n";
}
Which gives:
Array (
[0] => 0
[1] => 13
[2] => 25
)
0: a
13: a
25: a
Others have pointed out that you're passing the wrong parameter. But you're also reinventing the wheel. Take a look at PHP's preg_match_all: it already returns an array of all matches with offsets when used with the following flag.
flags
flags can be the following flag:
PREG_OFFSET_CAPTURE
If this flag is passed, for every occurring match the appendant string offset will also be returned. Note that this changes the value of matches into an array where every element is an array consisting of the matched string at offset 0 and its string offset into subject at offset 1.
Use a single letter pattern for the search term $letter = '/e/' and you should get back an array with all your positions as the second element of each result array, which you can then finagle into the output format you're looking for.
Update: Jared points out that you do get the capture of the pattern back, but with the flag set, you also get the offset. As a direct answer to the OP's question, try this code:
$word = "aeagle";
$pattern = "/e/";
$matches = array();
preg_match_all($pattern, $word, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
It has the following output:
Array
(
// Matches of the full pattern: /e/
[0] => Array
(
// First match
[0] => Array
(
// Substring of $word that matched
[0] => e
// Offset into $word where previous substring starts
[1] => 1
)
[1] => Array
(
[0] => e
[1] => 5
)
)
)
The results are 3D instead of 2D because preg_match_all groups its results first by the full pattern match and then by each capturing group. The hits here are for the full pattern (this pattern has no capturing groups) and are thus in the first array.
Contrary to what the OP originally stated, 1 and 5 are the correct indexes of the letter e in the string 'aeagle':
aeagle
012345
 ^   ^
 1   5
Performance wise, the customized version of strposall would probably be faster than a regular expression match. But learning to use an in-built function is almost always faster than developing, testing, supporting and maintaining your own code. And 9 times out of 10, that's the most expensive part of programming.
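To turn the nested preg_match_all result into the flat list of positions the question asked for, one option is array_column(), which picks the offset out of each [match, offset] pair. This is a sketch built on the same $word as above; the $positions variable is introduced here for illustration.

```php
<?php
$word = "aeagle";
preg_match_all('/e/', $word, $matches, PREG_OFFSET_CAPTURE);

// $matches[0] holds [matched text, offset] pairs; column 1 is the offset.
$positions = array_column($matches[0], 1);
print_r($positions);
```

The offsets reported by PREG_OFFSET_CAPTURE are byte offsets, which matters only for multibyte strings.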

Count how often the word occurs in the text in PHP

In PHP I need to load a file, get all of the words, and echo each word along with the number of times it shows up in the text.
(I also need them in descending order, with the most-used words on top.)
Here's an example:
$text = "A very nice únÌcÕdë text. Something nice to think about if you're into Unicode.";
// $words = str_word_count($text, 1); // use this function if you only want ASCII
$words = utf8_str_word_count($text, 1); // use this function if you care about i18n
$frequency = array_count_values($words);
arsort($frequency);
echo '<pre>';
print_r($frequency);
echo '</pre>';
The output:
Array
(
[nice] => 2
[if] => 1
[about] => 1
[you're] => 1
[into] => 1
[Unicode] => 1
[think] => 1
[to] => 1
[very] => 1
[únÌcÕdë] => 1
[text] => 1
[Something] => 1
[A] => 1
)
And the utf8_str_word_count() function, if you need it:
function utf8_str_word_count($string, $format = 0, $charlist = null)
{
$result = array();
if (preg_match_all('~[\p{L}\p{Mn}\p{Pd}\'\x{2019}' . preg_quote((string) $charlist, '~') . ']+~u', $string, $result) > 0)
{
if (array_key_exists(0, $result) === true)
{
$result = $result[0];
}
}
if ($format == 0)
{
$result = count($result);
}
return $result;
}
$words = str_word_count($text, 1);
$word_frequencies = array_count_values($words);
arsort($word_frequencies);
print_r($word_frequencies);
This function uses a regex to find words (you might want to change it, depending on what you define a word as)
function count_words($text)
{
$output = $words = array();
preg_match_all("/[A-Za-z'-]+/", $text, $words); // Find words in the text
foreach ($words[0] as $word)
{
if (!array_key_exists($word, $output))
$output[$word] = 0;
$output[$word]++; // Every time we find this word, we add 1 to the count
}
return $output;
}
This iterates over each word, constructing an associative array (with the word as the key) where the value is the number of occurrences of each word (e.g. $output['hello'] = 3 means hello occurred 3 times in the text).
Perhaps you might want to change the function to deal with case insensitivity (i.e. 'hello' and 'Hello' are not the same word, according to this function).
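One possible way to make the count case-insensitive and sorted, as the question requests, is to lowercase the text before matching and then let array_count_values() and arsort() do the rest. The function name count_words_ci() is hypothetical, and strtolower() is assumed to be sufficient because the pattern only matches ASCII letters anyway.

```php
<?php
// Hypothetical case-insensitive variant of count_words() above.
function count_words_ci(string $text): array
{
    // Lowercase first so that 'Hello' and 'hello' count as the same word.
    preg_match_all("/[A-Za-z'-]+/", strtolower($text), $words);
    $counts = array_count_values($words[0]);
    arsort($counts); // most frequent words first
    return $counts;
}

// Sample text assumed for illustration.
$counts = count_words_ci("Hello world, hello PHP. Hello!");
print_r($counts);
```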
echo count(explode('your_word', $your_text)) - 1; // explode() returns one more piece than there are occurrences
