Retrieving words with colon, and associated data

I have data formatted as such:
some words go here priority: p1,p2 -rank:3 status: not delayed
Basically I need to retrieve each set of data that corresponds to the colon name.
Ideally I would end up with an array structure like this:
keywords => 'some words go here'
priority => 'p1,p2'
-rank => 3
status => 'not delayed'
A few caveats:
keywords will not have a defining colon-word (keywords are just placed in the front)
keywords will not always exist (might just be colon-words)
colon-words will not always exist (might just be keywords)
I imagine regex will have to be used to parse this out, but this goes beyond my understanding of regex.
If there is a simpler approach to this I'd be happy to find out.
Any help appreciated!

A regular expression is certainly a more elegant approach to this, as @HamZa showed, but here is a proof of concept to illustrate that you can simply brute-force the solution. Keep in mind that this is only a proof of concept; I won't be doing your entire assignment for you ;)
<?php
$string = "keywords go here priority: p1,p2 -rank:3 status: not delayed";
$kv = array();
$key = "keywords"; // text before the first colon-word is stored under this key
$substrings = explode(":", $string);
foreach ($substrings as $k => $substring) {
    $pieces = explode(" ", $substring);
    // every chunk except the last ends with the next key, so drop its final word
    $chunk = $k == count($substrings) - 1 ? 0 : 1;
    $kv[$key] = trim(join(" ", array_slice($pieces, 0, count($pieces) - $chunk)));
    $key = $pieces[count($pieces) - 1];
}
print_r($kv);
// Array
// (
// [keywords] => keywords go here
// [priority] => p1,p2
// [-rank] => 3
// [status] => not delayed
// )
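For comparison, a regex-based version could look roughly like the sketch below. It is not the pattern from the other answer; parse_colon_words() is a hypothetical helper name, and it assumes that a key is any run of non-space characters directly followed by a colon, so a value that itself contains a colon (a URL, a time) would be misread as a key.

```php
<?php
// Hypothetical helper: splits "free text key: value key2: value2" into an array.
function parse_colon_words(string $input): array
{
    $result = [];
    // Everything before the first "word:" token is treated as the keywords.
    if (preg_match('/^(.*?)(?=[^\s:]+:|$)/s', $input, $m) && trim($m[1]) !== '') {
        $result['keywords'] = trim($m[1]);
    }
    // A key is a run of non-space, non-colon characters followed by a colon;
    // its value runs lazily up to the next key or the end of the string.
    preg_match_all('/([^\s:]+):\s*(.*?)(?=\s+[^\s:]+:|$)/s', $input, $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        $result[$pair[1]] = trim($pair[2]);
    }
    return $result;
}

print_r(parse_colon_words('some words go here priority: p1,p2 -rank:3 status: not delayed'));
```

All values come back as strings, so -rank holds '3' rather than the integer 3; cast it afterwards if you need a number. Input with only keywords or only colon-words is handled by the same two patterns.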

Related

Add keys in associative php array as per reference

I have an array and want to add a key to each value based on the URL it contains: if a value is a Facebook URL the key should be facebook, if it is an Instagram link the key should be instagram, and so on.
Here's the code:
<?php
foreach($social_media as $social){
$typesocial = $social['type'];
if($social['type'] === 'social network') {
$val[] = $social['url']['resource'];
}
}
print_r($val);
?>
Array (
[0] => https://plus.google.com/+beyonce
[1] => https://twitter.com/Beyonce
[2] => https://www.facebook.com/beyonce
[3] => https://www.instagram.com/beyonce/
[4] => http://www.weibo.com/beyonceofficial
)
It should become the following, where each key is taken from the site in the URL (twitter for a Twitter link, instagram for an Instagram link, and so on):
Array (
[google] => https://plus.google.com/+beyonce
[twitter] => https://twitter.com/Beyonce
[facebook] => https://www.facebook.com/beyonce
[instagram] => https://www.instagram.com/beyonce/
[weibo] => http://www.weibo.com/beyonceofficial
)

$indexed = [
'https://plus.google.com/+beyonce',
'https://twitter.com/Beyonce',
'https://www.facebook.com/beyonce',
'https://www.instagram.com/beyonce/',
'http://www.weibo.com/beyonceofficial',
];
Assuming the host consists of 2 or 3 parts
$assoc = [];
foreach($indexed as $url) {
$host = explode('.', parse_url($url, PHP_URL_HOST));
switch (count($host)) {
case 2:
$assoc[$host[0]] = $url;
break;
case 3:
$assoc[$host[1]] = $url;
}
}
Will output $assoc as
array(5) {
'google' => string(32) "https://plus.google.com/+beyonce"
'twitter' => string(27) "https://twitter.com/Beyonce"
'facebook' => string(32) "https://www.facebook.com/beyonce"
'instagram' => string(34) "https://www.instagram.com/beyonce/"
'weibo' => string(36) "http://www.weibo.com/beyonceofficial"
}

Maybe this? Hope it helps. (Edited.)
$array = Array (
0 => 'https://plus.google.com/+beyonce',
1 => 'https://twitter.com/Beyonce',
2 => 'https://www.facebook.com/beyonce',
3 => 'https://www.instagram.com/beyonce/',
4 => 'http://www.weibo.com/beyonceofficial',
6 => 'http://www.bbc.co.uk/a/witty/documentary'
);
$output = Array();
foreach($array as $key => $url){
$urlarr = parse_url($url);
$arr = explode('.',$urlarr['host']);
$name = $arr[count($arr) - 2];
if($name == 'co'){
$name = $arr[count($arr) - 3];
}
$output[$name] = $url;
}
print_r($output);

You may want to use strpos() to look for each site name and assign the matching key, something like this:
$new_array = [];
foreach ($old_array as $value) {
    // strpos() returns 0 for a match at the very start, so compare against false explicitly
    if (strpos($value, 'facebook') !== false) {
        $new_array['facebook'] = $value;
    } elseif (strpos($value, 'instagram') !== false) {
        $new_array['instagram'] = $value;
    } elseif (strpos($value, 'twitter') !== false) {
        $new_array['twitter'] = $value;
    } else { // and so on...
        $new_array['unknown'] = $value;
    }
}
Of course, if you have two URLs containing facebook, the second will overwrite the first, so depending on what you need you may want to add an additional index, like:
$new_array['facebook'][] = $value;
Hope it helps!
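The two ideas above (searching for a site name and collecting several URLs per key) can be combined in one loop. The following is only a sketch under my own assumptions: group_by_network() and the list of network names are hypothetical, and the name is searched in the host returned by parse_url() rather than in the whole URL, so a path that happens to contain a site name does not count.

```php
<?php
// Hypothetical helper: groups URLs under the first known network name found in the host.
function group_by_network(array $urls, array $networks): array
{
    $grouped = [];
    foreach ($urls as $url) {
        // parse_url() yields null or false when there is no usable host; cast that to ''.
        $host = (string) parse_url($url, PHP_URL_HOST);
        $key = 'unknown';
        foreach ($networks as $network) {
            // stripos() can return 0 for a match at the start, so compare with false.
            if (stripos($host, $network) !== false) {
                $key = $network;
                break;
            }
        }
        $grouped[$key][] = $url;
    }
    return $grouped;
}

$urls = [
    'https://twitter.com/Beyonce',
    'https://www.facebook.com/beyonce',
    'https://facebook.com/nottwitter',
    'http://badmedia.com/beyonce',
];
print_r(group_by_network($urls, ['facebook', 'instagram', 'twitter']));
```

Every key holds a list here, even when only one URL matched, which keeps the shape of the result predictable for whatever consumes it.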

A quick and fairly easy way to approach this might be to do a regex match on the domains.
For example, if you have a finite set of social media sites and can trust the middle of the domain name, you can extract that part of the URL and use it as your key.
For example, the regex /google|twitter/ would match google in https://plus.google.com/+beyonce. This can then be used as your key.
So, looping through your set, you'd run preg_match, which attempts to find the key based on the domain name. preg_match takes a third parameter (called $matches in the manual, $match in the code below) that receives the matched text, for example google. You can use this as the key of your formatted array.
$array = [
'https://plus.google.com/+beyonce',
'https://twitter.com/Beyonce',
'https://www.facebook.com/beyonce',
'https://www.instagram.com/beyonce/',
'http://badmedia.com/beyonce',
'http://www.weibo.com/beyonceofficial'
];
$keyed = [];
foreach ($array as $i => $url) {
preg_match('/google|twitter|facebook|instagram|weibo/', $url, $match);
$key = isset($match[0]) ? $match[0] : 'notfound-' . $i;
$keyed[$key] = $url;
}
The result is that $keyed now holds your URLs under $keyed['google'] and the other keys.
Add to or remove from google|twitter|facebook|instagram|weibo (for example by appending |another) to support more or fewer sites.
It's not very scalable. If the URL doesn't match any key you want to use, I've set it to create a key named notfound- followed by the original array index (I've added badmedia.com to the list to demonstrate this). You also run the risk of a false match, because the pattern is tested against the whole URL rather than just the domain: a profile on an unsupported site with the username "nottwitter" would still match "twitter".
So, rather than relying on the domain to provide the key for you, you might want to keep a regex for each full domain name. You could store these patterns against the key you want to use and nest two loops to find a match.
For example:
$socialMedia = [
    'facebook' => '/^(https?:\/\/)?(www\.)?facebook\.com\/[a-zA-Z0-9(\.\?)?]/',
];
foreach ($array as $url) {
    foreach ($socialMedia as $key => $regex) {
        if (preg_match($regex, $url)) {
            $keyed[$key] = $url;
        }
    }
}
(Shoutout to https://gist.github.com/atomicpages/4619196 for the full FB domain regex)
For very small sets (which I imagine you're dealing with at any one time) this would scale slightly better and give you a better result... It also rejects and removes any you don't support, but you would need to catch those separately.
For larger sets you probably want to look at how you collect the urls in the first place... Rather than accept any list, maybe have the form or API call you're receiving these from state which one it is on the way in.

php foreach group results by 3 sets of characters

I am trying to group some foreach results based on the first 3 sets of characters.
For example, I am currently listing SKU codes for products, and they look like this:
REF-MUSBOM-0500-ORA
REF-PROCOF-0001-LAT
REF-WHEREF-0001-TRO
REF-WHEREF-0001-ORA
REF-SHAKER-0700-C/B
REF-CREMON-0100-N/A
REF-GLUSUL-0090-N/A
REF-CRECAP-0090-N/A
REF-ALBFER-0120-N/A
REF-TSHCOT-LARG-BLK
REF-TSHCOT-MEDI-BLK
REF-ALBMAG-0090-N/A
REF-GYMJUG-2200-N/A
REF-OMEGA3-0090-N/A
REF-NEXGEN-0060-N/A
REF-VITAD3-0100-N/A
REF-SSSHAK-0739-N/A
REF-GINKGO-0090-N/A
REF-DIGEZY-0090-N/A
REF-VEST00-MEDI-N/A
REF-VEST00-LARG-N/A
REF-CREMON-0250-N/A
REF-MSM----0250-N/A
REF-GRNTEA-0100-N/A
REF-COLOST-0100-N/A
REF-GLUCHO-0090-N/A
REF-ZINCMA-0100-N/A
REF-BETALA-0250-N/A
REF-DRIBOS-0250-N/A
REF-HMB000-0090-N/A
REF-ALACID-0090-N/A
REF-CLA000-0090-N/A
REF-ACETYL-0090-N/A
REF-NXGPRO-0090-N/A
REF-LGLUTA-0250-N/A
REF-BCAA20-0200-N/A
REF-FLAPJA-0012-ACR
REF-FLAPJA-0012-MAP
REF-LCARNI-0100-N/A
REF-CORDYC-0090-N/A
REF-CREMON-0500-N/A
REF-BCAAEN-0330-APP
REF-PREWKT-0300-FPU
REF-TESFUS-0090-N/A
REF-AMIIFUS-0300-GAP
REF-AMIIFUS-0300-WME
REF-BCAINT-0400-FPU
REF-KRILLO-0090-N/A
REF-AMIIFUS-0300-PLE
REF-AMIIFUS-0300-FPU
REF-BCAINT-0400-WME
REF-ENZQ10-0090-N/A
REF-THERMO-0100-N/A
REF-LGLUTA-0500-N/A
REF-RBAR00-0012-DCB
REF-RBAR00-0012-PBC
REF-RBAR00-0012-WCR
REF-IMHEAV-2200-CHO
REF-PROCOF-0012-N/A
REF-DIEPRO-0900-STR
REF-DIEPRO-0900-BOF
REF-DIEPRO-0900-CHO
REF-INWPRO-0900-VAN
REF-INWPRO-0900-BOF
REF-INWPRO-0900-BCS
REF-INWPRO-0900-CHO
REF-INWPRO-0900-CMI
REF-INWPRO-0900-RAS
REF-INWPRO-0900-STR
REF-INWPRO-0900-CIN
REF-INWPRO-0900-CPB
REF-EGGPRO-0900-CHO
REF-EGGPRO-0900-VAN
REF-MICCAS-0909-CHO
REF-MICCAS-0909-CMI
REF-MICCAS-0909-VAN
REF-MICCAS-0909-STR
REF-BCAA50-0500-N/A
REF-MICWHE-0909-STR
REF-MICWHE-0909-VAN
REF-MICWHE-0909-CHIO
REF-MICWHE-0909-BAN
REF-1STOXT-2030-STR
REF-1STOXT-2030-VAN
REF-1STOXT-2030-CHO
REF-MUSBOM-0600-BCH
REF-MUSBOM-0600-FPU
REF-MUSBCF-0600-BCH
REF-MUSBCF-0600-FPU
REF-VEGANP-2100-STR
REF-VEGANP-2100-CHO
REF-INMPRO-2270-CPB
REF-DIETMR-2400-CPB
REF-INMPRO-2270-SCR
REF-INMPRO-2270-VIC
REF-MATRIX-1800-FRU
REF-INMPRO-2270-BOF
REF-MATRIX-1800-CHO
REF-INMPRO-2270-CHO
REF-ONESTO-2100-CHO
In the above list there are 2 skus which are:
REF-WHEREF-0001-TRO
REF-WHEREF-0001-ORA
The first 3 sets of characters split by - are the same. What would be the best approach of grouping all results leaving me an array something like this:
Array
(
[REF-WHEREF-0001] => Array
(
[0] => REF-WHEREF-0001-TRO
[1] => REF-WHEREF-0001-ORA
)
)

Are the first 3 groups always 13 characters long (15 including the two hyphens between them)? Then do something like this:
<?php
$arr = ["REF-MUSBOM-0500-ORA",
"REF-PROCOF-0001-LAT",
"REF-WHEREF-0001-TRO",
"REF-WHEREF-0001-PPL"];
$resultArr = [];
foreach ($arr as $sku) {
$resultArr[substr($sku, 0, 15)][] = $sku;
}
var_dump($resultArr);
If that length varies you might want to work with a regex or the strpos() of the third -.
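The length does vary in the question's list: REF-AMIIFUS-0300-GAP has a seven-character middle segment, and REF-MSM----0250-N/A pads its middle segment with hyphens, so counting hyphens from the left is unreliable too. One possible sketch, which swaps in strrpos() (the last hyphen) instead of the third one, is shown below; group_by_sku_prefix() is a hypothetical name, and it assumes the final segment never contains a hyphen.

```php
<?php
// Hypothetical helper: groups SKUs by everything before their last hyphen.
function group_by_sku_prefix(array $skus): array
{
    $groups = [];
    foreach ($skus as $sku) {
        $cut = strrpos($sku, '-');
        // A SKU without any hyphen is grouped under itself.
        $prefix = $cut === false ? $sku : substr($sku, 0, $cut);
        $groups[$prefix][] = $sku;
    }
    return $groups;
}

$skus = [
    'REF-WHEREF-0001-TRO',
    'REF-WHEREF-0001-ORA',
    'REF-AMIIFUS-0300-GAP',
    'REF-MSM----0250-N/A',
];
print_r(group_by_sku_prefix($skus));
```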
I must say that I think you could come up with this yourself, since you were already thinking in the right direction i.e. foreach()
EDIT: Because I found other solutions more elegant looking, I decided to compare efficiency. This solution is a lot faster than the other ones.

I always build a new array keyed by the value I want to group on. Try this:
$arr=array('REF-MUSBOM-0500-ORA',
'REF-PROCOF-0001-LAT',
'REF-WHEREF-0001-TRO');
$newarr=array();
foreach($arr as $a){
$b=explode('-',$a);
array_pop($b);
$b=implode("-", $b);
$newarr[$b][]=$a;
}
echo '<pre>', print_r($newarr, true), '</pre>'; // pass true so print_r() returns the text instead of printing it and leaving a stray 1

You can work out each group name with some basic use of explode, implode and str_replace.
What does this solution do?
It loops through the array of your items.
It explodes each item to get the last segment of the string, assuming that segment is the part that varies.
It implodes the segments again and uses str_replace to strip the last segment, which leaves the group name.
Finally, it uses strpos and in_array as checks before adding the item to its group.
Solution:
$array = array(
'REF-MUSBOM-0500-ORA',
'REF-PROCOF-0001-LAT',
'REF-WHEREF-0001-TRO',
'REF-WHEREF-0001-ORA',
'REF-SHAKER-0700-C/B',
'REF-CREMON-0100-N/A',
'REF-GLUSUL-0090-N/A',
'REF-CRECAP-0090-N/A',
'REF-ALBFER-0120-N/A',
);
$new_array = array();
foreach ($array as $key => $val) {
$group_arr = explode('-', $val);
$end = end($group_arr);
$combined_group = implode('-', $group_arr);
$group = str_replace('-' . $end, '', $combined_group);
if (strpos($val, $group) !== false && !in_array($group, $new_array)) {
$new_array[$group][] = $val;
}
}
echo '<pre>';print_r($new_array);echo '</pre>';

similar substring in other string PHP

How can I check for substrings in PHP by prefix or postfix?
For example, I have a search string named $to_search as follows:
$to_search = "abcdef";
And I have these cases to check against $to_search:
$cases = ["abc def", "def", "deff", ... Other values ...];
Now I have to detect the first three cases using the substr() function.
How can I detect "abc def", "def" and "deff" as substrings of "abcdef" in PHP?

You might find the Levenshtein distance between the two words useful - it'll have a value of 1 for abc def. However your problem is not well defined - matching strings that are "similar" doesn't mean anything concrete.
Edit - If you set the deletion cost to 0 then this very closely models the problem you are proposing. Just check that the Levenshtein distance is at most 1 for everything in the array.
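As a rough illustration of that idea, PHP's levenshtein() accepts custom insertion, replacement and deletion costs, in that order. In the sketch below the threshold is 1, so that "abc def" and "deff" (distance 1 each) are accepted alongside "def" (distance 0); the extra "xyz" case is my own addition to show a rejection.

```php
<?php
$to_search = "abcdef";
$cases = ["abc def", "def", "deff", "xyz"];

$matches = [];
foreach ($cases as $case) {
    // Costs: insertion 1, replacement 1, deletion 0.
    // Dropping characters from $to_search is free; only added or
    // substituted characters count towards the distance.
    $distance = levenshtein($to_search, $case, 1, 1, 0);
    if ($distance <= 1) {
        $matches[] = $case;
    }
}
print_r($matches);
```

Because deletions are free, any string whose characters appear in $to_search in the same order (for example "ace") also gets distance 0, so this is a loose notion of "similar" rather than a strict prefix or suffix test.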

This will find if any of the strings inside $cases are a substring of $to_search.
foreach($cases as $someString){
if(strpos($to_search, $someString) !== false){
// $someString is found inside $to_search
}
}
Only "def" is, though, as none of the other strings is actually contained in $to_search.
Also, on a side note: the terms are prefix and suffix, not postfix.

To find any of the cases that either begin with or end with either the beginning or ending of the search string, I don't know of another way to do it than to just step through all of the possible beginning and ending combinations and check them. There's probably a better way to do this, but this should do it.
$to_search = "abcdef";
$cases = ["abc def", "def", "deff", "otherabc", "noabcmatch", "nodefmatch"];
$matches = array();
$len = strlen($to_search);
for ($i=1; $i <= $len; $i++) {
// get the beginning and end of the search string of length $i
$pre_post = array();
$pre_post[] = substr($to_search, 0, $i);
$pre_post[] = substr($to_search, -$i);
foreach ($cases as $case) {
// get the beginning and end of each case of length $i
$pre = substr($case, 0, $i);
$post = substr($case, -$i);
// check if any of them match
if (in_array($pre, $pre_post) || in_array($post, $pre_post)) {
// using the case as the array key for $matches will keep it distinct
$matches[$case] = true;
}
}
}
// use array_keys() to get the keys back to values
var_dump(array_keys($matches));

You can use the array_filter function like this:
$cases = ["cake", "cakes", "flowers", "chocolate", "chocolates"];
$to_search = "chocolatecake";
$search = strtolower($to_search);
$arr = array_filter($cases, function ($val) use ($search) {
    // normalise each case: lower-case it, drop a trailing "s", remove spaces
    $needle = str_replace(' ', '', preg_replace('/s$/', '', strtolower($val)));
    return strpos($search, $needle) !== FALSE;
});
print_r($arr);
Output:
Array
(
[0] => cake
[1] => cakes
[3] => chocolate
[4] => chocolates
)
As you can see, it prints all the values you expected. With your original data, the only exception would be deff, which is not part of the search string abcdef, as I commented above.

Full text search PHP alone

I have an InnoDB table from which values are retrieved and stored in an array in PHP.
Now I want to sort the array by relevance to the matches in the search string.
eg: If I search "hai how are you", it will split the string into separate words as "hai" "how" "are" "you" and the results after search must be as follows:
[0] hai how are all people there
[1] how are things going
[2] are you coming
[3] how is sam
...
Is there any way I can sort the array by relevance in basic PHP functions alone?

Maybe something like this:
$arrayToSort = array(); // define your array here
$query = "hai how are you";
// Sort in descending order of similarity to the query. A closure is used
// because a plain named function cannot see $query from the outer scope.
usort($arrayToSort, function ($arrayMember1, $arrayMember2) use ($query) {
    return similar_text($arrayMember2, $query) - similar_text($arrayMember1, $query);
});
Look in the PHP manual for clarification on what similar_text and usort do.

$searchText = "hai how are you"; // e.g. there may be multiple spaces between words
$searchText = preg_replace("(\s+)", " ", $searchText);
$searchArray = explode(" ", trim($searchText)); // split() was removed in PHP 7
$text = array(0 => 'hai how are all people there',
    1 => 'how are things going ',
    2 => 'are you coming',
    3 => 'how is sam',
    4 => 'testing ggg');
$matches = array();
foreach ($text as $key => $elt) {
    foreach ($searchArray as $searchelt) {
        if (strpos($elt, $searchelt) !== FALSE) {
            $matches[] = $key; // store only the key to avoid wasting memory
            break;
        }
    }
}
// print the matched strings with the help of the stored keys
echo '<pre>matched strings are as follows: ';
foreach ($matches as $key) {
    echo "<br>{$text[$key]}";
}
echo '</pre>';
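The loop above finds rows that contain at least one search word but does not order them. One way to rank them with basic functions is to count how many distinct query words each row contains and sort on that count. This is only a sketch: rank_by_relevance() is a hypothetical name, matching is on whole lower-cased words, and rows with no match are dropped.

```php
<?php
// Hypothetical helper: orders rows by how many distinct query words they contain.
function rank_by_relevance(array $rows, string $query): array
{
    $rows = array_values($rows);
    $words = array_unique(preg_split('/\s+/', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY));
    $scores = [];
    foreach ($rows as $i => $row) {
        $rowWords = preg_split('/\s+/', strtolower(trim($row)), -1, PREG_SPLIT_NO_EMPTY);
        $scores[$i] = count(array_intersect($words, $rowWords));
    }
    $order = array_keys($rows);
    usort($order, function ($a, $b) use ($scores) {
        // Higher score first; ties keep the original order.
        return ($scores[$b] <=> $scores[$a]) ?: ($a <=> $b);
    });
    $ranked = [];
    foreach ($order as $i) {
        if ($scores[$i] > 0) {
            $ranked[] = $rows[$i];
        }
    }
    return $ranked;
}

$rows = ['how is sam', 'testing ggg', 'are you coming', 'hai how are all people there', 'how are things going'];
print_r(rank_by_relevance($rows, 'hai how are you'));
```

For larger tables it is usually better to let the database do this work, but the question asks specifically for plain PHP functions.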

Split a string, remember the positions of splitting

Assume I have the following string:
I have | been very busy lately and need to go | to bed early
By splitting on "|", you get:
$arr = array(
[0] => I have
[1] => been very busy lately and need to go
[2] => to bed early
)
The first split is after 2 words, and the second split 8 words after that. The positions after how many words to split will be stored: array(2, 8, 3). Then, the string is imploded to be passed on to a custom string tagger:
tag_string('I have been very busy lately and need to go to bed early');
I don't know what the output of tag_string will be exactly, except that the total words will remain the same. Examples of output would be:
I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p
I-ee have been-vb very busy-df lately-nn and need-f to go to bed-uu early-yy
This will lengthen the string by an unknown number of characters. I have no control over tag_string. What I know is that (1) the number of words will be the same as before and (2) the array was split after 2 words and then after another 8 words. I now need a solution to explode the tagged string into the same array as before:
$string = "I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p";
function split_string_again() {
// split after 2nd, and thereafter after 8th word
}
With output:
$arr = array(
[0] => I have-nn
[1] => been-vb very-vb busy lately and-rr need to-r go
[2] => to bed early-p
)
So to be clear (I wasn't before): I cannot split by remembering the strpos, because the positions before and after the string went through the tagger aren't the same. I need to count the number of words. I hope I have made myself clearer :)

You wouldn't want to count the number of words, you would want to count the string length (strlen). If it is the same string without the pipes, then you want to split it with substr after a certain amount.
$strCounts = array();
foreach ($arr as $item) {
$strCounts[] = strlen($item);
}
// Later on.
$arr = array();
$i = 0;
foreach ($strCounts as $count) {
$arr[] = substr($string, $i, $count);
$i += $count; // increment the start position by the length
}
I have not tested this, simply a "theory" and probably has some kinks to work out. There may be a better way to go about it, I just don't know it.

Interesting question, although I think the rope data structure still applies it might be a little overkill since word placement won't change. Here is my solution:
$str = "I have | been very busy lately and need to go | to bed early";
function get_breaks($str)
{
$breaks = array();
$arr = explode("|", $str);
foreach($arr as $val)
{
$breaks[] = str_word_count($val);
}
return $breaks;
}
$breaks = get_breaks($str);
echo "<pre>" . print_r($breaks, 1) . "</pre>";
$str = str_replace("|", "", $str);
function rebreak($str, $breaks)
{
$return = array();
$old_break = 0;
$arr = str_word_count($str, 1);
foreach($breaks as $break)
{
$return[] = implode(" ", array_slice($arr, $old_break, $break));
$old_break += $break;
}
return $return;
}
echo "<pre>" . print_r(rebreak($str, $breaks), 1) . "</pre>";
echo "<pre>" . print_r(rebreak("I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p", $breaks), 1) . "</pre>";
Let me know if you have any questions, but it is pretty self explanatory. There are definitely ways to improve this as well.
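str_word_count() treats a hyphen inside a word as part of that word, which is why have-nn still counts as one word above, but tags containing digits or other punctuation could be split differently. A variant that counts whitespace-separated tokens instead might look like the sketch below; the function names are hypothetical, and it assumes "|" is the only separator and that the tagger never adds or removes whitespace-delimited tokens.

```php
<?php
// Hypothetical helper: number of whitespace-separated tokens in each "|"-delimited part.
function count_tokens_per_part(string $str): array
{
    $counts = [];
    foreach (explode('|', $str) as $part) {
        $counts[] = count(preg_split('/\s+/', trim($part), -1, PREG_SPLIT_NO_EMPTY));
    }
    return $counts;
}

// Hypothetical helper: re-splits a tagged string using the stored token counts.
function split_by_counts(string $tagged, array $counts): array
{
    $tokens = preg_split('/\s+/', trim($tagged), -1, PREG_SPLIT_NO_EMPTY);
    $parts = [];
    $offset = 0;
    foreach ($counts as $count) {
        $parts[] = implode(' ', array_slice($tokens, $offset, $count));
        $offset += $count;
    }
    return $parts;
}

$counts = count_tokens_per_part('I have | been very busy lately and need to go | to bed early');
$tagged = 'I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p';
print_r(split_by_counts($tagged, $counts));
```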

I'm not quite sure I understood what you actually wanted to achieve. But here are a couple of things that might help you:
str_word_count() counts the number of words in a string. preg_match_all('/\p{L}[\p{L}\p{Mn}\p{Pd}\x{2019}]*/u', $string, $foo); does pretty much the same, but on UTF-8 strings.
strpos() finds the first occurrence of a string within another. You could easily find the positions of all | with this:
$pos = -1;
$positions = array();
while (($pos = strpos($string, '|', $pos + 1)) !== false) {
$positions[] = $pos;
}
I'm still not sure I understood why you can't just use explode() for this, though.
<?php
$string = 'I have | been very busy lately and need to go | to bed early';
$parts = explode('|', $string);
$words = array();
foreach ($parts as $s) {
$words[] = str_word_count($s);
}
