I want to parse a file and store it in an array in PHP. However, there are some rules which should be observed:
(p="value") should be ignored, but the "value" should be preserved.
The "-" should be ignored.
Whitespace should be ignored.
Split by \t and \n.
A sample string is:
NPD4196-2a_5_0
Geldanamycin - 0.166516 (p = 0.0068) Alamethicin - 0.158302 (p = 0.0206) 4-Hydroxytamoxifen - 0.1429 (p = 0.0183) Abietic acid - 0.133045 (p = 0.0203) Caspofungin - 0.130885 (p = 0.0432) Extract 00-303C - 0.12858 (p = 0.0356) U73122 - 0.113274 (p = 0.0482) Radicicol - 0.10213 (p = 0.0356) Calcium ionophore - 0.096183 (p = 0.0262)
So, the goal is to produce a data structure like:
Array('NPD4196-2a_5_0' => Array(Array(0 => 'Geldanamycin', 1 => '0.166516', 2 => '0.0068'), Array( ... )));
I have this written so far ...
while(($line = fgets($fp)) !== false){
$args = preg_split( '/[\t\n (=) ]+/', $line, -1, PREG_SPLIT_NO_EMPTY );
if(count($args)){
print_r($args);
print "\n";
}
}
What am I missing in order to accomplish my goal?
Thanks
(.+?)-\s*([\d\.]+)\s*\(p\s*=\s*([\d\.]+)\)
That will grab the element (e.g. Geldanamycin) in group 1, the related value in group 2, and the p value in group 3.
This seems to work for one key-value pair (assuming NPD4196-2a_5_0 is the key in your example, and the second line is the value).
<?php
$fp = fopen('foo.txt', 'r');
$regex = '/(\w*)\s*-\s*([\d\.]+)\s*\(p\s*=\s*([\d\.]+)\)/';
$id = "NO ID";
$result = Array();
while(($line = fgets($fp)) !== false){
if (!preg_match($regex, $line)) {
$id = chop($line);
} else {
$all = Array();
while (preg_match($regex, $line, $matches, PREG_OFFSET_CAPTURE)) {
$last = end($matches);
$line = substr($line, $last[1] + strlen($last[0]) + 1);
$strings = Array();
for ($i = 1; $i < 4; $i++) {
array_push($strings, $matches[$i][0]);
}
array_push($all, $strings);
}
$result[$id] = $all;
}
}
print_r($result);
?>
(That is a slightly edited version of David B's regex.)
If the line doesn't match that long regex, it is stored as the ID. Otherwise, the regex is matched and the matching part is chopped off the line; each iteration of the inner while loop matches one entry. Since PREG_OFFSET_CAPTURE turns every match into a [string, offset] pair, the for loop copies only the strings into the result.
This prints:
Array
(
[NPD4196-2a_5_0] => Array
(
[0] => Array
(
[0] => Geldanamycin
[1] => 0.166516
[2] => 0.0068
)
[1] => Array
(
[0] => Alamethicin
[1] => 0.158302
[2] => 0.0206
)
[2] => Array
(
[0] => Hydroxytamoxifen
[1] => 0.1429
[2] => 0.0183
)
...
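For comparison, here is a minimal sketch that applies the lazy pattern from the first answer with preg_match_all() and PREG_SET_ORDER instead of chopping the line after each match. Unlike (\w*), the lazy (.+?) keeps names such as 4-Hydroxytamoxifen, Abietic acid and Extract 00-303C intact (the (\w*) version is why the output above shows Hydroxytamoxifen). The function names parseEntries() and parseLines() are made up for this example, and, like the answer, it assumes that any non-empty line without entries is the key for the lines that follow.

```php
<?php
// Sketch: parse "ID" lines followed by lines of "name - value (p = x)" entries.
// The lazy (.+?) also captures the spaces around each name, so trim() them.
function parseEntries(string $line): array
{
    $regex = '/(.+?)-\s*([\d\.]+)\s*\(p\s*=\s*([\d\.]+)\)/';
    $entries = [];
    if (preg_match_all($regex, $line, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $entries[] = [trim($m[1]), $m[2], $m[3]];
        }
    }
    return $entries;
}

function parseLines(iterable $lines): array
{
    $result = [];
    $id = 'NO ID';
    foreach ($lines as $line) {
        if (trim($line) === '') {
            continue; // skip blank lines
        }
        $entries = parseEntries($line);
        if ($entries === []) {
            $id = trim($line); // a line without entries is treated as the key
        } else {
            // merge in case one key is followed by several entry lines
            $result[$id] = array_merge($result[$id] ?? [], $entries);
        }
    }
    return $result;
}

// Hypothetical usage: file() returns the lines of foo.txt, newlines included.
// print_r(parseLines(file('foo.txt')));
```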
$string = "The complete archive of The New York Times can now be searched from NYTimes.com "; // the actual input is unknown; it would be read from a textarea
$size is the length of the longest word in the string.
I create and initialize one array per word length in a for loop, i.e. $array1, $array2, ... $arrayN. Here is how I did it:
for ($i = 1; $i <= $size; $i++) {
${"array" . $i} = array();
}
so the words of $string would be grouped by length:
$array1 = []; // no one-letter words in this sample
$array2 = ["of", "be", ...];
$array3 = ["the", "can", "now", ...]; // and so on
So my question is: how can I assign the words of $string to $array1, $array2, $array3, ... in a simple for or foreach loop, given that the input text and the length of the longest word are unknown?
I'd probably start with $words = explode(' ', $string);
then sort the words by length:
usort($words, function($word1, $word2) {
if (strlen($word1) == strlen($word2)) {
return 0;
}
return (strlen($word1) < strlen($word2)) ? -1 : 1;
});
$longestWordSize = strlen(end($words));
Loop over the words and place them in their respective buckets.
Rather than separate variables for each length array, you should consider something like
$sortedWords = array(
1 => array('a', 'I'),
2 => array('to', 'be', 'or', 'is'),
3 => array('not', 'the'),
);
By looping over the words, you don't need to know the maximum word length.
The final solution is as simple as
foreach ($words as $word) {
$wordLength = strlen($word);
$sortedWords[ $wordLength ][] = $word;
}
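A self-contained version of that loop, as a sketch. The function name groupWordsByLength() is hypothetical; it splits on runs of whitespace with preg_split() so the trailing space in the question's string does not produce an empty word, and it sorts the buckets by length with ksort(). strlen() counts bytes, so for non-ASCII text mb_strlen() would be the better measure.

```php
<?php
// Sketch: bucket words by length without knowing the maximum length in advance.
function groupWordsByLength(string $string): array
{
    $sortedWords = [];
    // PREG_SPLIT_NO_EMPTY drops the empty string a trailing space would create
    foreach (preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $sortedWords[strlen($word)][] = $word; // the length is the bucket key
    }
    ksort($sortedWords); // order the buckets by length: 2, 3, 4, ...
    return $sortedWords;
}

$string = "The complete archive of The New York Times can now be searched from NYTimes.com ";
print_r(groupWordsByLength($string));
```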
You could use something like this:
$words = explode(" ", $string);
foreach ($words as $w) {
array_push(${"array" . strlen($w)}, $w);
}
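This splits $string into an array of words, then checks the length of each word and pushes it onto the matching array. It relies on $array1 ... $arrayN having been initialized by the loop in the question: a word with no matching array, such as the empty string produced by the trailing space in the sample, makes array_push() fail.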
You can use explode() to find the longest word ($big_val) and its length ($big):
$string = "The complete archive of The New York Times can now be searched from NYTimes.com " ;
$arr=explode(" ",$string);
$count=count($arr);
$big=0;
for ($i = 0; $i < $count; $i++) {
$p=strlen($arr[$i]);
if($big<$p){ $big_val=$arr[$i]; $big=$p;}
}
echo $big_val;
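If only the question's $size is needed, a shorter sketch is to take the maximum of the word lengths. It again assumes whitespace-separated words and uses preg_split() to skip the empty string left by the trailing space.

```php
<?php
// Sketch: the question's $size, i.e. the length of the longest word.
$string = "The complete archive of The New York Times can now be searched from NYTimes.com ";

// PREG_SPLIT_NO_EMPTY drops the empty string after the trailing space.
$words = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);

// max() does not accept an empty array, so guard against empty input.
$size = $words ? max(array_map('strlen', $words)) : 0;

echo $size, "\n"; // 11 ("NYTimes.com")
```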
Just use the word length as the index and append each word with []:
foreach(explode(' ', $string) as $word) {
$array[strlen($word)][] = $word;
}
To remove duplicates, use $array = array_map('array_unique', $array);.
Yields:
Array
(
[3] => Array
(
[0] => The
[2] => New
[3] => can
[4] => now
)
[8] => Array
(
[0] => complete
[1] => searched
)
[7] => Array
(
[0] => archive
)
[2] => Array
(
[0] => of
[1] => be
)
[4] => Array
(
[0] => York
)
[5] => Array
(
[0] => Times
)
)
If you want to re-index the main array use array_values() and to re-index the subarrays use array_map() with array_values().
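Putting the duplicate removal and the re-indexing together might look like this. The sketch uses the sentence without "from NYTimes.com", which reproduces the buckets shown above, and additionally sorts the buckets by length with ksort(); array_map() with a single array keeps the length keys.

```php
<?php
$string = "The complete archive of The New York Times can now be searched";

$array = [];
foreach (explode(' ', $string) as $word) {
    $array[strlen($word)][] = $word;
}

// Remove duplicates inside each bucket, then renumber each bucket from 0.
$array = array_map('array_values', array_map('array_unique', $array));

// Sort the buckets by word length; the length keys themselves are kept.
ksort($array);

print_r($array);
```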
I have an array of symbols that I want to replace, but I need to generate all possible combinations:
$lt = array(
'a' => 'ą',
'e' => 'ę',
'i' => 'į',
);
For example if I have this string:
tazeki
There can be a huge number of results:
tązeki
tazęki
tązęki
tazekį
tązekį
tazękį
tązękį
My question is: what formula should I use to get all the variants?
This should work for you, easy and simple:
What does this code do?
1. Data part
In the data part I just define the string and the replacements for the single characters in an associative array (search character as key, replacement as value).
2. getReplacements() function
This function gets all combinations of the characters which have to be replaced, in this format:
key = index in the string
value = character
So in this code example the array would look something like this:
Array (
[0] => Array (
[1] => a
)
[1] => Array (
[3] => e
)
[2] => Array (
[3] => e
[1] => a
)
[3] => Array (
[5] => i
)
[4] => Array (
[5] => i
[1] => a
)
[5] => Array (
[5] => i
[3] => e
)
[6] => Array (
[5] => i
[3] => e
[1] => a
)
)
As you can see this array holds all combinations of the characters which have to be replaced, in this format:
[0] => Array (
//^^^^^ The entire sub array is the combination which holds the single characters which will be replaced
[1] => a
//^ ^ A single character of the full combination which will be replaced
//| The index of the character in the string (This is that it also works if you have a character multiple times in your string)
// e.g. 1 -> t *a* z e k i
// ^ ^ ^ ^ ^ ^
// | | | | | |
// 0 *1* 2 3 4 5
)
So how does it get all combinations?
Pretty simple: I loop through every character which I want to replace with a foreach loop, and then go through every combination I already have and combine it with the current character of the foreach loop.
But to get this to work you have to start with an empty array. Here is a simple example to show what I mean:
Characters which have to be replaced (Empty array is '[]'): [1, 2, 3]
//new combinations for the next iteration
|
Character loop for NAN*:
Combinations:
- [] | -> []
Character loop for 1:
Combinations:
- [] + 1 | -> [1]
Character loop for 2:
Combinations:
- [] + 2 | -> [2]
- [1] + 2 | -> [1,2]
Character loop for 3:
Combinations:
- [] + 3 | -> [3]
- [1] + 3 | -> [1,3]
- [2] + 3 | -> [2,3]
- [1,2] + 3 | -> [1,2,3]
//^ All combinations here
* NAN: not a number
So, as you can see, there are always (2^n)-1 non-empty combinations in total. This method also leaves an empty array in the combination array, so before returning it I use array_filter() to remove all empty arrays and array_values() to reindex the array.
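The doubling step can be checked on its own with a small sketch. allCombinations() is a hypothetical name, and it appends items with array_merge() instead of keeping string indexes, so it only illustrates the counting and the order of the trace above.

```php
<?php
// Sketch of the doubling step: start with one empty combination and, for every
// new item, append a copy of each existing combination extended by that item.
function allCombinations(array $items): array
{
    $results = [[]];
    foreach ($items as $item) {
        // foreach by value iterates over a snapshot, so new entries are not revisited
        foreach ($results as $combination) {
            $results[] = array_merge($combination, [$item]);
        }
    }
    return $results; // 2^n entries, the first one empty
}

$combos = allCombinations([1, 2, 3]);
echo count($combos), "\n";               // 8 = 2^3
echo count(array_filter($combos)), "\n"; // 7 = 2^3 - 1, without the empty one
```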
3. Replacement part
To get the characters of the string from which the combinations will be built, I use this line:
array_intersect(str_split($str), array_keys($replace))
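This gets all matches with array_intersect() between the string, split into an array with str_split(), and the keys of the replace array, taken with array_keys(). array_intersect() keeps the keys of the first array, so each key is still the character's index in the string.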
In this code the array which you pass to the getReplacements() function would look something like this:
Array
(
[1] => a
//^ ^ The single character which is in the string and also in the replace array
//| Index in the string from the character
[3] => e
[5] => i
)
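A runnable check of that line, using only the question's data: array_intersect() keeps the keys of the str_split() array, so the result maps positions to characters.

```php
<?php
$str = "tazeki";
$replace = ['a' => 'ą', 'e' => 'ę', 'i' => 'į'];

// str_split() gives [0 => 't', 1 => 'a', 2 => 'z', ...]; array_intersect() keeps
// only values that are also keys of $replace and preserves the original keys.
$positions = array_intersect(str_split($str), array_keys($replace));

print_r($positions); // [1 => 'a', 3 => 'e', 5 => 'i']
```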
4. Replace all combinations
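At the end you only have to replace all combinations in the source string using the replace array. For this I just loop through every combination and replace every single character of the combination in the string with the matching character from the replace array. Because each combination lists the highest index first (the new key is prepended by [$k => $element] + $combination), the replacements run from right to left, so a multi-byte replacement such as ą does not shift the offsets that are still to be replaced.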
This can be simply done with this line:
$tmp = substr_replace($tmp, $replace[$v], $k, 1);
//^^^^^^^^^^^^^^ ^^^^^^^^^^^^ ^^ ^ Length of the replacement
//| | | Index from the string, where it should replace
//| | Get the replaced character to replace it
//| Replaces every single character one by one in the string
For more information about substr_replace() see the manual: http://php.net/manual/en/function.substr-replace.php
After this line you just add the replaced string to the result array and reset the string to the source string again.
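The order of the replacements matters because ą, ę and į take two bytes in UTF-8 while substr_replace() works on byte offsets. The sketch below (applyCombination() is a hypothetical helper) sorts the offsets from highest to lowest with krsort() and contrasts that with a left-to-right loop; it assumes the source file is saved as UTF-8.

```php
<?php
// Sketch: apply one combination (byte offset => character) with substr_replace().
// Replacing from the highest offset down means a two-byte letter such as 'ą'
// never shifts an offset that is still waiting to be replaced.
function applyCombination(string $str, array $combination, array $replace): string
{
    krsort($combination);
    foreach ($combination as $offset => $char) {
        $str = substr_replace($str, $replace[$char], $offset, 1);
    }
    return $str;
}

$replace = ['a' => 'ą', 'e' => 'ę', 'i' => 'į'];
echo applyCombination('tazeki', [1 => 'a', 3 => 'e'], $replace), "\n"; // tązęki

// Left to right, the second offset is stale: after 'ą' is inserted, offset 3 points at 'z'.
$wrong = 'tazeki';
foreach ([1 => 'a', 3 => 'e'] as $offset => $char) {
    $wrong = substr_replace($wrong, $replace[$char], $offset, 1);
}
echo $wrong, "\n"; // tąęeki
```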
Code:
<?php
// data
$str = "tazeki";
$replace = array(
    'a' => 'ą',
    'e' => 'ę',
    'i' => 'į',
);

function getReplacements($array) {
    // initialize with one empty combination
    $results = [[]];
    // get all combinations
    foreach ($array as $k => $element) {
        foreach ($results as $combination) {
            $results[] = [$k => $element] + $combination;
        }
    }
    // drop the empty combination and reindex
    return array_values(array_filter($results));
}

// get all combinations to replace
$combinations = getReplacements(array_intersect(str_split($str), array_keys($replace)));

// replace all combinations
$result = [];
foreach ($combinations as $word) {
    $tmp = $str;
    foreach ($word as $k => $v) {
        $tmp = substr_replace($tmp, $replace[$v], $k, 1);
    }
    $result[] = $tmp;
}

// print data
print_r($result);
?>
Output:
Array
(
[0] => tązeki
[1] => tazęki
[2] => tązęki
[3] => tazekį
[4] => tązekį
[5] => tazękį
[6] => tązękį
)
Here is a solution particularly for your task. You can pass any word and any array for replacements, it should work.
<?php
function getCombinations($word, $charsReplace)
{
$charsToSplit = array_keys($charsReplace);
$pattern = '/(' . implode('|', array_map(function ($c) { return preg_quote($c, '/'); }, $charsToSplit)) . ')/';
// split whole word into parts by replacing symbols
$parts = preg_split($pattern, $word, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
$replaceParts = array();
$placeholder = '';
// create a string with placeholders (%s) for sprintf() and an array of the symbols to replace
foreach ($parts as $wordPart) {
if (isset($charsReplace[$wordPart])) {
$replaceParts[] = $wordPart;
$placeholder .= '%s';
} else {
$placeholder .= str_replace('%', '%%', $wordPart); // a literal % must be doubled for sprintf()
}
}
$paramsCnt = count($replaceParts);
$combinations = array();
$combinationsCnt = pow(2, $paramsCnt);
// iterate all combinations (with help of binary codes)
for ($i = 0; $i < $combinationsCnt; $i++) {
$mask = sprintf('%0'.$paramsCnt.'b', $i);
$sprintfParams = array($placeholder);
foreach ($replaceParts as $index => $char) {
$sprintfParams[] = $mask[$index] == 1 ? $charsReplace[$char] : $char;
}
// fill current combination into placeholder and collect it in array
$combinations[] = call_user_func_array('sprintf', $sprintfParams);
}
return $combinations;
}
$lt = array(
'a' => 'ą',
'e' => 'ę',
'i' => 'į',
);
$word = 'stazeki';
$combinations = getCombinations($word, $lt);
print_r($combinations);
// Output:
// Array
// (
// [0] => stazeki
// [1] => stazekį
// [2] => stazęki
// [3] => stazękį
// [4] => stązeki
// [5] => stązekį
// [6] => stązęki
// [7] => stązękį
// )
This is an implementation in PHP:
<?php
/**
 * String variant generator
 */
class stringVariantGenerator
{
    /**
     * Contains assoc of char => array of all its variations
     * @var array
     */
    protected $_mapping = array();

    /**
     * Class constructor
     *
     * @param array $mapping Assoc array of char => array of all its variations
     */
    public function __construct(array $mapping = array())
    {
        $this->_mapping = $mapping;
    }

    /**
     * Generate all variations
     *
     * @param string $string String to generate variations from
     *
     * @return array Assoc containing variations
     */
    public function generate($string)
    {
        return array_unique($this->parseString($string));
    }

    /**
     * Parse a string and return its variations
     *
     * @param string $string   String to parse
     * @param int    $position Current position analyzed in the string
     * @param array  $result   Assoc containing all variations
     *
     * @return array Assoc containing variations
     */
    protected function parseString($string, $position = 0, array &$result = array())
    {
        if ($position < strlen($string))
        {
            $char = $string[$position]; // $string{$position} no longer works in PHP 8
            if (isset($this->_mapping[$char]))
            {
                foreach ($this->_mapping[$char] as $translatedChar)
                {
                    // substr_replace() instead of $string[$position] = ...:
                    // assigning to a string offset keeps only the first byte,
                    // which breaks multi-byte replacements such as 'ą'.
                    $variant = substr_replace($string, $translatedChar, $position, 1);
                    $this->parseString($variant, $position + strlen($translatedChar), $result);
                }
            }
            else
            {
                $this->parseString($string, $position + 1, $result);
            }
        }
        else
        {
            $result[] = $string;
        }
        return $result;
    }
}
// This is where you define what are the possible variations for each char
$mapping = array(
'e' => array('#', '_'),
'p' => array('*'),
);
$word = 'Apple love!';
$generator = new stringVariantGenerator($mapping);
print_r($generator->generate($word));
It would return:
Array
(
[0] => A**l# lov#!
[1] => A**l# lov_!
[2] => A**l_ lov#!
[3] => A**l_ lov_!
)
In your case, if you want to keep the letter itself as a valid translated value, just add it to the array:
$lt = array(
'a' => array('a', 'ą'),
'e' => array('e', 'ę'),
'i' => array('i', 'į'),
);
I'm not sure if you can do this with keys and values, but you definitely can with two arrays (note that this converts the accented letters back to plain ones):
$find = array('ą','ę','į');
$replace = array('a', 'e', 'i');
$string = 'tązekį';
echo str_replace($find, $replace, $string);
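The associative array can in fact be used directly. A short sketch: strtr() takes a from => to array, array_flip() reverses the map to strip the accents, and str_replace() works with array_keys() and array_values() of the same array. All three replace every occurrence at once, so each produces a single string rather than all the variants.

```php
<?php
$lt = ['a' => 'ą', 'e' => 'ę', 'i' => 'į'];

// strtr() accepts the from => to array as it is.
echo strtr('tazeki', $lt), "\n";                                      // tązękį

// array_flip() turns the same map around to strip the accents again.
echo strtr('tązekį', array_flip($lt)), "\n";                          // tazeki

// str_replace() with the keys and values as two arrays.
echo str_replace(array_keys($lt), array_values($lt), 'tazeki'), "\n"; // tązękį
```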
I'm not sure if I understand your question, but here is my answer :-)
$word = 'tazeki';
$word_arr = array();
$word_arr[] = $word;
//Loop through the $lt-array where $key represents what char to search for
//$letter what to replace with
//
foreach($lt as $key=>$letter) {
//Loop through each char in the $word-string
for( $i = 0; $i <= strlen($word)-1; $i++ ) {
$char = substr( $word, $i, 1 );
//If current letter in word is same as $key from $lt-array
//then add a word the $word_arr where letter is replace with
//$letter from the $lt-array
if ($char === $key) {
$word_arr[] = str_replace($char, $letter, $word);
}
}
}
var_dump($word_arr);
I'm assuming you have a known number of elements in your array, and I am assuming that that number is 3. You will have to have additional loops if you have additional elements in your $lt array.
$lt = array(
'a' => array('a', 'x'),
'e' => array('e', 'x'),
'i' => array('i', 'x')
);
$str = 'tazeki';
foreach ($lt['a'] as $a)
foreach ($lt['e'] as $b)
foreach ($lt['i'] as $c) {
$newstr = str_replace(array_keys($lt), array($a, $b, $c), $str);
echo "$newstr<br />\n";
}
If the number of elements in $lt is unknown or variable then this is not a good solution.
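One way to lift that limit is to build the cartesian product of the choice lists iteratively, so the number of keys no longer dictates the number of nested loops. cartesian() is a hypothetical helper; like the answer, the sketch relies on str_replace(), which replaces every occurrence of a key at once and would re-process a replacement that contains a later key.

```php
<?php
// Sketch: generalize the nested foreach loops to any number of keys.
function cartesian(array $lists): array
{
    $product = [[]];
    foreach ($lists as $key => $choices) {
        $next = [];
        foreach ($product as $partial) {
            foreach ($choices as $choice) {
                $next[] = $partial + [$key => $choice]; // add one more key
            }
        }
        $product = $next;
    }
    return $product;
}

$lt = [
    'a' => ['a', 'ą'],
    'e' => ['e', 'ę'],
    'i' => ['i', 'į'],
];

$variants = [];
foreach (cartesian($lt) as $choice) {
    $variants[] = str_replace(array_keys($choice), array_values($choice), 'tazeki');
}
print_r($variants);
```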
Well, though @Rizier123 and others have already provided good answers with clear explanations, I would like to leave my contribution as well. This time, honoring the Way of the Short Source Code over readability ... ;-)
$lt = array('a' => 'ą', 'e' => 'ę', 'i' => 'į');
$word = 'tazeki';
for ($i = 0; $i < strlen($word); $i++)
$lt[$word[$i]] && $r[pow(2, $u++)] = [$lt[$word[$i]], $i];
for ($i = 1; $i < pow(2, count($r)); $i++) {
for ($w = $word, $u = end(array_keys($r)); $u > 0; $u >>= 1)
($i & $u) && $w = substr_replace($w, $r[$u][0], $r[$u][1], 1);
$res[] = $w;
}
print_r($res);
Output:
Array
(
[0] => tązeki
[1] => tazęki
[2] => tązęki
[3] => tazekį
[4] => tązekį
[5] => tazękį
[6] => tązękį
)
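A less compact sketch of the same bitmask idea, with descriptive names (variants() is hypothetical): bit b of $mask decides whether the b-th replaceable character is swapped, and replacements are applied right to left so the two-byte UTF-8 letters do not shift the offsets still to be processed. It assumes the map keys are single-byte characters.

```php
<?php
function variants(string $word, array $map): array
{
    $targets = []; // byte offset => replacement
    for ($i = 0; $i < strlen($word); $i++) {
        if (isset($map[$word[$i]])) {
            $targets[$i] = $map[$word[$i]];
        }
    }
    $offsets = array_keys($targets);
    $n = count($offsets);

    $result = [];
    for ($mask = 1; $mask < (1 << $n); $mask++) { // mask 0 (no change) is skipped
        $w = $word;
        for ($b = $n - 1; $b >= 0; $b--) {         // right to left
            if ($mask & (1 << $b)) {
                $w = substr_replace($w, $targets[$offsets[$b]], $offsets[$b], 1);
            }
        }
        $result[] = $w;
    }
    return $result;
}

print_r(variants('tazeki', ['a' => 'ą', 'e' => 'ę', 'i' => 'į']));
```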
This is slightly different from finding all the positions of a substring inside a string, because I want it to work with words which may be followed by a space, comma, semicolon, colon, full stop, exclamation mark or other punctuation.
I have the following function to find all the positions of a substring:
function strallpos($haystack,$needle,$offset = 0){
$result = array();
for($i = $offset; $i<strlen($haystack); $i++){
$pos = strpos($haystack,$needle,$i);
if($pos !== FALSE){
$offset = $pos;
if($offset >= $i){
$i = $offset;
$result[] = $offset;
}
}
}
return $result;
}
Problem is, if I try to find all positions of the substring "us", it will return positions of the occurrence in "prospectus" or "inclusive" etc..
Is there any way to prevent this? Possibly using regular expressions?
Thanks.
Stefan
You can capture offset with preg_match_all:
$str = "Problem is, if I try to find all positions of the substring us, it will return positions of the occurrence in prospectus or inclusive us us";
preg_match_all('/\bus\b/', $str, $m, PREG_OFFSET_CAPTURE);
print_r($m);
output:
Array
(
[0] => Array
(
[0] => Array
(
[0] => us
[1] => 60
)
[1] => Array
(
[0] => us
[1] => 134
)
[2] => Array
(
[0] => us
[1] => 137
)
)
)
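Wrapped into a reusable function, as a sketch (wordPositions() is a hypothetical name): preg_quote() escapes any regex metacharacters in the needle, and array_column() pulls the offsets out of the PREG_OFFSET_CAPTURE pairs. Without the u modifier, \b typically treats only ASCII letters, digits and the underscore as word characters, and the returned offsets are byte offsets.

```php
<?php
// Sketch: byte offsets of whole-word occurrences of $needle in $haystack.
function wordPositions(string $haystack, string $needle): array
{
    $pattern = '/\b' . preg_quote($needle, '/') . '\b/';
    preg_match_all($pattern, $haystack, $m, PREG_OFFSET_CAPTURE);
    // $m[0] is a list of [matched text, offset] pairs; keep the offsets.
    return array_column($m[0], 1);
}

print_r(wordPositions("us, prospectus us; inclusive us!", 'us')); // [0, 15, 29]
```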
Just to demonstrate a non-regex alternative:
$string = "It behooves us all to offer the prospectus for our inclusive syllabus";
$filterword = 'us';
$filtered = array_filter(
str_word_count($string,2),
function($word) use($filterword) {
return $word == $filterword;
}
);
var_dump($filtered);
where the keys of $filtered are the offset positions.
If you want a case-insensitive comparison, replace
return $word == $filterword;
with
return strtolower($word) == strtolower($filterword);
I have a string with custom markup for saving songs with chords, tablatures, notes etc. It contains things in various brackets: \[.+?\], \[\[.+?\]\], \(.+?\)
arrows: <-{3,}>, \-{3,}>, <\-{3,}
and so on...
Sample text might be
Text Text [something]
--->
Text (something 021213)
Now I wish to parse the markup into an array of tokens, objects of corresponding classes, which would look like this (matched parts in brackets):
ParsedBlock_Text ("Text Text ")
ParsedBlock_Chord ("something")
ParsedBlock_Text (" ")
ParsedBlock_NewColumn
ParsedBlock_Text (" text ")
ParsedBlock_ChordDiagram ("something 021213")
I know how to match them, but either I must match each different pattern, and save offsets to properly sort the array, or I match them at once and I don't know which one has been matched.
Thanks, MK
Assuming you do not try to nest these structures, this will tokenize your text:
function ParseText($text) {
$re = '/\[\[(?P<DoubleBracket>.*?)]]|\[(?P<Bracket>.*?)]|\((?P<Paren>.*?)\)|(?<Arrow><---+>?|---+>)/s';
$keys = array('DoubleBracket', 'Bracket', 'Paren', 'Arrow');
$result = array();
$lastStart = 0;
if (preg_match_all($re, $text, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE)) {
foreach ($matches as $match) {
$start = $match[0][1];
$prefix = substr($text, $lastStart, $start - $lastStart);
$lastStart = $start + strlen($match[0][0]);
if ($prefix != '' && !ctype_space($prefix)) {
$result []= array('Text', trim($prefix));
}
foreach ($keys as $key) {
if (isset($match[$key]) && $match[$key][1] >= 0) {
$result []= array($key, $match[$key][0]);
break;
}
}
}
}
$prefix = substr($text, $lastStart);
if ($prefix != '' && !ctype_space($prefix)) {
$result []= array('Text', trim($prefix));
}
return $result;
}
Example:
$mytext = <<<'EOT'
Text Text [something]
--->
Text (something 021213)
More Text
EOT;
$parsed = ParseText($mytext);
foreach ($parsed as $item) {
print_r($item);
}
Output:
Array
(
[0] => Text
[1] => Text Text
)
Array
(
[0] => Bracket
[1] => something
)
Array
(
[0] => Arrow
[1] => --->
)
Array
(
[0] => Text
[1] => Text
)
Array
(
[0] => Paren
[1] => something 021213
)
Array
(
[0] => Text
[1] => More Text
)
http://ideone.com/kJQrBw
If you want to add more patterns to the regex, make sure you put longer patterns at the start, so they are not mistakenly matched as the wrong type.
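To get from these [type, text] pairs to the objects described in the question, one option is a lookup table from token type to class name. This is a sketch only: the ParsedBlock_* classes here are minimal stand-ins, ParsedBlock_DoubleBracket is an invented name for the [[...]] token, and toBlocks() is hypothetical.

```php
<?php
// Hypothetical stand-ins for the question's ParsedBlock_* classes.
abstract class ParsedBlock
{
    public $content;

    public function __construct(string $content = '')
    {
        $this->content = $content;
    }
}

class ParsedBlock_Text extends ParsedBlock {}
class ParsedBlock_Chord extends ParsedBlock {}
class ParsedBlock_ChordDiagram extends ParsedBlock {}
class ParsedBlock_NewColumn extends ParsedBlock {}
class ParsedBlock_DoubleBracket extends ParsedBlock {}

// Map the [type, text] pairs returned by ParseText() onto objects.
function toBlocks(array $tokens): array
{
    $classes = [
        'Text'          => ParsedBlock_Text::class,
        'Bracket'       => ParsedBlock_Chord::class,
        'Paren'         => ParsedBlock_ChordDiagram::class,
        'Arrow'         => ParsedBlock_NewColumn::class,
        'DoubleBracket' => ParsedBlock_DoubleBracket::class,
    ];
    $blocks = [];
    foreach ($tokens as [$type, $text]) {
        if (!isset($classes[$type])) {
            throw new InvalidArgumentException("Unknown token type: $type");
        }
        $blocks[] = new $classes[$type]($text);
    }
    return $blocks;
}

$blocks = toBlocks([['Text', 'Text Text'], ['Bracket', 'something'], ['Arrow', '--->']]);
foreach ($blocks as $block) {
    echo get_class($block), ': ', $block->content, "\n";
}
```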
I previously had some help with this matter from @HSZ but have had trouble getting the solution to work with an existing array. What I'm trying to do is explode on quotes, make all words uppercase, get each word's index, and only echo the ones I define, then implode. In simplest terms: always remove the 4th word within quotation marks, or the 3rd and 4th... This could probably be done with regex as well.
Example:
Hello [1] => World [2] => This [3] => Is [4] => a [5] => Test ) Only output the numbers I define, such as 1 (Hello) and 2 (World), leaving out 3, 4, 5 and 6 ("This is a test") so that only "Hello World" is left; or 1 and 6 for "Hello Test"...
Such as:
echo $data[1] . ' ' . $data[6]; // Would output index 1 and 6: Hello and Test
Existing Code
if (stripos($data, 'test') !== false) {
$arr = explode('"', $data);
for ($i = 1; $i < count($arr); $i += 2) {
$arr[$i] = strtoupper($arr[$i]);
$arr[$i] = str_word_count($arr, 1); // Doesnt work with array of course.
}
$arr = $matches[1] . ' ' . $matches[6];
$data = implode('"', $arr);
}
Assuming $data is 'Hello world "this is a test"', $arr = explode('"', $data) will be the same as:
$arr = array(
    0 => 'Hello world ',
    1 => 'this is a test',
    2 => '',
);
If you want to do things with "this is a test", you can explode it using something like $testarr = explode(' ', $arr[1]);.
You can then do something like:
$matches = array();
foreach ($testarr as $key => $value) {
    $value = strtoupper($value);
    // % is modulus: the remainder of dividing two numbers. If it's 0,
    // key + 1 (compensating for 0-based keys) is divisible by 4.
    if (($key + 1) % 4 == 0) {
        $matches[] = $value;
    }
}
$matches = implode('"', $matches);
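For the goal stated in the question (uppercase the quoted words and always drop the 4th, or the 3rd and 4th), a sketch could work directly on the quoted segments. dropQuotedWords() is a hypothetical name; it assumes balanced double quotes, so every odd index after explode('"', ...) is a quoted segment, and it collapses runs of whitespace inside the quotes to single spaces.

```php
<?php
// Sketch: uppercase the words inside double quotes and drop the words at the
// given 1-based positions (the 4th by default).
function dropQuotedWords(string $data, array $positions = [4]): string
{
    $parts = explode('"', $data);
    for ($i = 1; $i < count($parts); $i += 2) { // odd indexes are quoted
        $words = preg_split('/\s+/', strtoupper($parts[$i]), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($positions as $pos) {
            unset($words[$pos - 1]); // 1-based position -> 0-based key
        }
        $parts[$i] = implode(' ', $words);
    }
    return implode('"', $parts);
}

echo dropQuotedWords('Hello world "this is a test"'), "\n";         // Hello world "THIS IS A"
echo dropQuotedWords('Hello world "this is a test"', [3, 4]), "\n"; // Hello world "THIS IS"
```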