How to parse heterogenous markup with PHP?

I have a string with custom markup for saving songs with chords, tablatures, notes etc. It contains
things in various brackets: \[.+?\], \[\[.+?\]\], \(.+?\)
arrows: <-{3,}>, \-{3,}>, <\-{3,}
and so on.
Sample text might be
Text Text [something]
--->
Text (something 021213)
Now I want to parse the markup into an array of tokens, objects of corresponding classes, which would look like this (matched parts in parentheses):
ParsedBlock_Text ("Text Text ")
ParsedBlock_Chord ("something")
ParsedBlock_Text (" ")
ParsedBlock_NewColumn
ParsedBlock_Text (" text ")
ParsedBlock_ChordDiagram ("something 021213")
I know how to match them, but either I must match each pattern separately and save the offsets to sort the array properly, or I match them all at once and then I don't know which pattern matched.
Thanks, MK

Assuming you do not try to nest these structures, this will tokenize your text:
function ParseText($text) {
    $re = '/\[\[(?P<DoubleBracket>.*?)]]|\[(?P<Bracket>.*?)]|\((?P<Paren>.*?)\)|(?<Arrow><---+>?|---+>)/s';
    $keys = array('DoubleBracket', 'Bracket', 'Paren', 'Arrow');
    $result = array();
    $lastStart = 0;
    if (preg_match_all($re, $text, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE)) {
        foreach ($matches as $match) {
            $start = $match[0][1];
            $prefix = substr($text, $lastStart, $start - $lastStart);
            $lastStart = $start + strlen($match[0][0]);
            if ($prefix != '' && !ctype_space($prefix)) {
                $result []= array('Text', trim($prefix));
            }
            foreach ($keys as $key) {
                if (isset($match[$key]) && $match[$key][1] >= 0) {
                    $result []= array($key, $match[$key][0]);
                    break;
                }
            }
        }
    }
    $prefix = substr($text, $lastStart);
    if ($prefix != '' && !ctype_space($prefix)) {
        $result []= array('Text', trim($prefix));
    }
    return $result;
}
Example:
$mytext = <<<'EOT'
Text Text [something]
--->
Text (something 021213)
More Text
EOT;
$parsed = ParseText($mytext);
foreach ($parsed as $item) {
print_r($item);
}
Output:
Array
(
[0] => Text
[1] => Text Text
)
Array
(
[0] => Bracket
[1] => something
)
Array
(
[0] => Arrow
[1] => --->
)
Array
(
[0] => Text
[1] => Text
)
Array
(
[0] => Paren
[1] => something 021213
)
Array
(
[0] => Text
[1] => More Text
)
http://ideone.com/kJQrBw
If you want to add more patterns to the regex, make sure you put longer patterns at the start, so they are not mistakenly matched as the wrong type.
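To illustrate why the order of alternatives matters, here is a minimal sketch using a shortened version of the regex above. The sample string and the variable names $good and $bad are only for illustration.

```php
<?php
$text = 'a [[x]] b';

// Longer alternative first: the double-bracket branch gets the first try.
preg_match('/\[\[(?P<DoubleBracket>.*?)]]|\[(?P<Bracket>.*?)]/', $text, $good);

// Shorter alternative first: the single-bracket branch matches too early
// and swallows the second "[" as part of its content.
preg_match('/\[(?P<Bracket>.*?)]|\[\[(?P<DoubleBracket>.*?)]]/', $text, $bad);

echo $good[0], "\n";        // [[x]]
echo $bad[0], "\n";         // [[x]
echo $bad['Bracket'], "\n"; // [x
```

With the second ordering the token would be reported as a Bracket containing "[x", and a stray "]" would be left in the following text.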

Related

PHP regex to match key value pairs from a given string

I hope someone can help.
I have a string as follows:
$string = 'latitude=46.6781471,longitude=13.9709534,options=[units=auto,lang=de,exclude=[hourly,minutely]]';
I am trying to create an array out of each key/value pair, but I am failing badly with the regex for preg_match_all().
My attempts so far don't give the desired results. Creating key => value pairs works as long as there are no brackets, but I have no idea how to build a multidimensional array when a value itself contains key/value pairs inside brackets, as in the example.
Array (
[0] => Array
(
[0] => latitude=46.6781471,
[1] => longitude=13.9709534,
[2] => options=[units=si,
[3] => lang=de,
)
[1] => Array
(
[0] => latitude
[1] => longitude
[2] => options=[units
[3] => lang
)
.. and so on
In the end I would like to achieve the following result:
Array (
[latitude] => 46.6781471
[longitude] => 13.9709534
[options] => Array
(
[units] => auto
[exclude] => hourly,minutely
)
)
I would appreciate any help or an example of how I can achieve this from a given string.

Regular expression isn't the right tool to deal with recursive matches. You can write a parser instead of a regex (or use JSON, query string, XML or any other commonly used format):
function parseOptionsString($string) {
    $length = strlen($string);
    $key = null;
    $contextStack = array();
    $options = array();
    $specialTokens = array('[', ']', '=', ',');
    $buffer = '';
    $currentOptions = $options;
    for ($i = 0; $i < $length; $i++) {
        $currentChar = $string[$i];
        if (!in_array($currentChar, $specialTokens)) {
            $buffer .= $currentChar;
            continue;
        }
        if ($currentChar == '[') {
            array_push($contextStack, [$key, $currentOptions]);
            $currentOptions[$key] = array();
            $currentOptions = $currentOptions[$key];
            $key = '';
            $buffer = '';
            continue;
        }
        if ($currentChar == ']') {
            if (!empty($buffer)) {
                if (!empty($key)) {
                    $currentOptions[$key] = $buffer;
                } else {
                    $currentOptions[] = $buffer;
                }
            }
            $contextInfo = array_pop($contextStack);
            $previousContext = $contextInfo[1];
            $thisKey = $contextInfo[0];
            $previousContext[$thisKey] = $currentOptions;
            $currentOptions = $previousContext;
            $buffer = '';
            $key = '';
            continue;
        }
        if ($currentChar == '=') {
            $key = $buffer;
            $buffer = '';
            continue;
        }
        if ($currentChar == ',') {
            if (!empty($key)) {
                $currentOptions[$key] = $buffer;
            } else if (!empty($buffer)) {
                $currentOptions[] = $buffer;
            }
            $buffer = '';
            $key = '';
            continue;
        }
    }
    if (!empty($key)) {
        $currentOptions[$key] = $buffer;
    }
    return $currentOptions;
}
this gives the following output:
print_r(parseOptionsString($string));
Array
(
[latitude] => 46.6781471
[longitude] => 13.9709534
[options] => Array
(
[units] => auto
[lang] => de
[exclude] => Array
(
[0] => hourly
[1] => minutely
)
)
)
Note also that you want a special syntax for arrays with only comma separated values (exclude=[hourly,minutely] becomes exclude => hourly,minutely and not exclude => array(hourly, minutely)). I think this is an inconsistency in your format and I wrote the parser with the "correct" version in mind.
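If the comma-joined form from the question is really needed, the parsed result could be post-processed. The following is only a sketch: collapseLists is a hypothetical helper that is not part of the parser above, and it assumes that list elements are plain strings.

```php
<?php
// Hypothetical helper: turn purely list-like sub-arrays (keys 0..n-1)
// into "a,b,c" strings and recurse into associative sub-arrays.
function collapseLists(array $options): array {
    foreach ($options as $key => $value) {
        if (!is_array($value)) {
            continue;
        }
        $isList = array_keys($value) === range(0, count($value) - 1);
        $options[$key] = $isList ? implode(',', $value) : collapseLists($value);
    }
    return $options;
}

// Input shaped like the parser output shown above.
$parsed = array(
    'latitude' => '46.6781471',
    'options'  => array(
        'units'   => 'auto',
        'exclude' => array('hourly', 'minutely'),
    ),
);
print_r(collapseLists($parsed));
```

Here options.exclude becomes the string "hourly,minutely", while options itself stays an array because its keys are names, not a 0-based sequence.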

If you don't want a parser, you can also try this code. It converts your string to JSON and decodes it to an array. But as others said, I think you should use JSON directly. If you're sending this string via XMLHttpRequest in JavaScript, it will not be hard to create proper JSON to send.
$string = 'latitude=46.6781471,longitude=13.9709534,options=[units=auto,lang=de,exclude=[hourly,minutely]]';
$string = preg_replace('/([^=,\[\]\s]+)/', '"$1"', $string);
$string = '{' . $string . '}';
$string = str_replace('=', ':', $string);
$string = str_replace('[', '{', $string);
$string = str_replace(']', '}', $string);
$string = preg_replace('/({[^:}]*})/', '|$1|', $string);
$string = str_replace('|{', '[', $string);
$string = str_replace('}|', ']', $string);
$result = json_decode($string, true);
print_r($result);
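For comparison, this is roughly what the JSON route looks like if the sender can produce JSON in the first place. The JSON literal below is an assumed equivalent of the question's string, not something the original service is known to emit.

```php
<?php
// Assumed JSON equivalent of the option string from the question.
$json = '{"latitude":46.6781471,"longitude":13.9709534,'
      . '"options":{"units":"auto","lang":"de","exclude":["hourly","minutely"]}}';

// true => decode objects as associative arrays instead of stdClass.
$data = json_decode($json, true);

if ($data === null) {
    // json_decode returns null on invalid input.
    echo 'Invalid JSON: ', json_last_error_msg(), "\n";
} else {
    print_r($data['options']['exclude']);
}
```

No string rewriting is needed here, and nesting of any depth is handled by json_decode itself.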

PHP read Netlist from txt file

I have this file format of txt file generated from schematic software:
(
NETR5_2
R6,1
R5,2
)
(
NETR1_2
R4,2
R3,1
R3,2
R2,1
R2,2
R1,1
R1,2
)
I need to get this:
Array
(
[0] => Array
(
[0] => NETR5_2
[1] => R6,1
[2] => R5,2
)
[1] => Array
(
[0] => NETR1_2
[1] => R4,2
[2] => R3,1
[3] => R3,2
[4] => R2,1
[5] => R2,2
[6] => R1,1
[7] => R1,2
)
)
Here is the code I tried, but I get everything from the input string:
$file = file('tangoLista.txt');
/* GET - num of lines */
$f = fopen('tangoLista.txt', 'rb');
$lines = 0;
while (!feof($f)) {
$lines += substr_count(fread($f, 8192), "\n");
}
fclose($f);
for ($i=0;$i<=$lines;$i++) {
/* RESISTORS - check */
if (strpos($file[$i-1], '(') !== false && strpos($file[$i], 'NETR') !== false) {
/* GET - id */
for($k=0;$k<=10;$k++) {
if (strpos($file[$i+$k], ')') !== false || empty($file[$i+$k])) {
} else {
$json .= $k.' => '.$file[$i+$k];
}
}
$resistors_netlist[] = array($json);
}
}
echo '<pre>';
print_r($resistors_netlist);
echo '</pre>';
I need to read the lines between ( and ) and put them into array values. I tried checking whether a line begins with ( followed by NETR and, if so, putting it into the array, but I don't know how to get the number of items between ( and ) so that a loop can read the values and put them into the array.
Where am I making a mistake? Can the code be shorter?

Try this approach:
<?php
$f = fopen('test.txt', 'rb');
$resistors_netlist = array();
$current_index = 0;
while (!feof($f)) {
    $line = trim(fgets($f));
    if (empty($line)) {
        continue;
    }
    if (strpos($line, '(') !== false) {
        $resistors_netlist[$current_index] = array();
        continue;
    }
    if (strpos($line, ')') !== false) {
        $current_index++;
        continue;
    }
    array_push($resistors_netlist[$current_index], $line);
}
fclose($f);
print_r($resistors_netlist);
This gives me:
Array
(
[0] => Array
(
[0] => NETR5_2
[1] => R6,1
[2] => R5,2
)
[1] => Array
(
[0] => NETR1_2
[1] => R4,2
[2] => R3,1
[3] => R3,2
[4] => R2,1
[5] => R2,2
[6] => R1,1
[7] => R1,2
)
)
We start $current_index at 0. When we see a (, we create a new sub-array at $resistors_netlist[$current_index]. When we see a ), we increment $current_index by 1. For any other line, we just append it to the end of $resistors_netlist[$current_index].

Try this, using preg_match_all:
$text = '(
NETR5_2
R6,1
R5,2
)
(
NETR1_2
R4,2
R3,1
R3,2
R2,1
R2,2
R1,1
R1,2
)';
$chunks = explode(")(", preg_replace('/\)\W+\(/m', ')(', $text));
$result = array();
$pattern = '{([A-z0-9,]+)}';
foreach ($chunks as $row) {
preg_match_all($pattern, $row, $matches);
$result[] = $matches[1];
}
print_r($result);
3v4l.org demo
I'm no regex expert, so there may be a better way.
The main problem is the parentheses: I don't know what sits between a closing parenthesis and the next opening one, so I first remove every run of non-word characters (spaces, tabs, CR or LF) between them, then explode the string on )(.
I then loop over the resulting chunks, match every run of the characters A-z, 0-9 and comma in each chunk, and append the array of matched values to $result, which holds the desired structure once the loop ends.
Please note:
The main pattern is based on the provided example: if the values contain characters other than A-z, 0-9 and comma, the regex fails. The range A-z also covers the ASCII punctuation between Z and a ([ \ ] ^ _ `), which is why the underscore in NETR5_2 is matched.
Edit:
Replaced the preliminary regex pattern with `/\)\W+\(/m`
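A possibly simpler variant is to capture each parenthesised block directly and split it on whitespace. This is only a sketch: parseNetlist is a hypothetical name, and it assumes that entries never contain whitespace or parentheses, as in the sample.

```php
<?php
// Hypothetical helper: one sub-array per "( ... )" block.
function parseNetlist(string $text): array {
    $result = array();
    // Capture everything between an opening and the next closing parenthesis.
    preg_match_all('/\(([^()]*)\)/', $text, $blocks);
    foreach ($blocks[1] as $block) {
        // Split the block body on any whitespace, dropping empty pieces.
        $result[] = preg_split('/\s+/', trim($block), -1, PREG_SPLIT_NO_EMPTY);
    }
    return $result;
}

$sample = "(\nNETR5_2\nR6,1\nR5,2\n)\n(\nNETR1_2\nR4,2\n)";
print_r(parseNetlist($sample));
```

Because the split is on whitespace rather than on a character whitelist, values with other punctuation would also survive, as long as they contain no spaces.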

PHP Check string contain #(any) [duplicate]

I have a string that has hashtags in it and I'm trying to pull the tags out. I think I'm pretty close, but I'm getting a multi-dimensional array with the same results twice:
$string = "this is #a string with #some sweet #hash tags";
preg_match_all('/(?!\b)(#\w+\b)/',$string,$matches);
print_r($matches);
which yields
Array (
[0] => Array (
[0] => "#a"
[1] => "#some"
[2] => "#hash"
)
[1] => Array (
[0] => "#a"
[1] => "#some"
[2] => "#hash"
)
)
I just want one array with each word beginning with a hash tag.

This can be done with the regex /(?<!\w)#\w+/. It has no capture group, so the tags end up in $matches[0].

That's what preg_match_all does: you always get a multidimensional array. [0] holds the complete matches and [1] the first capture group's result list.
Just access $matches[1] for the desired strings. (Your dump with the depicted extraneous Array ( [0] => Array ( [0] was incorrect. You get one subarray level.)
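As a small sketch of the same idea: if the capturing parentheses are dropped from the pattern, preg_match_all only fills index 0, and that sub-array is already the flat list the question asks for. The variable name $tags is just for illustration.

```php
<?php
$string = "this is #a string with #some sweet #hash tags";

// No capture group: $matches contains only index 0 (the full matches).
preg_match_all('/#\w+/', $string, $matches);
$tags = $matches[0];

print_r($tags); // #a, #some, #hash
```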

I think this function will help you:
echo get_hashtags($string);
function get_hashtags($string, $str = 1) {
    preg_match_all('/#(\w+)/', $string, $matches);
    // $matches[1] holds the tag names without the leading '#'.
    if ($str) {
        // Return the tags as one comma-separated string.
        return implode(', ', $matches[1]);
    }
    // Otherwise return the tags as a flat array.
    return $matches[1];
}

Try:
$string = "this is #a string with #some sweet #hash tags";
preg_match_all('/(?<!\w)#\S+/', $string, $matches);
print_r($matches[0]);
echo("<br><br>");
// Output: Array ( [0] => #a [1] => #some [2] => #hash )

PHP - Searching for string and outputting into new string

Let's say I put the following into a textbox:
1|[Guangzhou Evergrande](//www.gzevergrandefc.com/)|6|5-1-0|+15|**16**
2|[Shandong Luneng](//www.lunengsports.com/)|7|5-1-1|+7|**16**
3|[Qingdao Jonoon](//www.zhongnengfc.com/)|7|4-3-0|+4|**15**
4|[Beijing Guoan](//www.fcguoan.com/)|7|3-3-1|+2|**12**
when I press enter, it would take what's between the [ and ] and ( and ) and put it into new lines like this:
if ($name == "NAME_HERE") $name = "[".$name."](URL_HERE)";
I tried preg_match with the patterns $pattern = '/^[/'; and $pattern_end = '/^]/'; for the name, just as a test, but I cannot get it to work.
Here is what I have so far:
$string = '1|[Guangzhou Evergrande](//www.gzevergrandefc.com/)|6|5-1-0|+15|**16**';
$pattern = '/\[(.*?\)].*?\((.*?)\)/';
$replacement = 'if ($name == "{1}") $name = "[".$name."]({2})";';
echo preg_replace($pattern, $replacement, $string);

Your pattern should be
'/\[(.*?)\]\((.*?)\)/'
because [ ] and ( ) are special characters in a regex and must be escaped to match literally. Use the preg_match_all() function; with it, each unescaped ( ) pair defines a submatch (subpattern) that is returned in its own sub-array.
Example:
<?php
$string = '1|[Guangzhou Evergrande](//www.gzevergrandefc.com/)|6|5-1-0|+15|**16**';
$matches = array();
// $matches[1] receives the names, $matches[2] the URLs.
preg_match_all('/\[(.*?)\]\((.*?)\)/', $string, $matches);
var_dump($matches);
?>

Perhaps I don't fully understand what you are doing here, but, I see a data structure that can be exploded without a hassle. For example:
function get_football( $text ) {
    $r = array();
    $t = explode("\n", $text );
    foreach( $t as $l ) {
        $n = explode("|", $l);
        $r[] = $n;
    }
    return( $r );
}
This will get you a nicely structured set of data that you can foreach() through and further process. If the original text is stored inside $variable0, print_r( get_football( $variable0 ) ); will show a nice structure:
Array
(
[0] => Array
(
[0] => 1
[1] => [Guangzhou Evergrande](//www.gzevergrandefc.com/)
[2] => 6
[3] => 5-1-0
[4] => +15
[5] => **16**
)
[1] => Array
Of course, that $n[1] name can be broken down further in the loop. Anyhow, thereafter, you can loop through whatever menu you're building with a foreach() loop instead of hardcoding menu choices. Just something to consider.
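Breaking the second column down further could look like the sketch below, which also builds the line format asked for in the question. linkLines is a hypothetical helper; it assumes every row has the [name](url) cell in the second column and that rows are separated by "\n".

```php
<?php
// Hypothetical helper: one generated "if (...)" line per input row.
function linkLines(string $text): array {
    $lines = array();
    foreach (explode("\n", $text) as $row) {
        $cells = explode('|', $row);
        // The second cell is expected to look like [name](url).
        if (isset($cells[1]) && preg_match('/^\[(.*?)\]\((.*?)\)$/', trim($cells[1]), $m)) {
            $lines[] = sprintf('if ($name == "%s") $name = "[".$name."](%s)";', $m[1], $m[2]);
        }
    }
    return $lines;
}

$input = "1|[Guangzhou Evergrande](//www.gzevergrandefc.com/)|6|5-1-0|+15|**16**\n"
       . "2|[Shandong Luneng](//www.lunengsports.com/)|7|5-1-1|+7|**16**";
echo implode("\n", linkLines($input)), "\n";
```

The format string is single-quoted, so $name is emitted literally and only the two %s placeholders are filled in.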

regular expression php to parse a file

I want to parse a file and store it in an array in PHP. However, there are some rules which should be observed:
(p="value") should be ignored, but the "value" should be preserved.
- should be ignored.
whitespaces should be ignored.
split by \t and \n.
A sample string is :
NPD4196-2a_5_0
Geldanamycin - 0.166516 (p = 0.0068) Alamethicin - 0.158302 (p = 0.0206) 4-Hydroxytamoxifen - 0.1429 (p = 0.0183) Abietic acid - 0.133045 (p = 0.0203) Caspofungin - 0.130885 (p = 0.0432) Extract 00-303C - 0.12858 (p = 0.0356) U73122 - 0.113274 (p = 0.0482) Radicicol - 0.10213 (p = 0.0356) Calcium ionophore - 0.096183 (p = 0.0262)
So, the goal is to produce a data structure like:
Array('NPD4196-2a_5_0' => Array(Array( 0 => 'Geldanamycin', 1 => '0.166516', 2 => '0.0068'), Array( ... ));
I have this written so far ...
while(($line = fgets($fp)) !== false){
$args = preg_split( '/[\t\n (=) ]+/', $line, -1, PREG_SPLIT_NO_EMPTY );
if(count($args)){
print_r($args);
print "\n";
}
}
What am I missing in order to accomplish my goal?
Thanks

(.+?)-\s*([\d\.]+)\s*\(p\s*=\s*([\d\.]+)\)
That will grab the element (e.g. Geldanamycin) in group 1, the related value in group 2, and the p value in group 3.
Play with the regex here.
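One way to apply that regex to a whole data line is preg_match_all with PREG_SET_ORDER, so that each match arrives as its own [full, name, value, p] set. The following is a sketch under that assumption; parseEntries is a hypothetical name, and the names are trimmed because group 1 also captures the surrounding spaces.

```php
<?php
// Hypothetical helper: returns [[name, value, p], ...] for one data line.
function parseEntries(string $line): array {
    $regex = '/(.+?)-\s*([\d.]+)\s*\(p\s*=\s*([\d.]+)\)/';
    preg_match_all($regex, $line, $sets, PREG_SET_ORDER);
    $entries = array();
    foreach ($sets as $set) {
        $entries[] = array(trim($set[1]), $set[2], $set[3]);
    }
    return $entries;
}

$line = 'Geldanamycin - 0.166516 (p = 0.0068) 4-Hydroxytamoxifen - 0.1429 (p = 0.0183)';
print_r(parseEntries($line));
```

Because group 1 is a lazy .+? rather than \w*, a name that itself contains a hyphen, such as 4-Hydroxytamoxifen, is kept whole.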

This seems to work for one key-value pair (assuming NPD4196-2a_5_0 is the key in your example, and the second line is the value).
<?php
$fp = fopen('foo.txt', 'r');
$regex = '/(\w*)\s*-\s*([\d\.]+)\s*\(p\s*=\s*([\d\.]+)\)/';
$id = "NO ID";
$result = Array();
while(($line = fgets($fp)) !== false){
    if (!preg_match($regex, $line)) {
        $id = chop($line);
    } else {
        $all = Array();
        while (preg_match($regex, $line, $matches, PREG_OFFSET_CAPTURE)) {
            $last = end($matches);
            $line = substr($line, $last[1] + strlen($last[0]) + 1);
            $strings = Array();
            for ($i = 1; $i < 4; $i++) {
                array_push($strings, $matches[$i][0]);
            }
            array_push($all, $strings);
        }
        $result[$id] = $all;
    }
}
print_r($result);
?>
(That is a slightly edited version of David B's regex.)
If the line doesn't match that long RegEx pattern, it will store the line as the ID. Otherwise, it will match the RegEx, then chop off the matching part. Each iteration of the inner while loop will match one entry. Since I am grabbing the indices of the matches, the for loop is used to only add the strings to the result.
This prints:
Array
(
[NPD4196-2a_5_0] => Array
(
[0] => Array
(
[0] => Geldanamycin
[1] => 0.166516
[2] => 0.0068
)
[1] => Array
(
[0] => Alamethicin
[1] => 0.158302
[2] => 0.0206
)
[2] => Array
(
[0] => Hydroxytamoxifen
[1] => 0.1429
[2] => 0.0183
)
...
