PHP Parse custom characters inside a string


I need help parsing the characters inside these brackets:
[]
{}
<>
{|}
<|>
For example, I have this string variable (Japanese):
$question = "この<部屋|へや>[に]{椅子|いす}[が]ありません";
Expected result in HTML:
Description
1) This is a particle. I will convert every word inside [] into an HTML tag. Example: [に] will be converted into <span style="color:blue">に</span>. A full sentence can have multiple []. Note: I understand how to use str_replace.
2 and 4) This is a normal kanji word that will be used as a question to the user. A full sentence can have only one {}.
3 and 5) This is normal kanji text. A full sentence can have multiple <>.
2, 3, 4, and 5) They will be converted into Ruby HTML tags. The | separator is optional, so it is sometimes absent. From what I understand, I just need to explode on the | character. If explode returns false or | does not exist, I will use the original value. Note: I understand how to use ruby tags (rb and rt).
My question
How do I parse items 1-5 mentioned above with PHP? What keyword do I need to get started?
Thanks.

Thanks to this page: Capturing text between square brackets in PHP, now I have my own answer.
Full code:
<?php
$text = "この<部屋|へや>[に]{椅子|いす}[が]ありません";
preg_match_all("/\[([^\]]*)\]/", $text, $square_brackets); //[]
preg_match_all("/{([^}]*)}/", $text, $curly_brackets); //{}
preg_match_all("/<([^>]*)>/", $text, $angle_brackets); //<> (exclude ">" so that several <...> groups do not merge into one match)
print_r($square_brackets);
echo "\r\n";
print_r($curly_brackets);
echo "\r\n";
print_r($angle_brackets);
echo "\r\n";
Result:
Array
(
[0] => Array
(
[0] => [に]
[1] => [が]
)
[1] => Array
(
[0] => に
[1] => が
)
)
Array
(
[0] => Array
(
[0] => {椅子|いす}
)
[1] => Array
(
[0] => 椅子|いす
)
)
Array
(
[0] => Array
(
[0] => <部屋|へや>
)
[1] => Array
(
[0] => 部屋|へや
)
)
Thanks.
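Going one step further than extraction, the conversion described in the question could be sketched with preg_replace_callback, which replaces each bracket group in a single pass. The function name renderFurigana and the class="question" wrapper are assumptions made for illustration; the question does not say how the {} word should be rendered.

```php
<?php
// Hypothetical helper (not from the original post): converts the custom
// bracket markup into HTML in a single pass.
function renderFurigana(string $text): string
{
    // Turn "kanji|reading" into a <ruby> element; without "|" keep the text as-is.
    $ruby = function (string $inner): string {
        $parts = explode('|', $inner, 2);
        $base = htmlspecialchars($parts[0]);
        if (count($parts) < 2) {
            return $base;
        }
        return '<ruby><rb>' . $base . '</rb><rt>' . htmlspecialchars($parts[1]) . '</rt></ruby>';
    };

    // Exactly one of the three alternatives matches per hit.
    $pattern = '/\[(?<particle>[^\]]*)\]|\{(?<question>[^}]*)\}|<(?<plain>[^>]*)>/u';

    $html = preg_replace_callback($pattern, function (array $m) use ($ruby): string {
        if ($m[0][0] === '[') {
            // Particle: blue span, as described in the question.
            return '<span style="color:blue">' . htmlspecialchars($m['particle']) . '</span>';
        }
        if ($m[0][0] === '{') {
            // Question word: the wrapper class is an assumption.
            return '<span class="question">' . $ruby($m['question']) . '</span>';
        }
        // Plain kanji text in <...>.
        return $ruby($m['plain']);
    }, $text);

    // preg_replace_callback returns null on error; fall back to the input.
    return $html ?? $text;
}

echo renderFurigana('この<部屋|へや>[に]{椅子|いす}[が]ありません'), "\n";
```

Text outside the brackets is passed through unchanged here; escape it separately if it can contain HTML.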

Related

PHP/Regex: Parse JSON in string

I'm trying to find simple key-value-pairs in strings, given as JSON-objects, while using preg_replace_callback().
Unfortunately, the values can be of type string, number, boolean, null, array and, worst of all, object. My own attempts at solving this problem resulted in either an incomplete selection or over-selecting multiple JSON occurrences as one.
Here are the things I tried:
String:
text text {"key":{"key":"value"}} text
Regex:
\{"(.+?)"\:(.+?)\}
Match:
{"key":"value"
Above: This ignores the inner }-bracket
String:
text text {"key":{"key":"value"}} text
Regex:
\{"(.+?)"\:(.+)\}
Match:
{"key":"value"}
Above: This would (theoretically) work, but with multiple JSON occurrences I get:
{"key":"value"}} {"key":{"key":"value"}
Next attempt:
String:
text text {"key":{"key":"value"}} {"key":{"key":"value"}} text
Regex:
\{"(.+?)"\:(?:(\{(?:.+?)\})|(?:")?(.+?)(?:")?)\}
Match:
{"key":"value"}
Above: Again, that would theoretically work. But take, for example, the following string:
text text {"key":{"key":{"key":"value"}}} text
The result is...
{"key":{"key":"value"}
Missing one bracket

PCRE supports recursive matching for this kind of nested structure. Here is a demo:
$data = 'text text
{"key":{"key":"value{1}","key2":false}}
{"key":{"key":"value2"}}
{"key":{"key":{"key":"value3"}}} text';
$pattern = '(
\{ # JSON object start
(
\s*
"[^"]+" # key
\s*:\s* # colon
(
# value
(?:
"[^"]+" | # string
\d+(?:\.\d+)? | # number
true |
false |
null
) |
(?R) # pattern recursion
)
\s*
,? # comma
)*
\} # JSON object end
)x';
preg_replace_callback(
$pattern,
function ($match) {
var_dump(json_decode($match[0]));
},
$data
);
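If validating the key/value grammar inside the regex is not required, a shorter variant of the same recursion idea is to match balanced braces only and let json_decode decide whether each candidate is valid JSON. This is a sketch under one loud assumption: string values never contain unbalanced { or } characters. The sample data is made up for illustration.

```php
<?php
// Assumption: braces inside JSON string values, if any, are balanced.
$data = 'text {"key":{"key":"value1"}} text {"a":{"b":{"c":3}}} text';

// An opening brace, then any mix of non-brace runs or a recursive match
// of the whole pattern, then the closing brace.
$pattern = '/\{(?:[^{}]++|(?R))*\}/';

preg_match_all($pattern, $data, $matches);

$objects = [];
foreach ($matches[0] as $candidate) {
    // Keep only candidates that really are valid JSON.
    $decoded = json_decode($candidate, true);
    if (json_last_error() === JSON_ERROR_NONE) {
        $objects[] = $decoded;
    }
}

print_r($objects);
```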

With the additional requirements of using preg_replace_callback() and not knowing the depth of the json objects ahead of time, perhaps this is another possible approach (more information on {1,} here):
<?php
// ref: https://stackoverflow.com/q/66379119/1167750
$str = 'text text {"key":{"key":"value1"}} {"key":{"key":"value2"}} {"key":{"key":{"key":"value3"}}} text';
function callback($array) {
// Your function here...
print_r($array);
echo "Found:\n";
echo "{$array[0]}\n";
}
preg_replace_callback('/\{"(.+?)"\:(.+?)\}{1,}/', 'callback', $str);
?>
Output (PHP 7.3.19):
$ php q18.php
Array
(
[0] => {"key":{"key":"value1"}}
[1] => key
[2] => {"key":"value1"
)
Found:
{"key":{"key":"value1"}}
Array
(
[0] => {"key":{"key":"value2"}}
[1] => key
[2] => {"key":"value2"
)
Found:
{"key":{"key":"value2"}}
Array
(
[0] => {"key":{"key":{"key":"value3"}}}
[1] => key
[2] => {"key":{"key":"value3"
)
Found:
{"key":{"key":{"key":"value3"}}}
Previous idea:
Would something like this be helpful for your use case(s)?
<?php
// ref: https://stackoverflow.com/q/66379119/1167750
$str = 'text text {"key":{"key":"value1"}} {"key":{"key":"value2"}} {"key":{"key":{"key":"value3"}}} text';
preg_match_all('/\{"(.+?)"\:(.+?)\}{1,3}/', $str, $matches);
print_r($matches);
echo "Found:\n";
print_r($matches[0]);
?>
Output (PHP 7.3.19):
$ php q18.php
Array
(
[0] => Array
(
[0] => {"key":{"key":"value1"}}
[1] => {"key":{"key":"value2"}}
[2] => {"key":{"key":{"key":"value3"}}}
)
[1] => Array
(
[0] => key
[1] => key
[2] => key
)
[2] => Array
(
[0] => {"key":"value1"
[1] => {"key":"value2"
[2] => {"key":{"key":"value3"
)
)
Found:
Array
(
[0] => {"key":{"key":"value1"}}
[1] => {"key":{"key":"value2"}}
[2] => {"key":{"key":{"key":"value3"}}}
)
If you know ahead of time the maximum depth these nested structures might have, you can adjust the {1,3} part to a different setting, for example {1,4}, {1,5}, etc. More information on that part can be found in the documentation on regular-expression quantifiers.

How to get a particular string using preg_replace?

I want to get a particular value from a string in PHP. The following is the string:
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_replace('/(.*)\[(.*)\](.*)\[(.*)\](.*)/', '$2', $str);
I want to get the value of data01, i.e. [1,2].
How can I achieve this using preg_replace?

preg_replace() is the wrong tool. I have used preg_match_all() in case you need the other item later, and trimmed down your regex to capture the part of the string you are looking for.
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_match_all('/\[([0-9,]+)\]/',$string,$match);
print_r($match);
/*
print_r($match) output:
Array
(
[0] => Array
(
[0] => [1,2]
[1] => [2,3]
)
[1] => Array
(
[0] => 1,2
[1] => 2,3
)
)
*/
echo "Your match: " . $match[1][0];
?>
This gives you both the full matched pattern and the captured characters, so you can have [1,2] or just 1,2.

preg_replace is used to replace by regular expression!
I think you want to use preg_match_all() to get each data attribute from the string.
The regex you want is:
$string = 'users://data01=[1,2]/data02=[2,3]/*';
preg_match_all('#data[0-9]{2}=(\[[0-9,]+\])#',$string,$matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => data01=[1,2]
[1] => data02=[2,3]
)
[1] => Array
(
[0] => [1,2]
[1] => [2,3]
)
)
I have tested this as working.

preg_replace is for replacing stuff. preg_match is for extracting stuff.
So you want:
preg_match('/(.*?)\[(.*?)\](.*?)\[(.*?)\](.*)/', $str, $match);
var_dump($match);
See what you get, and work from there.
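If only the data01 value is needed, a single preg_match anchored on the key name may be enough. This is a minimal sketch that assumes the value never contains a closing bracket; the variable name $data01 is introduced here for illustration.

```php
<?php
$string = 'users://data01=[1,2]/data02=[2,3]/*';

// Capture the bracketed value that follows the literal key "data01=".
if (preg_match('/data01=(\[[^\]]*\])/', $string, $match)) {
    $data01 = $match[1];
} else {
    $data01 = null; // key not present
}

var_dump($data01);
```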

Conversion from string to int does not work

I use a regex to get a value from a document and store it in a variable called $distance. That is a string, but I have to put it in an int column of a table in a database.
Of course, normally I would go and say
$distance=intval($distance);
But it doesn't work! I really don't know why.
This is all I am doing:
preg_match_all($regex,$content,$match);
$distance=$match[0][1];
$distance=intval($distance);
The regex is correct, if I echo $distance, it is e.g. "0" - but I need it to be 0 instead of "0". Using intval() will somehow always convert it to an empty string.
EDIT 1
The regex is this:
$regex='#<value>(.+?)</value>#'; // Please, I know I shouldn't use regex for parsing XML - but that is not the problem right now
Then I proceed with
preg_match_all($regex,$content,$match);
$distance=$match[0][1];
$distance=intval($distance);

If you'd do print_r($match) you'd see that the array you need is $match[1]:
$content = '<value>1</value>, <value>12</value>';
$regex='#<value>(.+?)</value>#';
preg_match_all($regex,$content,$match);
print_r($match);
Output:
Array
(
[0] => Array
(
[0] => <value>1</value>
[1] => <value>12</value>
)
[1] => Array
(
[0] => 1
[1] => 12
)
)
In this case:
$distance = (int) $match[1][1];
var_dump($distance);
Output: int(12)
Alternatively, you can use the PREG_SET_ORDER flag, i.e. preg_match_all($regex, $content, $match, PREG_SET_ORDER); the $match array then has this structure:
Array
(
[0] => Array
(
[0] => <value>1</value>
[1] => 1
)
[1] => Array
(
[0] => <value>12</value>
[1] => 12
)
)

There must be a space, or possibly (been there, done that) an 0xA0 byte before the zero. Use "\d" in your regexp to be sure to get digits.
Edit: you can clean up the value with
$value = (int)trim($value, " \t\r\n\x0B\xA0\x00");
http://php.net/manual/en/function.trim.php
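The advice to match digits explicitly could look like the sketch below. The sample $content is made up, with a UTF-8 non-breaking space before the first number to mimic the stray byte described above; the character class [^\d<] tolerates junk around the number without running past the closing tag.

```php
<?php
// Hypothetical input: the first value is preceded by a non-breaking space.
$content = "<value>\xC2\xA00</value>, <value> 12 </value>";

// Skip any non-digit, non-"<" bytes around the number and capture digits only.
preg_match_all('#<value>[^\d<]*(\d+)[^\d<]*</value>#', $content, $match);

// $match[1] holds the captured digit strings; cast each one to int.
$distances = array_map('intval', $match[1]);

var_dump($distances);
```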

Why do you need the question mark in your regex? Try this:
$regex='#<value>(.+)</value>#';

split regular expression php

I have a string like this:
0d(Hi)i(Hello)4d(who)i(where)540d(begin)i(began)
and I want to turn it into an array.
I first tried adding separators so that I could use the PHP function explode:
;0,d(Hi),i(Hello);4,d(who),i(where);540,d(begin),i(began)
It works, but I want to minimize the separators to save disk space.
So I want to know whether preg_split and a regular expression can produce an array like the following without separators:
Array ( [0] => Array ( [0] => 0 [1] => d(hi) [2] => i(Hello) )
[1] => Array ( [0] => 4 [1] => d(who) [2] => i(where) )
[2] => Array ( [0] => 540 [1] => d(begin) [2] => i(began) )
)
I tried some code and regexes, but I saw that the text matched by the regular expression was not present in the final result (as with explode, the final array does not contain the delimiter).
Moreover, I have some difficulty building the regex. Here is the one that I made:
$modif = preg_split("/[0-9]+(d(.+))?(i(.+))?/", $data);
I should add that d() and i() may each be absent (but at least one of them is always present).
Thanks

If you do
preg_match_all('/(\d+)(d\([^()]*\))?(i\([^()]*\))?/', $subject, $result, PREG_SET_ORDER);
on your original string, then you'll get an array where
$result[$i][0]
contains the ith match (i. e. $result[0][0] would be 0d(Hi)i(Hello)) and where
$result[$i][$c]
contains the cth capturing group of the ith match (i. e. $result[0][1] is 0, $result[0][2] is d(Hi) and $result[0][3] is i(Hello)).
Is that what you wanted?
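To get from $result to the nested array shown in the question, the full match and any empty optional groups can be dropped from each row. This is a sketch; the variable name $rows is an assumption. Note that the filter compares against the empty string explicitly, because a plain array_filter would also discard the leading "0".

```php
<?php
$subject = '0d(Hi)i(Hello)4d(who)i(where)540d(begin)i(began)';

preg_match_all('/(\d+)(d\([^()]*\))?(i\([^()]*\))?/', $subject, $result, PREG_SET_ORDER);

$rows = [];
foreach ($result as $m) {
    // Drop the full match ($m[0]), then remove groups that did not match.
    $groups = array_slice($m, 1);
    $rows[] = array_values(array_filter($groups, function ($s) {
        return $s !== '';
    }));
}

print_r($rows);
```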

Regular Expression with wordpress shortcodes

I'm trying to find all shortcodes within a string which looks like this:
 [a_col] One
 [/a_col]
outside
[b_col]
Two
[/b_col] [c_col] Three [/c_col]
I need the content (eg "Three") and the letter from the col (a, b or c)
Here's the expression I'm using
preg_match_all('#\[(a|b|c)_col\](.*)\[\/\1_col\]#m', $string, $hits);
but $hits contains only the last one.
The content can have any character even "[" or "]"
EDIT:
I would like to get "outside" as well which can be any string (except these cols). How can I handle that or should I parse this in a second step?

This will capture anything in the content, as well as attributes, and will allow any characters in the content.
<?php
$input = '[a_col some="thing"] One[/a_col]
[b_col] Two [/b_col]
[c_col] [Three] [/c_col] ';
preg_match_all('#\[(a|b|c)_col([^\[]*)\](.*?)\[\/\1_col\]#msi', $input, $matches);
print_r($matches);
?>
EDIT:
You may want to then trim the matches, since it appears there may be some whitespace. Alternatively, you can use regex for removing the whitespace in the content:
preg_match_all('#\[(a|b|c)_col([^\[]*)\]\s*(.*?)\s*\[\/\1_col\]#msi', $input, $matches);
OUTPUT:
Array
(
[0] => Array
(
[0] => [a_col some="thing"] One[/a_col]
[1] => [b_col] Two [/b_col]
[2] => [c_col] [Three] [/c_col]
)
[1] => Array
(
[0] => a
[1] => b
[2] => c
)
[2] => Array
(
[0] => some="thing"
[1] =>
[2] =>
)
[3] => Array
(
[0] => One
[1] => Two
[2] => [Three]
)
)
It might also be helpful to use this for capturing the attribute names and values stored in $matches[2]. Consider $atts to be the first element in $matches[2]. Of course, you would iterate over the array of attributes and perform this on each.
preg_match_all('#([^="\'\s]+)[\t ]*=[\t ]*("|\')(.*?)\2#', $atts, $att_matches);
This gives an array where the names are stored in $att_matches[1] and their corresponding values are stored in $att_matches[3].
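As a small illustration of that last step, the names and values can be paired into one associative array with array_combine. The sample $atts string and the $attributes variable are made up for this sketch.

```php
<?php
// Hypothetical attribute string, as it might appear in $matches[2].
$atts = ' some="thing" other=\'value two\'';

preg_match_all('#([^="\'\s]+)[\t ]*=[\t ]*("|\')(.*?)\2#', $atts, $att_matches);

// Pair each attribute name (group 1) with its value (group 3).
$attributes = array_combine($att_matches[1], $att_matches[3]);

print_r($attributes);
```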

use ((.|\n)*) instead of (.*) to capture multiple lines...
<?php
$string = "
[a_col] One
[/a_col]
[b_col]
Two
[/b_col] [c_col] Three [/c_col]";
preg_match_all('#\[(a|b|c)_col\]((.|\n)*)\[\/\1_col\]#m', $string, $hits);
echo "<textarea style='width:90%;height:90%;'>";
print_r($hits);
echo "</textarea>";
?>

I don't have an environment I can test with here but you could use a look behind and look ahead assertion and a back reference to match tags around the content. Something like this.
(?<=\[(\w)\]).*(?=\[\/\1\])
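For the EDIT in the question (also getting the "outside" text), one possible approach is preg_split with PREG_SPLIT_DELIM_CAPTURE, which returns the text between shortcodes interleaved with the captured groups. This is a sketch under the assumption that shortcodes are not nested; the $tokens structure and the sample string are made up for illustration.

```php
<?php
$string = "[a_col] One [/a_col] outside [b_col] Two [/b_col]";

// Result layout: outside, letter, content, outside, letter, content, ..., outside.
// PREG_SPLIT_NO_EMPTY is deliberately not used, so the positions stay aligned.
$parts = preg_split(
    '#\[([abc])_col\](.*?)\[/\1_col\]#s',
    $string,
    -1,
    PREG_SPLIT_DELIM_CAPTURE
);

$tokens = [];
for ($i = 0; $i < count($parts); $i += 3) {
    // Text between shortcodes (skipped when it is only whitespace).
    if (trim($parts[$i]) !== '') {
        $tokens[] = ['type' => 'outside', 'text' => trim($parts[$i])];
    }
    // The shortcode that follows, if any: letter and content.
    if (isset($parts[$i + 2])) {
        $tokens[] = ['type' => $parts[$i + 1], 'text' => trim($parts[$i + 2])];
    }
}

print_r($tokens);
```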
