PHP - Improve Regex (space and non-capturing group) - php

I have a string like this:
$string = "<strong>Blabla1</strong> Blaabla2<br /> Blaabla3 <strong>Blaabla4</strong> Blaabla5 Blaabla6<br /><br /> Blaabla7 <span style='color:#B22222;'>Blaabla8</span> Blaabla9";
I'm trying to split it into words wherever there is a " " or a "<br />", using preg_split.
My conditions:
Each word (Blablax) must keep its tags, such as <strong>, <span>, <em>..., but the split must come after one or more consecutive <br /> tags.
I tried this, based on another Stack Overflow post:
preg_split('/<br(\s\/)?>\K|\s/',$string,null,PREG_SPLIT_NO_EMPTY);
OUTPUT:
array (size=12)
0 => string '<strong>Blabla1</strong>' (length=24)
1 => string 'Blaabla2<br />' (length=14)
2 => string 'Blaabla3' (length=8)
3 => string '<strong>Blaabla4</strong>' (length=25)
4 => string 'Blaabla5' (length=8)
5 => string 'Blaabla6<br />' (length=14)
6 => string '<br' (length=3)
7 => string '/>' (length=2)
8 => string 'Blaabla7' (length=8)
9 => string '<span' (length=5)
10 => string 'style='color:#B22222;'>Blaabla8</span>' (length=38)
11 => string 'Blaabla9' (length=8)
Everything works except for indexes 6 and 7 and indexes 9 and 10 (see OUTPUT above).
What I expect:
array (size=9)
0 => string '<strong>Blabla1</strong>' (length=24)
1 => string 'Blaabla2<br />' (length=14)
2 => string 'Blaabla3' (length=8)
3 => string '<strong>Blaabla4</strong>' (length=25)
4 => string 'Blaabla5' (length=8)
5 => string 'Blaabla6<br /><br />' (length=20)
6 => string 'Blaabla7' (length=8)
7 => string '<span style='color:#B22222;'>Blaabla8</span>' (length=44)
8 => string 'Blaabla9' (length=8)
See indexes 5 and 7.
My regex works when there is a single <br />, but it goes wrong when there are several in a row, and likewise when there is a <span style...>.
Thanks!

$string = "<strong>Blabla1</strong> Blaabla2<br /> Blaabla3 <strong>Blaabla4</strong> Blaabla5 Blaabla6<br /><br /> Blaabla7 <span style='color:#B22222;'>Blaabla8</span> Blaabla9";
$matches = preg_split('/(<br.*?>|<span.*>)+\K|\s/sim', $string, -1, PREG_SPLIT_NO_EMPTY );
var_dump($matches);
/*
array(9) {
[0]=>
string(24) "<strong>Blabla1</strong>"
[1]=>
string(14) "Blaabla2<br />"
[2]=>
string(8) "Blaabla3"
[3]=>
string(25) "<strong>Blaabla4</strong>"
[4]=>
string(8) "Blaabla5"
[5]=>
string(20) "Blaabla6<br /><br />"
[6]=>
string(8) "Blaabla7"
[7]=>
string(44) "<span style='color:#B22222;'>Blaabla8</span>"
[8]=>
string(8) "Blaabla9"
}
*/

Looking at your expected array at index 5 and index 7, you probably want this regex:
preg_split('~(?:</?[a-zA-Z0-9][^>]*+>|\S)++\K|\s~',$string,-1,PREG_SPLIT_NO_EMPTY);
Output:
array(9) {
[0]=>
string(24) "<strong>Blabla1</strong>"
[1]=>
string(14) "Blaabla2<br />"
[2]=>
string(8) "Blaabla3"
[3]=>
string(25) "<strong>Blaabla4</strong>"
[4]=>
string(8) "Blaabla5"
[5]=>
string(20) "Blaabla6<br /><br />"
[6]=>
string(8) "Blaabla7"
[7]=>
string(44) "<span style='color:#B22222;'>Blaabla8</span>"
[8]=>
string(8) "Blaabla9"
}
The regex attempts to match a full tag, and if a full tag can't be consumed, it will consume one non-space character, then rinse and repeat. This will prevent tags from being split, which gives expected output for index 5 and 7.
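The behaviour described above can be checked with a small sketch. The wrapper name split_words_keep_tags is made up for this example, and -1 is passed as the limit (meaning "no limit") instead of null, which newer PHP versions flag as deprecated for this parameter.

```php
<?php
// Hypothetical wrapper around the pattern from this answer.
function split_words_keep_tags(string $html): array
{
    // First alternative: consume a run of whole tags and non-space characters,
    // then \K discards what was consumed, so the split point is zero-width.
    // Second alternative: a single whitespace character is an ordinary delimiter.
    $pattern = '~(?:</?[a-zA-Z0-9][^>]*+>|\S)++\K|\s~';
    return preg_split($pattern, $html, -1, PREG_SPLIT_NO_EMPTY);
}

$string = "Blaabla6<br /><br /> Blaabla7 <span style='color:#B22222;'>Blaabla8</span> Blaabla9";
print_r(split_words_keep_tags($string));
```

Because a tag is always consumed as a whole, the spaces inside <br /> and <span style=...> never reach the \s alternative.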
I wouldn't recommend doing this with regex, though. I didn't consult the HTML specs when writing the regex, so the regex is very brittle and may break on input in the wild. You might want to learn how to parse HTML properly with one of the libraries listed in this question: How do you parse and process HTML/XML in PHP?

Here is the regex
((?:<br\s*\/?>)+)|(?<!<br)\s+(?!\/?>)
Use this with preg_replace using $1\n as a replacement string, and then you can split by newline to get the array (removing empty ones).

Related

How do I make var_dump appear on separate lines?

I don't want alternatives to var_dump: this is for a homework assignment, and my teacher wants me to use var_dump and have each element appear on a separate line. I have searched a lot of sites and simply can't find anything. Please help.
This is the code in "verzenden.php":
echo '<pre>' . var_dump($_GET) . '</pre>' . '<br>';
(The pre was wrapped in <> and "", but it won't show up here.)
I tried this, but the output is still the same as with plain var_dump. The form:
<form method='get' action='verzend.php'>
<label>Naam: </label><input name='naam' type='text' value=''>
<label>Klas: </label><input name='klas' type='text' value=''>
<label>Nummer: </label><input name='leerlingnummer' type='text' value=''>
<label>Vak: </label><select name='vak'>
<option value='PHP'>PHP</option>
<option value='javascript'>Javascript</option>
<option value='ASP'>ASP</option>
<option value='SQL'>SQL</option>
</select>
<label>Cijfer: </label><input name='cijfer' type='number' value='5'>
<input type='submit' value='verzend' name='verzend'>
</form>
this is what it needs to become
array(6) { ["naam"]=> string(9) "Abu Saebu"
["Klas"]=> string(5) "IO1A4"
["leerlingnummer"]=> string(8) "36353535"
["vak"]=> string(3) "PHP"
["cijfer"]=> string(1) "9"
["verzend"]=> string(7) "verzend"
}
This is what I get
array(6) { ["naam"]=> string(6) "Sjoerd" ["klas"]=> string(5) "IO1D4" ["leerlingnummer"]=> string(6) "332309" ["vak"]=> string(10) "javascript" ["cijfer"]=> string(2) "24" ["verzend"]=> string(7) "verzend" }
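Do it in separate statements instead of concatenating. var_dump() prints its output directly and returns nothing, so inside a concatenation the dump is printed before echo outputs the <pre> tags and ends up outside them.
echo '<pre>';
var_dump($_GET);
echo '</pre>';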
This will give you the following:
array(6) {
["naam"]=>
string(9) "Abu Saebu"
["Klas"]=>
string(5) "I01A4"
["leerlingnummer"]=>
string(8) "36353535"
["vak"]=>
string(3) "PHP"
["cijfer"]=>
string(1) "9"
["verzend"]=>
string(7) "verzend"
}
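If the dump really has to be part of a concatenated string, one option is to capture it with output buffering first. This is only a sketch: the helper name dump_to_html is made up, and it assumes plain var_dump output (no extension that already produces HTML).

```php
<?php
// Hypothetical helper: capture var_dump() output as a string wrapped in <pre>.
function dump_to_html($value): string
{
    ob_start();               // start buffering output
    var_dump($value);         // var_dump() writes into the buffer
    $dump = ob_get_clean();   // fetch the buffer contents and stop buffering
    // Escape the dump so user-supplied values cannot inject markup.
    return '<pre>' . htmlspecialchars($dump) . '</pre>';
}

echo dump_to_html(['naam' => 'Abu Saebu', 'vak' => 'PHP']) . '<br>';
```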
If you don't want it to break after the =>, you could use print_r instead:
Array
(
[naam] => Abu Saebu
[Klas] => I01A4
[leerlingnummer] => 36353535
[vak] => PHP
[cijfer] => 9
[verzend] => verzend
)
However, if you really want to use var_dump, there is an extension named Xdebug that formats the dump with one entry per line, like this, without needing the pre tag:
array (size=6)
'naam' => string 'Abu Saebu' (length=9)
'Klas' => string 'I01A4' (length=5)
'leerlingnummer' => string '36353535' (length=8)
'vak' => string 'PHP' (length=3)
'cijfer' => string '9' (length=1)
'verzend' => string 'verzend' (length=7)
More information about the pre tag: http://www.w3schools.com/tags/tag_pre.asp

How to use Markdown parsing in PHP to build a separate automated process

I want to develop a website with a lightly automated process for the header, menu, navigation bar, footer, etc., driven by Markdown files.
For example, navigationbar.md will contain only link texts and link addresses. I want to get those details individually (the link and the text, not the rendered HTML) into variables or parameters in PHP.
* [Dog][0]
* [German Shepherd][1]
* [Belgian Shepherd][2]
* [Malinois][3]
* [Groenendael][4]
* [Tervuren][5]
* [Cat][6]
* [Siberian][7]
* [Siamese][8]
[0]:(http://google.com)
[1]:(http://google.com)
[2]:(http://google.com)
[3]:(http://google.com)
[4]:(http://google.com)
[5]:(http://google.com)
[6]:(http://google.com)
[7]:(http://google.com)
[8]:(http://google.com)
I don't need any HTML here; I want a nested array containing each link text and link address.
Normally this Markdown structure would be rendered as an HTML list, but I need the list as a nested array to perform defined tasks.
Is there any way to do this?
Expected output:
array (size=9)
0 =>
array (size=2)
0 => string 'Dog' (length=3)
1 => string 'http://google.com' (length=17)
1 =>
array (size=2)
0 => string 'German Shepherd' (length=15)
1 => string 'http://yahoo.com' (length=16)
2 =>
array (size=2)
0 => string 'Belgian Shepherd' (length=16)
1 => string 'http://duckduckgo.com' (length=21)
2 =>
array (size=2)
0 => string 'Malinois' (length=8)
1 => string 'http://amazon.com' (length=17)
2 =>
array (size=2)
0 => string 'Groenendael' (length=11)
1 => string 'http://metallica.com' (length=20)
3 =>
array (size=2)
0 => string 'Tervuren' (length=8)
1 => string 'http://microsoft.com' (length=20)
3 =>
array (size=2)
0 => string 'Cat' (length=3)
1 => string 'http://ibm.com' (length=14)
2 =>
array (size=2)
0 => string 'Siberian' (length=8)
1 => string 'http://apple.com' (length=16)
3 =>
array (size=2)
0 => string 'Siamese' (length=7)
1 => string 'http://stackoverflow.com' (length=24)
This should work. All the explanation is in the code comments.
/**
 This function takes two strings: $text and $links_text.
 For each text value that matches the regular expression, the link
 from $links_text is extracted and given as output.
 This returns an array consisting of the text mapped to their links.
 It will return a single array if there is only a single text value, and
 a nested array if more than one text is found.
 Eg:
 INPUT:
 var_dump(text_link_map("* [Dog][0]", "[0]:(http://google.com)[1]:(http://yahoo.com)"));
 OUTPUT:
 array
 0 => string 'Dog' (length=3)
 1 => string 'http://google.com' (length=17)
*/
function text_link_map($text, $links_text){
    $regex= "/\*\s+\[([a-zA-Z0-9\-\_ ]+)\]\[([0-9]+)\]/";
    if(preg_match_all($regex, $text, $matches)){
        $link_arr = Array();
        /*
        For each of those indices, find the appropriate link.
        */
        foreach($matches[2] as $link_index){
            $links = Array();
            $link_regex = "/\[".$link_index."\]\:\((.*?)\)/";
            if(preg_match($link_regex,$links_text,$links)){
                $link_arr[] = $links[1];
            }
        }
        if(count($matches[1]) == 1){
            return Array($matches[1][0], $link_arr[0]);
        }else{
            $text_link = array_map(null, $matches[1], $link_arr);
            return $text_link;
        }
    }else{
        return null;
    }
}
/**
 Function that calls recursive_index and returns its output.
 This is needed to pass initial values to recursive_index.
*/
function indent_text($text_lines, $links){
    $i = 0;
    return recursive_index($i, 0, $text_lines, $links);
}
/**
 This function creates a nested array out of the $text.
 Each indent is assumed to be a single Tab. It is dictated by the
 $indent_symbol variable.
 This function recursively calls itself when it needs to go from
 one level to another.
*/
function recursive_index(&$index, $curr_level, $text, $links){
    $indent_symbol = "\t";
    $result = Array();
    while($index < count($text)){
        $line = $text[$index];
        // Number of leading tabs = nesting level of this line.
        $level = strspn($line, $indent_symbol);
        if($level == $curr_level){
            $result[] = text_link_map($line, $links);
        }elseif($level > $curr_level){
            // Deeper level: attach the children to the last entry added.
            $result[count($result) - 1][] = recursive_index($index, $curr_level + 1, $text, $links);
            if($index > count($text)){
                break;
            }else{
                // Step back so the line that ended the recursion is re-read here.
                $index--;
            }
        }elseif($level < $curr_level){
            break;
        }
        $index += 1;
    }
    return $result;
}
$file_name = "navigationbar.md";
$f_contents = file_get_contents($file_name);
//Separate out the text and links part.
//(Assuming the text and the links will always be separated with 2 \r\n)
list($text, $links) = explode("\r\n\r\n", $f_contents);
//Get the nested array.
$formatted_arr = indent_text(explode("\r\n", $text), $links);
var_dump($formatted_arr);
This is the output of the code. It matches your requirements -
/*
OUTPUT
*/
array(4) {
[0]=>
array(2) {
[0]=>
string(3) "Dog"
[1]=>
string(17) "http://google.com"
}
[1]=>
array(2) {
[0]=>
string(15) "German Shepherd"
[1]=>
string(16) "http://yahoo.com"
}
[2]=>
array(3) {
[0]=>
string(16) "Belgian Shepherd"
[1]=>
string(21) "http://duckduckgo.com"
[2]=>
array(3) {
[0]=>
array(2) {
[0]=>
string(8) "Malinois"
[1]=>
string(17) "http://amazon.com"
}
[1]=>
array(2) {
[0]=>
string(11) "Groenendael"
[1]=>
string(20) "http://metallica.com"
}
[2]=>
array(2) {
[0]=>
string(8) "Tervuren"
[1]=>
string(20) "http://microsoft.com"
}
}
}
[3]=>
array(3) {
[0]=>
string(3) "Cat"
[1]=>
string(14) "http://ibm.com"
[2]=>
array(2) {
[0]=>
array(2) {
[0]=>
string(8) "Siberian"
[1]=>
string(16) "http://apple.com"
}
[1]=>
array(2) {
[0]=>
string(7) "Siamese"
[1]=>
string(24) "http://stackoverflow.com"
}
}
}
}
To check, the contents of navigationbar.md is -
* [Dog][0]
* [German Shepherd][1]
* [Belgian Shepherd][2]
	* [Malinois][3]
	* [Groenendael][4]
	* [Tervuren][5]
* [Cat][6]
	* [Siberian][7]
	* [Siamese][8]

[0]:(http://google.com)
[1]:(http://yahoo.com)
[2]:(http://duckduckgo.com)
[3]:(http://amazon.com)
[4]:(http://metallica.com)
[5]:(http://microsoft.com)
[6]:(http://ibm.com)
[7]:(http://apple.com)
[8]:(http://stackoverflow.com)
Certain assumptions in the code:
The text part (i.e. the "* [Dog][0]" lines) and the link part (i.e. the "[0]:(http://google.com)" lines) are assumed to always be separated by 2 newlines ("\r\n\r\n").
Each child is indented one Tab ("\t") deeper than its parent.
You can test this by changing the tabs before the entries in navigationbar.md.
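As a possible variation, the link definitions could be read into a lookup map in a single pass instead of running one regex per index. The sketch below assumes each definition sits on its own line in the [n]:(url) form used in navigationbar.md; the function name parse_link_definitions is hypothetical and not part of the code above.

```php
<?php
// Hypothetical helper: build an index => URL map from the definition lines.
function parse_link_definitions(string $links_text): array
{
    $map = [];
    // /m lets ^ and $ match at line boundaries; \s* absorbs a trailing "\r".
    if (preg_match_all('/^\[(\d+)\]:\((.*?)\)\s*$/m', $links_text, $found, PREG_SET_ORDER)) {
        foreach ($found as $match) {
            $map[(int) $match[1]] = $match[2];
        }
    }
    return $map;
}

print_r(parse_link_definitions("[0]:(http://google.com)\r\n[1]:(http://yahoo.com)"));
```

With such a map, a link could be looked up as $map[$link_index] rather than searched for each time.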
Hope it helps.

PHP String Extraction [duplicate]

I'm weak with regex and need help. I have to extract every string that matches a given pattern into an array. See the problem below:
The string
<?php
$alert_types = array(
'warning' => array('', __l("Warning!") ),
'error' => array('alert-error', __l("Error!") ),
'success' => array('alert-success', __l("Success!") ),
'info' => array('alert-info', __l("For your information.") ),
);?>
The preg_match code
preg_match("/.*[_][_][l][\(]['\"](.*)['\"][\)].*/", $content, $matches);
I'm only getting the first match, which is Warning!. I expect $matches to contain the following values:
Warning!, Error!, Success!, For your information.
I'm actually using file_get_contents($file) to get the string.
Can anyone help me solve this? Thank you in advance.
preg_match() only finds the first match in the string. Use preg_match_all() to get all matches.
preg_match_all("/.*__l\(['\"](.*?)['\"]\).*/", $content, $matches);
$matches[1] will contain an array of the strings you're looking for.
BTW, you don't need all those single-character bracket expressions such as [_] or [l]. Just put the character itself into the regexp.
var_dump($matches);
array(2) {
[0]=>
array(4) {
[0]=>
string(45) " 'warning' => array('', __l("Warning!") ),"
[1]=>
string(52) " 'error' => array('alert-error', __l("Error!") ),"
[2]=>
string(58) " 'success' => array('alert-success', __l("Success!") ),"
[3]=>
string(65) " 'info' => array('alert-info', __l("For your information.") ),"
}
[1]=>
array(4) {
[0]=>
string(8) "Warning!"
[1]=>
string(6) "Error!"
[2]=>
string(8) "Success!"
[3]=>
string(21) "For your information."
}
}
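A tighter variant is sketched below, under the assumption that every call has the form __l("...") or __l('...') with no escaped quotes inside. Dropping the leading and trailing .* keeps the full matches down to the call itself, and a backreference makes the closing quote agree with the opening one. The function name extract_l_strings is made up for this example.

```php
<?php
// Hypothetical helper: collect the string argument of every __l() call.
function extract_l_strings(string $content): array
{
    // Group 1 is the opening quote, group 2 the text, \1 the matching closing quote.
    preg_match_all('/__l\(\s*([\'"])(.*?)\1\s*\)/', $content, $matches);
    return $matches[2];
}

$content = "'warning' => array('', __l(\"Warning!\") ),\n"
         . "'error' => array('alert-error', __l(\"Error!\") ),";
print_r(extract_l_strings($content));
```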

Extracting leading spaces from the rest of the string

What is the quickest approach to splitting a string into its leading spaces and the rest of it?
····sth should become array("····", "sth") and ·sth· - array("·", "sth·")
* · = space
$result = preg_split('/^(\s*)/', '  test ', -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
outputs
array(2) {
[0]=>
string(2) "  "
[1]=>
string(5) "test "
}
Here's a simpler approach:
$result = preg_split('/\b/', '   sth', 2);
It would output:
array (size=2)
0 => string '   ' (length=3)
1 => string 'sth' (length=3)
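Since the question asks for the quickest approach, a regex-free sketch may also be worth trying. It counts the leading spaces with strspn and cuts the string there. Note that the \b variant relies on the remainder starting with a word character, whereas this one does not. The function name split_leading_spaces is hypothetical, and only the space character is treated as leading whitespace here.

```php
<?php
// Hypothetical helper: split a string into its leading spaces and the rest.
function split_leading_spaces(string $s): array
{
    $n = strspn($s, ' ');                      // length of the leading run of spaces
    return [substr($s, 0, $n), substr($s, $n)];
}

var_dump(split_leading_spaces('    sth'));
var_dump(split_leading_spaces(' sth '));
```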

Multiple explode characters with comma and - (hyphen)

I want to split a string on all of the following:
whitespace (\n, \t, etc.)
comma
hyphen (small dash), like this: -
But this does not work:
$keywords = explode("\n\t\r\a,-", "my string");
How to do that?
Explode can't do that. There is a nice function called preg_split for that. Do it like this:
$keywords = preg_split("/[\s,-]+/", "This-sign, is why we can't have nice things");
var_dump($keywords);
This outputs:
array
0 => string 'This' (length=4)
1 => string 'sign' (length=4)
2 => string 'is' (length=2)
3 => string 'why' (length=3)
4 => string 'we' (length=2)
5 => string 'can't' (length=5)
6 => string 'have' (length=4)
7 => string 'nice' (length=4)
8 => string 'things' (length=6)
BTW, do not use split: it was deprecated in PHP 5.3 and removed in PHP 7.0.
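One detail worth checking: if the input starts or ends with a delimiter, preg_split returns an empty string at that position. A small sketch, using the PREG_SPLIT_NO_EMPTY flag to drop those entries; the function name split_keywords is made up for this example.

```php
<?php
// Hypothetical helper: split on runs of whitespace, commas and hyphens,
// discarding empty pieces at the start or end of the input.
function split_keywords(string $text): array
{
    return preg_split('/[\s,-]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(split_keywords(" -This-sign, is why\n"));
```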
... or if you don't like regexes and you still want to explode stuff, you could replace multiple characters with just one character before your explosion:
$keywords = explode("-", str_replace(array("\n", "\t", "\r", "\a", ",", "-"), "-",
"my string\nIt contains text.\rAnd several\ntypes of new-lines.\tAnd tabs."));
var_dump($keywords);
This blows into:
array(6) {
[0]=>
string(9) "my string"
[1]=>
string(17) "It contains text."
[2]=>
string(11) "And several"
[3]=>
string(12) "types of new"
[4]=>
string(6) "lines."
[5]=>
string(9) "And tabs."
}
