Styling text retrieved using file_get_contents

I have long text being extracted using file_get_contents(). The text file contains information in the following format:
---
Description:
---
Some description here, with long text sentences.
---
Part 1
---
Information with part 1 in this section followed by path 2.
Now I want to style the information between the --- markers: for example, make "Description:" and "Part 1" bold and display the rest as plain text.
I think this can be achieved with preg_match, but I would also like to know whether any other method can be used.

The following should work:
preg_replace('/---(.*?)---/s', '<strong>$1</strong>', $text);
The expression captures anything between pairs of --- markers. $1 in the replacement string is a backreference: it contains whatever was matched by the first capturing group. The s modifier makes . match newlines as well.
If you'd like to also remove the whitespace, you could do this:
preg_replace('/---\s*(.*?)\s*---/s', '<strong>$1</strong>', $text);
If there's a possibility that --- pairs occur inside the text, then you can use the following pattern instead:
preg_replace('/---(?=\s)(\s)([^\r\n]+)(\s)---/s','<strong>$2</strong>$3', $text);
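As a minimal sketch of the whitespace-trimming variant above, here it is applied to text in the question's format. The $text and $html variable names and the inline sample string are assumptions for illustration; in practice $text would come from file_get_contents().

```php
<?php
// Sample text in the format from the question (assumed to start with ---).
$text = "---\nDescription:\n---\nSome description here, with long text sentences.\n"
      . "---\nPart 1\n---\nInformation with part 1 in this section followed by path 2.";

// Wrap whatever sits between a pair of --- markers in <strong>,
// dropping the whitespace (here: newlines) next to the markers.
$html = preg_replace('/---\s*(.*?)\s*---/s', '<strong>$1</strong>', $text);

echo $html, PHP_EOL;
// <strong>Description:</strong>
// Some description here, with long text sentences.
// <strong>Part 1</strong>
// Information with part 1 in this section followed by path 2.
```

If the file can contain characters such as < or &, it is probably worth passing the text through htmlspecialchars() before adding the tags.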

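You could also use explode. Because the text starts with ---, the first element is an empty string, so the headings end up at the odd indexes:
$expl = explode("---", $yourtext);
// $expl[0] is the empty string before the first ---
echo '<b>'.trim($expl[1]).'</b>'; // Description:
echo $expl[2]; // Some description here, with long text sentences.
echo '<b>'.trim($expl[3]).'</b>'; // Part 1
echo $expl[4]; // Information with part 1 in this section followed by path 2.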
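The same idea can be written as a loop, so it works for any number of sections. This is only a sketch: it assumes the text starts with a --- marker (which makes explode return an empty first element and puts the headings at the odd indexes), and the $parts and $out names are made up for the example.

```php
<?php
$text = "---\nDescription:\n---\nSome description here, with long text sentences.\n"
      . "---\nPart 1\n---\nInformation with part 1 in this section followed by path 2.";

$parts = explode('---', $text);
$out = [];
foreach ($parts as $i => $part) {
    $part = trim($part);
    if ($part === '') {
        continue; // skip the empty piece in front of the leading ---
    }
    // Assumption: headings and body text alternate, headings at odd indexes.
    $out[] = ($i % 2 === 1) ? '<b>' . $part . '</b>' : $part;
}

echo implode("<br>\n", $out), PHP_EOL;
```

Unlike the regex versions, this breaks if --- can appear inside a heading or a paragraph, since explode splits on every occurrence.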

You can use a regular expression to do this. The following will work even if you have hyphens in the text you want bolded.
echo preg_replace('/---(\r\n|\n|\r)([^\n\r]+)(\r\n|\n|\r)---/s', '<strong>$2</strong>$3', $text);
For example, suppose your text is:
---
Descrip---tion:
---
Some description here, with long text sentences.
---
Part 1
---
Information with part 1 in this section followed by path 2.
The above code would replace this with:
<strong>Descrip---tion:</strong>
Some description here, with long text sentences.
<strong>Part 1</strong>
Information with part 1 in this section followed by path 2.

Related

How to extract text from multiple lines including the first and last word?

I am trying to extract part of a long text, such as information about caring for a plant. The text contains paragraphs and blank lines. I have three problems: I cannot capture the specific text I want, the last word is missing from the extracted text, and the search fails when it starts at the beginning of a line.
I tried starting the search with a word that isn't at the beginning of a line. That worked, except that the end of the desired text is missing a word, and if that word is on a new line, there are no results at all.
I was using https://scriptun.com/tools/php/preg_match for testing
//The first word to start the search is 'How to'. And I want to capture it as well
// The second word where the text I want ends is '(optional):'
'/(?=How to).*?\s(?=\(optional\):)/'
The sample text I am using to test is:
//Text comes before this..
How to care for Split Leaf Plant
The Split leaf philodendron, also called monstera deliciosa or swiss
cheese plant, is a large, popular, easy- care houseplant that is not
really in the philodendron family. There is a great deal of confusion
about what to call this plant; the various names have become
inter-changeable over the years.
Here is more info (optional):
//And more text goes here
I want to extract all the text from 'How to' through '(optional):', regardless of how many lines or paragraphs are in between.
The expected extracted text:
How to care for Split Leaf Plant
The Split leaf philodendron, also called monstera deliciosa or swiss
cheese plant, is a large, popular, easy- care houseplant that is not
really in the philodendron family. There is a great deal of confusion
about what to call this plant; the various names have become
inter-changeable over the years.
Here is more info (optional):
Thank you

That's pretty easy. You can use the following pattern:
https://regex101.com/r/TjE2x8/2
Pattern: ^How to[\w\W]+?\(optional\):$

Pattern: ^How to(?:.|\R)*optional\):$
Explanation:
^ match How to only where it appears at the beginning of a line (this needs the m flag)
(?: ) non-capturing group. We need it because of the alternation (the pipe |) inside it, but we don't need to capture its contents. That's why we use ?: after the opening parenthesis.
. any character except a newline
| or
\R any kind of newline sequence
* repeat the group zero or more times
optional\):$ match the word optional followed by a closing parenthesis (escaped as \) because a bare ) is a metacharacter) and a colon : at the end of a line $
Pattern 2: /^How to.*optional\):$/ms
This pattern is even simpler, but it requires both the m and s flags: m makes ^ and $ match at line boundaries, and s makes . match newlines too.
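A short sketch of how this could look in PHP with preg_match. Note that it swaps the greedy .* for a lazy .*? so the match stops at the first line ending in (optional): rather than the last one; that change, the shortened sample text and the $extract variable are assumptions made for the example.

```php
<?php
$text = "Text comes before this..\n"
      . "How to care for Split Leaf Plant\n"
      . "The Split leaf philodendron, also called monstera deliciosa.\n"
      . "\n"
      . "Here is more info (optional):\n"
      . "And more text goes here";

$extract = null;
// m: ^ and $ match at line boundaries; s: . also matches newlines.
// .*? is lazy, so the match ends at the first "(optional):" that closes a line.
if (preg_match('/^How to.*?\(optional\):$/ms', $text, $m)) {
    $extract = $m[0];
}

echo $extract, PHP_EOL;
```

The extracted text starts with "How to" and ends with "(optional):", and it keeps the blank line in between.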

How to get text between custom dynamic html tags without end tags

I have text separated by multiple custom tags with partially dynamic names and without closing tags.
What I need is to get all of the individual parts of the text between the custom tags, not including the tags.
For the last part of text, I can only get text after the tag, because it doesn't have a closing tag.
I've seen plenty of similar questions, but none of them was sufficient to solve my problem.
Example:
<*fixedTagName|Dynamic part of tag name> // * and | are included in fixed part of tag name
//dynamic part can have spaces between words
Random text I need to get of unknown length
some paragraphs of text can start like this(look bellow)
» name: value
» name: value
<*fixedTagName|Dynamic part of tag>
More random text I need to get
<*fixedTagName|Dynamic part of tag>
Final part of random text I need to get

To get the text between regular expression matches, you can use the preg_split function:
$result = preg_split('/<\*[^|]+\|[^>]+>/', $input);
In this regular expression:
<\* matches <*;
[^|]+ matches any character except |, one or more times;
\| matches |;
[^>]+ matches any character except >, one or more times;
> matches >.
With this input:
$input = <<<EOL
<*fixedTagName|Dynamic part of tag name> // * and | are included in fixed part of tag name
//dynamic part can have spaces between words
Random text I need to get of unknown length
some paragraphs of text can start like this(look bellow)
» name: value
» name: value
<*fixedTagName|Dynamic part of tag>
More random text I need to get
<*fixedTagName|Dynamic part of tag>
Final part of random text I need to get
EOL;
$result will be an array of strings, something like this:
Array
(
[0] =>
[1] => // * and | are included in fixed part of tag name
//dynamic part can have spaces between words
Random text I need to get of unknown length
some paragraphs of text can start like this(look bellow)
» name: value
» name: value
[2] =>
More random text I need to get
[3] =>
Final part of random text I need to get
)
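The empty first element and the surrounding newlines can be dropped if they are not wanted. A sketch, assuming a shortened input with made-up dynamic tag names; PREG_SPLIT_NO_EMPTY is a standard preg_split flag, and trim is applied to each piece afterwards.

```php
<?php
$input = "<*fixedTagName|First tag>\nRandom text\n» name: value\n"
       . "<*fixedTagName|Second tag>\nMore random text\n"
       . "<*fixedTagName|Third tag>\nFinal part";

// -1 means "no limit"; PREG_SPLIT_NO_EMPTY drops the empty piece
// that would otherwise appear before the first tag.
$parts = preg_split('/<\*[^|]+\|[^>]+>/', $input, -1, PREG_SPLIT_NO_EMPTY);

// Remove the newlines left over at both ends of each piece.
$parts = array_map('trim', $parts);

print_r($parts);
```

This yields three elements: the two lines after the first tag, then "More random text", then "Final part".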

I think this Stack Overflow answer fully explains how you can do this: https://stackoverflow.com/a/3577662/7578179

Regex starts with x or x prefixed or suffixed

I'm trying to get pattern match for string like the following to convert every line into a list item <li>:
-Give on result
&Second new text
-The third text
Another paragraph without list.
-New list here
In natural language: match every string that starts with - and ends with the newline character \n
I tried the following pattern that works fine:
/^([-|-]\w+\s*.*)?\n*$/gum
Of course, it can be written more simply without the square brackets, as ^(-\w+\s*.*)?\n*$, but I kept them for debugging.
When I replace the second - with &, giving ^([-|&]\w+\s*.*)?\n*$, it also works and matches the second line of the sample string. However, I could not make it match a - that is prefixed or suffixed with white space.
I changed the sample string to:
- Give on result
&Second new text
-The third text
Another paragraph without list.
-New list here
and I tried the following pattern:
/^([-|\- |&| -]\w+\s*.*)?\n*$/gum
However, it failed to match any - that is prefixed or suffixed with white space.

As I understand it, you want to match a line that starts with a marker (either & or -), where the marker may be preceded or followed by space(s).
^\s*[&-]\s*(.*)$
If you do not want multilines, simply do not use the m modifier.

^(\h*(?:-|&)\h*\w+\s*.*)\n*$
You can try this. A | inside [] has no special meaning. See the demo:
https://regex101.com/r/nS2lT4/3
A line may start with whitespace, followed by either - or &, which may itself be followed by spaces. Then there must be at least one word character, after which anything (or nothing) may follow. Finally, the pattern consumes any trailing newlines, if there are any.
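To turn the matched lines into list items, which is what the question is after, something like the following might work. It is a sketch that combines the two answers: [&-] for the marker and \h for horizontal whitespace, with the m modifier so ^ and $ work per line. The <li> replacement string and the variable names are assumptions.

```php
<?php
$text = "- Give on result\n&Second new text\n -The third text\n"
      . "Another paragraph without list.\n-New list here";

// \h matches horizontal whitespace only, so a match never spills over a line break.
// (.+) captures the item text without the marker and the spaces around it.
$html = preg_replace('/^\h*[&-]\h*(.+)$/m', '<li>$1</li>', $text);

echo $html, PHP_EOL;
// <li>Give on result</li>
// <li>Second new text</li>
// <li>The third text</li>
// Another paragraph without list.
// <li>New list here</li>
```

The g flag from the question's patterns is not a valid PHP modifier; preg_replace already replaces every match.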

Regex optional groups

I'd like to capture up to four groups of text between <p> and </p>. I can do that using the following regex:
<h5>Trivia<\/h5><p>(.*)<\/p><p>(.*)<\/p><p>(.*)<\/p><p>(.*)<\/p>
The text to match on:
<h5>Trivia</h5><p>Was discovered by a freelance photographer while sunbathing on Bournemouth Beach in August 2003.</p><p>Supports Southampton FC.</p><p>She has 11 GCSEs and 2 'A' Levels.</p><p>Listens to soul, R&B, Stevie Wonder, Aretha Franklin, Usher Raymond, Michael Jackson and George Michael.</p>
It outputs the four lines of text. It also works as intended if there are more trivia items or <p> occurrences.
But if there are fewer than four trivia items or <p> groups, it outputs nothing since it cannot find the fourth group. How do I make that group optional?
I've tried: <h5>Trivia<\/h5><p>(.*?)<\/p>(?:<p>(.*?)<\/p>)?(?:<p>(.*?)<\/p>)?(?:<p>(.*?)<\/p>)?(?:<p>(.*?)<\/p>)? and that works according to http://gskinner.com/RegExr/ but it doesn't work if I put it inside PHP code. It only detects one group and puts everything in it.

The magic word is either 'escaping' or 'delimiters', read on.
The first regex:
<h5>Trivia<\/h5><p>(.*)<\/p><p>(.*)<\/p><p>(.*)<\/p><p>(.*)<\/p>
worked because you escaped the / characters in tags like </h5> to <\/h5>.
But in your second regex (which correctly encloses each additional paragraph in an optional non-capturing group, fetching 1 to 5 paragraphs):
<h5>Trivia</h5><p>(.*?)</p>(?:<p>(.*?)</p>)?(?:<p>(.*?)</p>)?(?:<p>(.*?)</p>)?(?:<p>(.*?)</p>)?
you forgot to escape those / characters.
It should then have been:
$pattern = '/<h5>Trivia<\/h5><p>(.*?)<\/p>(?:<p>(.*?)<\/p>)?(?:<p>(.*?)<\/p>)?(?:<p>(.*?)<\/p>)?(?:<p>(.*?)<\/p>)?/';
The above is assuming you were putting your regex between two / "delimiters" characters (out of conventional habit).
To dive a little deeper into the rabbit hole: in PHP the first and last characters of a regular expression are delimiters, which is what allows modifiers (such as case-insensitive) to be added after the closing one.
So instead of escaping your regex, you could also use a ~ character (or #, etc.) as a delimiter.
Thus you could use the exact (second) regex that you posted and enclose it, for example, like this:
$pattern = '~<h5>Trivia</h5><p>(.*?)</p>(?:<p>(.*?)</p>)?(?:<p>(.*?)</p>)?(?:<p>(.*?)</p>)?(?:<p>(.*?)</p>)?~';
Here is a working (web-based) example of that, using # as delimiter (just because we can).

You can use the question mark to make each <p>...</p> optional:
$pattern = '~<h5>Trivia</h5>(?:<p>(.*?)</p>)?(?:<p>(.*?)</p>)?(?:<p>(.*?)</p>)?(?:<p>(.*?)</p>)?~';
Using the DOM (for example, DOMDocument) is a good option too.
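A small sketch of the optional-group pattern in use, with only two trivia items so that the third and fourth groups stay unmatched. The shortened $html string and the $items variable are assumptions for the example.

```php
<?php
$pattern = '~<h5>Trivia</h5>(?:<p>(.*?)</p>)?(?:<p>(.*?)</p>)?(?:<p>(.*?)</p>)?(?:<p>(.*?)</p>)?~';
$html = '<h5>Trivia</h5><p>Supports Southampton FC.</p><p>She has 11 GCSEs and 2 \'A\' Levels.</p>';

$items = [];
if (preg_match($pattern, $html, $m)) {
    // $m[0] is the whole match. Trailing groups that did not take part
    // in the match are simply absent from $m, so no filtering is needed.
    $items = array_slice($m, 1);
}

print_r($items);
```

With this input, $items holds exactly the two paragraphs that were present.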

Regex should not match if detects one or more space characters

My linking structure for user input:
++visible part of link====invisible HTML address part of link++
Input string:
some text here some text here ++stack overflow====http://stackoverflow.com/questions/ask++ some text here ++examplesite.com====http://www.examplesite.com/article?id=1++ some text here some text here some text here some text here ++shouldnotmatch.com====http://w ww.shouldnotmatch.com/++ some text here.
My aim:
If the part between ==== and ++ includes one or more space character(s), preg_match_all should not match. So my desired output is to match with first two linking attempts. But the last linking attempt should not match since w ww includes one space character.
My unsuccessful attempts:
\+\+(.+?)====(.+?[^ ])\+\+
\+\+(.+?)====(.+?[^ {1, }])\+\+
Can you please correct me?

Your first attempt allowed any character, including spaces, before the space check: .+? matches spaces, and [^ ] only constrains the last character.
Does something like this work?
!\+\+(.+?)====([^ ]+?)\+\+!
If there is always something between those parentheses, then you can drop the ?:
!\+\+(.+?)====([^ ]+)\+\+!

Try this regular expression:
[+]{2}(.+?)[=]{4}([^\s]+?)[+]{2}
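A quick sketch of the first suggested pattern with preg_match_all, run on a shortened version of the question's input. The PREG_SET_ORDER flag and the variable names are choices made for this example, not something the answers specify.

```php
<?php
$input = 'text ++stack overflow====http://stackoverflow.com/questions/ask++ text '
       . '++examplesite.com====http://www.examplesite.com/article?id=1++ text '
       . '++shouldnotmatch.com====http://w ww.shouldnotmatch.com/++ text.';

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all('!\+\+(.+?)====([^ ]+?)\+\+!', $input, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    // $match[1] is the visible label, $match[2] the address part.
    echo $match[1], ' => ', $match[2], PHP_EOL;
}
// stack overflow => http://stackoverflow.com/questions/ask
// examplesite.com => http://www.examplesite.com/article?id=1
```

One caveat: because (.+?) can match + and = as well, a broken link that is followed by a valid one can be swallowed into the valid link's label. Here the broken link comes last, so it is simply skipped.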
