PHP preg_match_all: extract specific lines

I have a problem. I need to get some lines of a page like this:
Text text text ...
Porto-Portugal-May-2013
Barcelona-Spain-April-2013
Text text text text text ...
Madrid-Spain-April-2013
Text text text ...
I need to filter it so that only the following lines appear:
Porto-Portugal-May-2013
Barcelona-Spain-April-2013
Madrid-Spain-April-2013
(lines with 3 dashes)
Is this possible with preg_match_all or another function?
I use cURL to get page content.
I have tried:
$body = " Text text text ...
Porto-Portugal-May-2013
Barcelona-Spain-April-2013
Text text text text text ...
Madrid-Spain-April-2013
Text text text ...";
preg_match_all("/^(.*?)-(.*?)-(.*?)-(.*?)\/",$body, $match);
for($i=0;$i<sizeof($match[1]);$i++)
{
echo $match[1][$j].'<br/>';
}
Thank you.

^ means "start of string".
Add the m modifier to make it mean "start of line" instead.
Then it's easier:
preg_match_all("/^(?:[^-\n]+-){3}[^-\n]+$/m",$body,$matches);
var_dump($matches[0]);
This should output an array containing each line that matched.
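To see what the m modifier changes, here is a minimal sketch that runs the same pattern with and without it. The sample $body mirrors the question's text, and the names $pattern, $countWithoutM and $countWithM are only illustrative.

```php
<?php
// Sample input modelled on the question (an assumption, not the real page).
$body = "Text text text ...\n"
      . "Porto-Portugal-May-2013\n"
      . "Barcelona-Spain-April-2013\n"
      . "Text text text text text ...\n"
      . "Madrid-Spain-April-2013\n"
      . "Text text text ...";

// Four non-empty segments separated by exactly three dashes.
$pattern = '^(?:[^-\n]+-){3}[^-\n]+$';

// Without m, ^ and $ anchor only at the start and end of the whole string.
$countWithoutM = preg_match_all('/' . $pattern . '/', $body, $ignored);

// With m, ^ and $ anchor at every line boundary.
$countWithM = preg_match_all('/' . $pattern . '/m', $body, $matches);

print_r($matches[0]);
```

Without m the pattern can only start at the first character of $body, so nothing matches; with m each of the three dashed lines is found.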

If the lines you want always contain a year from a known list, you don't need a regex to complete this task:
<?php
$yearsList = array(2013, 2014);
$body = " Text text text ...
Porto-Portugal-May-2013
Barcelona-Spain-April-2013
Text text text text text ...
Madrid-Spain-April-2013
Text text text ...";
$arr = explode("\n",$body);
$res = array();
foreach ($arr as $items){
$itemArr = explode('-', $items);
foreach ($itemArr as $item){
if (in_array($item, $yearsList)) $res[] = $items;
}
}
echo "<pre>";
print_r($res);
?>
View This DEMO: http://codepad.org/fdhwEJC4
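If the years are not known in advance, the same non-regex idea can be sketched by counting dashes instead, since the question describes the wanted lines as "lines with 3 dashes". The helper name linesWithThreeDashes is hypothetical, and the sketch assumes a dash never appears in the lines to be discarded.

```php
<?php
// Hypothetical helper: keep only the lines that contain exactly three dashes.
function linesWithThreeDashes(string $body): array
{
    // \R matches any line-break sequence, so \r\n input is split correctly too.
    $lines = preg_split('/\R/', $body);

    $kept = array_filter($lines, function ($line) {
        return substr_count($line, '-') === 3;
    });

    // Reindex so the result is a plain 0-based list.
    return array_values($kept);
}

$body = "Text text text ...\nPorto-Portugal-May-2013\nBarcelona-Spain-April-2013\n"
      . "Text text text text text ...\nMadrid-Spain-April-2013\nText text text ...";
print_r(linesWithThreeDashes($body));
```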

Related

Conversion of text within delimiters to valid url

I have to convert an old website to a CMS and one of the challenges I have is at present there are over 900 folders that contain up to 9 text files in each folder. I need to combine the up to 9 text files into one and then use that file as the import into the CMS.
The file concatenation and import are working perfectly.
The challenge that I have is parsing some of the text in the text file.
The text file contains a url in the form of
Some text [http://xxxxx.com|About something] some more text
I am converting this with this code
if (substr ($line1, 0, 7) !=="Replace") {
$pattern = '/\\[/';
$pattern2 = '/\\]/';
$pattern3 = '/\\|/';
$replacement = '<a href="';
$replacement3 = '">';
$replacement2='</a><br>';
$subject = $line1;
$i=preg_replace($pattern, $replacement, $subject, -1 );
$i=preg_replace($pattern3, $replacement3, $i, -1 );
$i=preg_replace($pattern2, $replacement2, $i, -1 );
$line .= '<div class="'.$folders[$x].'">'.$i.'</div>' ;
}
It may not be the most efficient code, but it works, and as this is a one-off exercise, execution time is not an issue.
Now to the problem that I cannot seem to code around. Some of the urls in the text files are in this format
Some text [http://xxxx.com] some more text
The pattern matching above finds pattern and pattern2, but as there is nothing for pattern3 to match, the url is malformed in the output.
Regular expressions are not my forte. Is there a way to modify what I have above, or is there another way to get the correctly formatted url in my output? Or will I need to parse the output a second time, looking for the malformed url and correcting it before writing it to the output file?

You can use preg_replace_callback() to achieve this:
Find any string of the format [...]
Try to split them by the delimiter | using explode()
If the split array contains two pieces, then it means the [...] string contains two pieces: the link href and the link anchor text
If not, then it means the [...] string contains only the link href part
Format and return the link
Code:
$input = <<<EOD
Some text [http://xxxxx.com|About something] some more text
Some text [http://xxxx.com] some more text
EOD;
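$output = preg_replace_callback('#\[([^\]]+)\]#', function($m)
{
    // Split "href|anchor text" on the delimiter
    $parts = explode('|', $m[1]);
    if (count($parts) == 2)
    {
        return sprintf('<a href="%s">%s</a>', $parts[0], $parts[1]);
    }
    else
    {
        // No anchor text: use the url for both the href and the link text
        return sprintf('<a href="%1$s">%1$s</a>', $m[1]);
    }
}, $input);
echo $output;
Output:
Some text <a href="http://xxxxx.com">About something</a> some more text
Some text <a href="http://xxxx.com">http://xxxx.com</a> some more text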
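As a variation, the optional anchor text can be captured by the regex itself, which also gives a natural place to escape the values before they go into HTML. This is only a sketch: the helper name bracketsToLinks is hypothetical, and it assumes that "]" never appears inside a link and that "|" only ever separates the url from the anchor text.

```php
<?php
// Hypothetical helper; see the assumptions stated above.
function bracketsToLinks(string $input): string
{
    // Group 1: the url. Optional group 2: the anchor text after "|".
    return preg_replace_callback('#\[([^\]|]+)(?:\|([^\]]+))?\]#', function ($m) {
        $href = htmlspecialchars($m[1]);
        // Fall back to the url itself when no anchor text was given.
        $text = htmlspecialchars(isset($m[2]) && $m[2] !== '' ? $m[2] : $m[1]);
        return sprintf('<a href="%s">%s</a>', $href, $text);
    }, $input);
}

echo bracketsToLinks('Some text [http://xxxxx.com|About something] some more text'), "\n";
echo bracketsToLinks('Some text [http://xxxx.com] some more text'), "\n";
```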

Remove Text With Preg_Replace

I have some text like the following example:
Some Text Here
[code]Some link[/code]
Text
[code]Link[/code]
Other Text
[code]Another Link[/code]
Other Text1
I want to remove all the text above, below, and between the code blocks. Here's an example of the output I want:
[code]Some Link[/code]
[code]Link[/code]
[code]Another Link[/code]
I use preg_replace for removing text above the first Code, in this way:
$message = preg_replace('/(.*?)\[code/si','[code',$message, 1);
Can you help me to remove the other text, using preg_replace?

You can do it this way:
preg_match_all('/(\[code\].*\[\/code\])/Usmi', $text, $res);
$message = '';
// $res[0] holds every full [code]...[/code] match
foreach ($res[0] as $val) {
    $message .= $val . "<br />";
}
echo $message;

Just to make the solution of @Andreev a little simpler:
$text = "
Some Text Here
[code]Some link[/code]
Text
[code]Link[/code]
Other Text
[code]Another Link[/code]
Other Text1
";
$keywords = preg_match_all('/(\[code\].*\[\/code\])/Usmi', $text, $res);
print(implode($res[0]));
You can test it here : http://phptester.net/index.php?lang=en

Assuming you can never have [code] abc [code] def [/code] ghi [/code], try this:
// One pass: keep each [code]...[/code] block and drop the text around it
$message = trim(preg_replace('~.*?(\[code\].*?\[/code\])|.+~is', "$1\n", $message));

For every line beginning with 4 spaces, add text-indent tags

I've got text where some lines are indented with 4 spaces. I've been trying to write a regex which would find every line beginning with 4 spaces and put a <span class="indented"> at the beginning and a </span> at the end. I'm no good at regex yet, though, so it came to nothing. Is there a way to do it?
(I'm working in PHP, in case there's an option easier than regex).
Example:
Text text text
    Indented text text text
More text text text
A bit more text text.
to:
Text text text
<span class="indented">Indented text text text</span>
More text text text
A bit more text text.

The following will match lines starting with at least 4 spaces or a tab character:
$str = preg_replace("/^(?: {4,}|\t *)(.*)$/m", "<span class=\"indented\">$1</span>", $str);
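A quick way to check this replacement is to wrap it in a function and feed it a few lines. The function name wrapIndentedLines and the sample input are assumptions made for illustration; the pattern is the one from the answer.

```php
<?php
// Hypothetical wrapper around the replacement above.
function wrapIndentedLines(string $str): string
{
    return preg_replace(
        '/^(?: {4,}|\t *)(.*)$/m',          // m: ^ and $ match at each line
        '<span class="indented">$1</span>', // $1: the line without its indent
        $str
    );
}

$input = "Text text text\n    Indented text text text\n   only three spaces\nMore text text text";
echo wrapIndentedLines($input);
```

Only the second line is wrapped; the line with three leading spaces is left as it is because the pattern asks for at least four.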

I had to do something similar, and one thing I might suggest is changing the target markup to
<span class="tab"></span>Indented text text text
You can then set your CSS to something like .tab {display:inline-block; width:4em;} and, instead of using preg_replace and regexes, you can do
$str = str_replace("    ", "<span class='tab'></span>", $str);
This has the benefit that 8 spaces easily turn into a double-width tab.

I think this should work:
// Get each line as an item in an array
$array_of_lines = explode("\n", $your_string_of_lines);
$output = array();
foreach($array_of_lines as $line) {
    // First four characters
    $first_four = substr($line, 0, 4);
    if($first_four == '    ') {
        $line = trim($line);
        $line = '<span class="indented">'.$line.'</span>';
    }
    $output[] = $line;
}
echo implode("\n",$output);

Using PHP to remove a html element from a string

I am having trouble working out how to do this. I have a string that looks something like this:
$text = "<p>This is some example text This is some example text This is some example text</p>
<p><em>This is some example text This is some example text This is some example text</em></p>
<p>This is some example text This is some example text This is some example text</p>";
I basically want to use something like preg_replace and a regex to remove
<em>This is some example text This is some example text This is some example text</em>
So I need to write some PHP code that will search for the opening <em> and closing </em> and delete all the text in between.
I hope someone can help.
Thanks.

$text = preg_replace('/([\s\S]*)(<em>)([\s\S]*)(<\/em>)([\s\S]*)/', '$1$5', $text);
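A shorter variant of the same idea matches only the element itself instead of capturing the text around it. This is a sketch under the assumption that <em> carries no attributes and is never nested; the helper name removeEmElements is made up for the example.

```php
<?php
// Hypothetical helper; assumes <em> has no attributes and is not nested.
function removeEmElements(string $html): string
{
    // # as the delimiter avoids having to escape the slash in </em>;
    // s lets . span newlines, and .*? stops at the nearest </em>.
    return preg_replace('#<em>.*?</em>#s', '', $html);
}

$text = "<p>First</p>\n<p><em>This is the em text</em></p>\n<p>Last <em>more</em></p>";
echo removeEmElements($text);
```

Because the quantifier is lazy, every <em> element is removed separately, so text that sits between two of them survives.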

In case you are interested in a non-regex solution, the following would work as well:
<?php
$text = "<p>This is some example text This is some example text This is some example text</p>
<p><em>This is some example text This is some example text This is some example text</em></p>
<p>This is some example text This is some example text This is some example text</p>";
$emStartPos = strpos($text,"<em>");
$emEndPos = strpos($text,"</em>");
if ($emStartPos !== false && $emEndPos !== false) {
$emEndPos += 5; // include the closing </em> tag (5 characters)
$len = $emEndPos - $emStartPos;
$text = substr_replace($text, '', $emStartPos, $len);
}
?>
This will remove the <em> element, tags and content included.

$text = '<p>This is some example text This is some example text This is some example text</p>
<p><em>This is the em text</em></p>
<p>This is some example text This is some example text This is some example text</p>';
preg_match("#<em>(.+?)</em>#", $text, $output);
echo $output[0]; // This will output it with em style
echo '<br /><br />';
echo $output[1]; // This will output only the text between the em
For this example to work, I changed the <em></em> contents a little, otherwise all your text is the same and you cannot really understand if the script works.
However, if you want to remove the <em> element rather than extract its contents:
$text = '<p>This is some example text This is some example text This is some example text</p>
<p><em>This is the em text</em></p>
<p>This is some example text This is some example text This is some example text</p>';
echo preg_replace("/<em>(.+)<\/em>/", "", $text);

Use strpos to find the opening <em> tag and strrpos to find the closing </em> tag.
Use substr to get that part of the string.
Then replace the substring with an empty string in the original string.

format: $text = str_replace('<em>','',$text);
$text = str_replace('</em>','',$text);

PHP: insert text up to delimiter

I have a bunch of chat logs that look like this:
name: some text
name2: more text
name: text
name3: text
I want to highlight just the names. I wrote some code that should do it; however, I was wondering if there was a much cleaner way than this:
$line= "name: text";
$newtext = explode(":", $line, 2);
$newertext = "<font color=red>".$newtext[0]."</font>:";
$complete = $newertext.$newtext[1];
echo $complete;

Looks fine, although you can save the temp variables:
$newtext = explode(":", $line, 2); // limit 2: split on the first colon only
echo "<font color=red>$newtext[0]</font>:$newtext[1]";
This might be faster or might not, you'd have to test:
echo '<font color=red>' . substr_replace($line, '</font>', strpos($line, ':') , 0);

The answer posted by gview is the simplest it gets; however, just as a reference, you can use a regular expression to find the name tag and replace it with the new HTML using preg_replace(), as follows:
// Regular expression pattern: the name at the start of the line, up to but not including the colon
$pattern = '/^[a-z0-9]+(?=:)/i';
// Array containing the lines
$str = array('name: some text : Other text and stuff',
'name2: more text : : TEsting',
'name: text testing',
'name3: text Lorem ipsum');
// Looping through the array
foreach($str as $line)
{
// \\0 references the whole match, which is the name without the colon
echo preg_replace($pattern, "<font color=red>\\0</font>", $line) . "\n";
}

You can also try a regular expression like this:
$line = "name: text";
$complete = preg_replace('/^(name.*?):/', "<font color=red>$1</font>:", $line);
echo $complete ;
EDIT
If the names aren't literally "name" or "name1", just drop name from the pattern, like this:
$complete = preg_replace('/^(.*?):/', "<font color=red>$1</font>:", $line);
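For a whole chat log held in one string, the same replacement can be applied to every line at once with the m modifier. A minimal sketch, assuming one message per line and that a name never contains a colon; the helper name highlightNames is hypothetical.

```php
<?php
// Hypothetical helper: wrap the name at the start of every line of a chat log.
function highlightNames(string $log): string
{
    // [^:\n]+ is everything up to the first colon on the line;
    // m makes ^ match at the start of each line.
    return preg_replace('/^([^:\n]+):/m', '<font color=red>$1</font>:', $log);
}

$log = "name: some text\nname2: more text : with another colon";
echo highlightNames($log);
```

Only the first colon on each line is treated as the delimiter, so colons inside the message text are left alone.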
