Get the first sentence from string using php - php

I have some content stored in a variable, and it looks like this:
$content = "This is a test content and the content of the url is http://www.test.com. The is a second sentence.";
Now my code is
$pos = strpos($content, '.');
$firstsentence = substr($content, 0, $pos);
The above code doesn't work because the string contains a URL, which itself has dots.
How can I get the first sentence considering the fact that a string contains a hyperlink?

Please share other text scenarios if you have them. This works for your example:
$sentences = 'This is a test content and the content of the url is http://www.test.com. The is a second sentence.';
// Remember the URL, then remove it so its dots do not interfere.
preg_match('/(http|https):(.*?)com/', $sentences, $match);
$sentences = preg_replace('/(http|https):(.*?)com/', '', $sentences);
// The first remaining dot now ends the first sentence.
$pos = strpos($sentences, '.');
// Re-attach the URL and the closing dot.
$firstsentence = substr($sentences, 0, $pos) . $match[0] . '.';
// This is a test content and the content of the url is http://www.test.com.

In general, I think you're going to also have to look for <sentence-end-punct>"<whitespace>, "<sentence-end-punct><whitespace>, and <sentence-end-punct><whitespace> (where <whitespace> includes the end of a line). Is this very general English text, not especially under your control, or is the grammar very limited? For non-English text, there can be additional rules, such as putting spaces between punctuation and quotes.
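The rule described above (sentence-ending punctuation, optionally followed by a closing quote, then whitespace or the end of the text) can be sketched as a single regular expression. The function name first_sentence is hypothetical, and this is only a rough sketch for plain English text: abbreviations such as "e.g. " will still end the sentence too early.

```php
<?php
// Hypothetical helper: returns the first sentence of $text.
// A sentence ends at ., ! or ?, optionally followed by a closing quote,
// and then whitespace or the end of the text. Dots inside a URL are
// skipped because they are followed by a non-whitespace character.
function first_sentence(string $text): string
{
    if (preg_match('/^.*?[.!?]["\']?(?=\s|$)/s', $text, $m)) {
        return $m[0];
    }
    // No sentence end found: return the text unchanged.
    return $text;
}

$content = "This is a test content and the content of the url is http://www.test.com. The is a second sentence.";
echo first_sentence($content), "\n";
```

For the string from the question this prints the first sentence including the URL and its closing dot.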
Added: What are you trying to accomplish here? Do you really need to pull apart text into individual sentences, or are you just trying to create a "teaser"? In the latter case, just cut off the text at a complete word before some number of characters, and add an ellipsis (...).
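A minimal sketch of the teaser idea, assuming a hypothetical function name teaser and a byte-based length limit (for UTF-8 text the mb_* equivalents would be the safer choice):

```php
<?php
// Hypothetical helper: cut $text at the last complete word that fits
// into $maxLen bytes and append an ellipsis.
function teaser(string $text, int $maxLen = 40): string
{
    if (strlen($text) <= $maxLen) {
        return $text; // short enough, nothing to cut
    }
    // Take one extra byte so a word ending exactly at $maxLen is kept.
    $cut = substr($text, 0, $maxLen + 1);
    $lastSpace = strrpos($cut, ' ');
    if ($lastSpace !== false) {
        $cut = substr($cut, 0, $lastSpace);
    } else {
        // A single very long word: fall back to a hard cut.
        $cut = substr($cut, 0, $maxLen);
    }
    return rtrim($cut) . '...';
}

echo teaser('This is a test content and more', 12), "\n";
```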

Related

Removing Known Characters From End of String

I am writing an HTML file using file_put_contents(), but want to be able to add additional content later by pulling the current file contents and chopping off the known ending of the HTML.
So something along these lines:
$close = '</body></html>';
$htmlFile = file_get_contents('someUrl');
$tmp = $htmlFile - $close;
file_put_contents('someUrl', $tmp.'New Content'.$close);
But since I can't just subtract strings, how can I remove the known string from the end of the file contents?
substr can be used to cut off a known length from the end of a string. But you should probably check first that your string really ends with your suffix. You can use substr for that as well:
if (strtolower(substr($string, -strlen($suffix))) == strtolower($suffix)) {
    $string = substr($string, 0, -strlen($suffix));
}
If case does not matter, you can omit strtolower.
Alternatively, you can use str_replace to inject your content:
$string = str_replace('</body>', $newContent . '</body>', $string);
The question "Manipulate HTML from php" may also be helpful.
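Putting the suffix check and the re-append together might look like the following sketch. The function name append_before_suffix is hypothetical, and trimming trailing whitespace before the comparison is an assumption (files often end with a newline).

```php
<?php
// Hypothetical helper: insert $new just before a known closing $suffix.
function append_before_suffix(string $html, string $suffix, string $new): string
{
    // Assumption: trailing whitespace after the suffix can be dropped.
    $html = rtrim($html);
    $len = strlen($suffix);
    // Only cut the suffix off if the string really ends with it.
    if ($len > 0 && substr($html, -$len) === $suffix) {
        $html = substr($html, 0, -$len);
    }
    return $html . $new . $suffix;
}

$close = '</body></html>';
echo append_before_suffix("<html><body>Hi</body></html>\n", $close, '<p>New</p>'), "\n";
```

In the original scenario the result would be passed to file_put_contents() instead of being echoed.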

Break up long words in a UTF-8 text, with PHP

Horrible title, I know.
I want to have some kind of wordwrap, but obviously can not use wordwrap() as it messes up UTF-8.. not to mention markup.
My issue is that I want to get rid of stuff like this "eeeeeeeeeeeeeeeeeeeeeeeeeeee" .. but then longer of course. Some jokesters find it funny to put that stuff on my site.
So when I have a string like this "Hello how areeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee you doing?" I want to break up the 'areeee'-thing with the zero width space (​) character.
The strings aren't always the same letter, and they are always inside larger strings, so strlen, substr and wordwrap don't really fit.
Who can help me out?
This is not a PHP solution, but if your problem is only how the output is displayed, why not use the simple CSS3 rule called word-wrap?
If your container is a div with id="example", you can write:
#example {
    word-wrap: break-word;
}
Do this in 3 steps:
1. Split the string on whitespace.
2. Check the length of each word with strlen and trim it if it is too long.
3. Concatenate the words back together.
The downside is that legitimate words longer than 10 characters would be cut as well, so I would suggest also checking whether the same letter repeats over and over.
EXAMPLE
$string = "Hello how areeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee you doing?";
$wordArr = array();
foreach (explode(" ", $string) as $word) {
    // mb_* functions count characters, not bytes, so UTF-8 text is cut safely.
    if (mb_strlen($word, 'UTF-8') > 10) {
        $word = mb_substr($word, 0, 10, 'UTF-8');
    }
    $wordArr[] = $word;
}
$newString = implode(" ", $wordArr);
print $newString; // Prints "Hello how areeeeeeee you doing?"
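If the long runs should be kept but made breakable, as the question asks, a zero-width space can be inserted instead of cutting the word. The following is a sketch only: the function name break_long_words is hypothetical, it assumes PHP 7 or later (for the "\u{200B}" escape), and it treats the input as plain text, so markup inside the string is not handled.

```php
<?php
// Hypothetical helper: insert a zero-width space (U+200B) after every
// $max consecutive non-whitespace characters that are followed by more
// non-whitespace. The /u modifier makes the pattern count UTF-8
// characters rather than bytes.
function break_long_words(string $text, int $max = 10): string
{
    $zwsp = "\u{200B}";
    $pattern = '/(\S{' . $max . '})(?=\S)/u';
    // preg_replace returns null on invalid UTF-8; fall back to the input.
    return preg_replace($pattern, '$1' . $zwsp, $text) ?? $text;
}

$broken = break_long_words("Hello how areeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee you doing?");
echo $broken, "\n";
```

Words of up to $max characters are left untouched, so ordinary text is not affected.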

PHP: Regexp to change urls

I'm looking for a nice regexp that could change my string from:
text text website.tld text text anotherwebsite.tld/longeraddress text http://maybeanotheradress.tld/file.ext
into BBCode:
text text [url=website.tld]LINK[/url] text text [url=anotherwebsite.tld/longeraddress]LINK[/url] text [url=http://maybeanotheradress.tld/file.ext]LINK[/url]
Could you please advise?
Even though I voted to close this as a duplicate, here is a general suggestion: divide and conquer.
In your input string, all "URLs" do not contain any spaces. So you can divide the string into the parts that do not contain spaces:
$chunks = explode(' ', $str);
As we know that each part is now potentially a link you can create your own function that is able to tell so:
/**
 * @return bool
 */
function is_text_link($str)
{
    # do whatever you need to do here to tell whether something is
    # a link in your domain or not.
    # for example, taken the links you have in your question:
    $links = array(
        'website.tld',
        'anotherwebsite.tld/longeraddress',
        'http://maybeanotheradress.tld/file.ext'
    );
    return in_array($str, $links);
}
The in_array is just an example, you might be looking for regular expression based pattern matching instead. You can edit it later to fit your needs, I leave this as an exercise.
As you can now say what a link is and what not, the only problem left is how to create a BBCode out of a link, that's a fairly simple string operation:
if (is_text_link($chunk))
{
    $chunk = sprintf('[url=%s]LINK[/url]', $chunk);
}
So technically, all problems have been solved and this needs to be put together:
function bbcode_links($str)
{
    $chunks = explode(' ', $str);
    foreach ($chunks as &$chunk)
    {
        if (is_text_link($chunk))
        {
            $chunk = sprintf('[url=%s]LINK[/url]', $chunk);
        }
    }
    return implode(' ', $chunks);
}
This already works with the example string from your question:
$str = 'text text website.tld text text anotherwebsite.tld/longeraddress text http://maybeanotheradress.tld/file.ext';
echo bbcode_links($str);
Output:
text text [url=website.tld]LINK[/url] text text [url=anotherwebsite.tld/longeraddress]LINK[/url] text [url=http://maybeanotheradress.tld/file.ext]LINK[/url]
You then only need to tweak your is_text_link function to fulfill your needs. Have fun!
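As one possible way to do the exercise, the in_array check could be replaced by a pattern match. The function name is_text_link_regex and the pattern itself are assumptions for illustration: it accepts an optional http:// or https:// prefix, a host with at least one dot, and an optional path, which is far looser than real URL validation.

```php
<?php
// Hypothetical regex-based variant of is_text_link().
// Assumed rule: optional scheme, dotted host name, optional path.
function is_text_link_regex(string $str): bool
{
    $pattern = '~^(https?://)?[a-z0-9-]+(\.[a-z0-9-]+)+(/\S*)?$~i';
    return preg_match($pattern, $str) === 1;
}

foreach (array('text', 'website.tld', 'http://maybeanotheradress.tld/file.ext') as $chunk) {
    echo $chunk, ' => ', is_text_link_regex($chunk) ? 'link' : 'no link', "\n";
}
```

A word followed directly by a full stop, such as "end.", is not treated as a link because the pattern requires characters after each dot.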

Preg Replace - replace second occurrence of a match

I am relatively new to PHP, and hope someone can help me with a replace regex, or maybe a match-and-replace; I am not exactly sure which.
I want to automatically bold the second occurrence of a match, then make the 4th occurrence italic, and then underline the 7th occurrence.
This is basically for SEO purposes in content.
I have done some replacements with preg_replace and was thinking this should do the trick:
preg_replace( pattern, replacement, subject [, limit ])
I already know the word I want to match:
'pattern' is a word that is already defined, like [word].
'replacement' is a variable I am getting from a MySQL database.
'subject' is text from a database.
Let's say I have this content, which explains more or less what I want to do:
This is an example of the text that I want to replace. In this text I want to make the second occurrence of the word example bold. Then I want to skip the next time example occurs in the text, and make the 4th occurrence of the word example italic. Then I want to skip the 5th and the 6th time the word example appears in the text, and lastly underline the 7th time example appears. In this example I have used a hyperlink as the underline example, as I do not see an underline function in the text editor. The word example may appear more times in the text, but my only requirement is to underline once, make bold once and make italic once. I may later decide to put quotes around the word "example" as well, but that is not yet a priority.
It is also important for the code not to throw an error if there are not at least 7 occurrences of the word.
How would I do this? Any ideas would be appreciated.
You could use preg_split to split the text at the matches, apply the modifications, and then put everything back together:
// No limit (-1), so the 7th match is split out as well; captured matches land on odd indexes.
$parts = preg_split('/(example)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
if (isset($parts[3])) $parts[3] = '<b>'.$parts[3].'</b>';
if (isset($parts[7])) $parts[7] = '<i>'.$parts[7].'</i>';
if (isset($parts[13])) $parts[13] = '<u>'.$parts[13].'</u>';
$str = implode('', $parts);
The index formula for the i-th match is index = i · 2 - 1.
The regular expression itself cannot count, and the preg_ functions provide little help. You need a workaround. If you were to actually search for just a word, you might want to use string functions. Otherwise try:
// Count first: only apply the markup when there are at least 7 matches.
if (preg_match_all($pattern, $subject, $matches) >= 7) {
    $cb_num = 0;
    $subject = preg_replace_callback($pattern, "cb_ibu", $subject);
}
// Callback: counts the matches and wraps the 2nd, 4th and 7th one.
function cb_ibu($match) {
    global $cb_num;
    $match = $match[0];
    switch (++$cb_num) {
        case 2: return "<b>$match</b>";
        case 4: return "<i>$match</i>";
        case 7: return "<u>$match</u>";
        default: return $match;
    }
}
The trick is to have a callback which does the accounting. And there it's quite easy to add any rules.
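The same counting callback can be written without a global variable by using a closure. This is a sketch under assumptions: the function name style_occurrences and the $rules array (occurrence number mapped to tag name) are hypothetical, and the word is matched literally via preg_quote.

```php
<?php
// Hypothetical helper: wrap selected occurrences of $word in tags.
// $rules maps the occurrence number to a tag name, e.g. [2 => 'b'].
function style_occurrences(string $text, string $word, array $rules): string
{
    $count = 0; // shared with the closure by reference
    return preg_replace_callback(
        '/' . preg_quote($word, '/') . '/',
        function (array $m) use (&$count, $rules) {
            $count++;
            if (isset($rules[$count])) {
                $tag = $rules[$count];
                return "<$tag>{$m[0]}</$tag>";
            }
            return $m[0]; // every other occurrence stays as it is
        },
        $text
    );
}

echo style_occurrences('a x a x a x a', 'a', array(2 => 'b', 4 => 'i', 7 => 'u')), "\n";
```

With fewer than 7 occurrences the rule for the 7th one simply never fires, so no error is raised.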
That's an interesting question. My implementation would be:
function replace_exact($word, $tag, $string, $n) {
    // Find the offset of the $n-th occurrence of $word.
    $pos = -1;
    for ($i = 0; $i < $n; $i++) {
        $pos = strpos($string, $word, $pos + 1);
        if ($pos === false) return $string; // fewer than $n occurrences
    }
    // Wrap only that occurrence in the tag.
    return substr_replace($string, "<$tag>$word</$tag>", $pos, strlen($word));
}
Use it like this:
$source_text = replace_exact('Example', 'b', $source_text, 2);
$source_text = replace_exact('Example', 'i', $source_text, 4);
echo $source_text;
I haven't measured how fast this is, but plain string functions are usually faster than preg_replace.

Delete first four lines from the top in content stored in a variable

I have a variable that needs the first four lines stripped out before being displayed:
Error Report Submission
From: First Last, email@example.com, 12345
Date: 2009-04-16 04:33:31 pm Eastern
The content to be output starts here and can go on for any number of lines.
I need to remove the 'header' from this data before I display it as part of a 'pending error reports' view.
Mmm. I am sure someone is going to come up with something nifty/shorter/nicer, but how about:
$str = implode("\n", array_slice(explode("\n", $str), 4));
If that is too unsightly, you can always abstract it away:
function str_chop_lines($str, $lines = 4) {
    return implode("\n", array_slice(explode("\n", $str), $lines));
}
$str = str_chop_lines($str);
EDIT: Thinking about it some more, I wouldn't recommend using the str_chop_lines function unless you plan on doing this in many parts of your application. The original one-liner is clear enough, I think, and anyone stumbling upon str_chop_lines may not realize the default is 4 without going to the function definition.
$content = preg_replace("/^(.*\n){4}/", "", $content);
strpos helps a lot here. For example:
// $myString = "blah blah \n \n \n etc \n \n blah blah";
$pos = strpos($myString, "\n\n");
// Skip the two newlines themselves; keep the whole string if none are found.
$string = ($pos === false) ? $myString : substr($myString, $pos + 2);
$string then contains everything after the first two consecutive newlines.
Split the string into an array using preg_split() with a pattern that matches two consecutive newlines (the old split() function was removed in PHP 7), and then concatenate the entire array, except for the first element (which is the header).
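A sketch of that idea, assuming the header is separated from the body by one blank line. It swaps in explode with a limit of 2 instead of a regex split, so the body never has to be glued back together; the function name strip_header is hypothetical.

```php
<?php
// Hypothetical helper: drop everything up to the first blank line.
function strip_header(string $text): string
{
    // Normalise Windows line endings so "\n\n" matches reliably.
    $text = str_replace("\r\n", "\n", $text);
    // Limit 2: the first element is the header, the second is the whole body.
    $parts = explode("\n\n", $text, 2);
    // No blank line found: return the text unchanged.
    return isset($parts[1]) ? $parts[1] : $text;
}

$report = "Error Report Submission\nFrom: First Last\nDate: 2009-04-16\n\nThe content starts here.\n\nSecond paragraph.";
echo strip_header($report), "\n";
```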
