PHP display sentences up to 100 characters - php

My PHP script calls the Freebase API and outputs a paragraph which I then do a little bit of regex and other parsing magic on and return the data to the variable $paragraph. The paragraph is made up of multiple sentences. What I want to do is return a shorter version of the paragraph instead.
I want to display the first sentence. If it is less than 100 characters then I'd like to display the next sentence until it is at least 100 characters.
How can I do this?

You don't need a regular expression for this. You can use strpos() with an offset of 99 to find the first period at or after position 100 - and substr() to grab up to that length.
$shortPara = substr($paragraph, 0, strpos($paragraph, '.', 99) + 1);
You probably want to add a bit of extra checking in case the original paragraph is less than 100 characters, or doesn't end with a period:
// find first period at character 100 or greater
// (guard first: an offset past the end of the string raises a ValueError in PHP 8)
$breakAt = strlen($paragraph) > 99 ? strpos($paragraph, '.', 99) : false;
if ($breakAt === false) {
    // paragraph is too short, or has no period at or after character 100 - use the whole paragraph
    $shortPara = $paragraph;
} else {
    // take up to and including the period that we found
    $shortPara = substr($paragraph, 0, $breakAt + 1);
}
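If you would rather follow the question's wording literally (add whole sentences until the text is at least 100 characters long), a minimal sketch could look like this. The function name shortenParagraph and the rule that a sentence ends at ., ! or ? followed by whitespace are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: collect whole sentences until the result
// reaches at least $minLength characters.
function shortenParagraph(string $paragraph, int $minLength = 100): string
{
    // Assumption: a sentence ends at ., ! or ? followed by whitespace.
    // Abbreviations such as "e.g. this" will therefore be split as well.
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($paragraph), -1, PREG_SPLIT_NO_EMPTY);

    $short = '';
    foreach ($sentences as $sentence) {
        $short = $short === '' ? $sentence : $short . ' ' . $sentence;
        if (strlen($short) >= $minLength) {
            break; // long enough, stop adding sentences
        }
    }
    return $short;
}
```

A paragraph shorter than the minimum comes back whole, so no separate length check is needed.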

Related

Preg_matching email and looking for specific text

I have a bunch of emails that I read as text in my program and they all have phone numbers such as these:
+370 655 54298
+37065782505
37069788505
865782825
65782825
(686) 51852
How would I go about finding them and saving them into a variable?
For now I am doing it like this:
$found = preg_match('^[0-9\-\+]{9,15}^', $text, $num);
But it does not work at all.
Have a look at the "libphonenumber" Google Library.
There are two functions you may find useful
isPossibleNumber - quickly guessing whether a number is a possible phonenumber by using only the length information, much faster than a full validation.
isValidNumber - full validation of a phone number for a region using length and prefix information.
This should work https://regex101.com/r/E2PzRN/2
#\+?\(?\d+\)?\s?\d+\s?\d+#
<?php
$regex = '#\+?\(?\d+\)?\s?\d+\s?\d+#';
$x = [
'+370 655 54298',
'+37065782505',
'37069788505',
'865782825',
'hjtgfjtdfjtgdfjt',
'65782825',
'(686) 51852',
];
foreach ($x as $y) {
if (preg_match($regex, $y, $match)) {
echo $match[0] . "\n";
}
}
Check it in action here https://3v4l.org/6AlQa
We distinguish here 3 types of phone numbers.
The first type is this one:
+37065782505
37069788505
865782825
65782825
Here, the leading + is optional, and we assume that these numbers have at least 7 digits.
The regular expression obtained is therefore
(\+?[0-9]{7,})
The second type is this one:
+370 655 54298
Here we have a first block consisting of a + followed by 2 to 6 digits and then several other blocks of 2 to 6 digits and separated by spaces.
The regular expression obtained is therefore
(\+[0-9]{2,6}(\s[0-9]{2,6})+)
The last type is this one:
(686) 51852
This is a first block consisting of 2 to 6 digits surrounded by parentheses and then several other blocks of 2 to 6 digits and separated by spaces.
The regular expression obtained is therefore
(\([0-9]{2,6}\)(\s[0-9]{2,6})+)
The complete extraction code is therefore
preg_match_all("#(\+?[0-9]{7,})|(\+[0-9]{2,6}(\s[0-9]{2,6})+)|(\([0-9]{2,6}\)(\s[0-9]{2,6})+)#",$text,$out);
$found = $out[0];
where $found is an array.
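As a quick check, here is the combined pattern run against a made-up line of text. The sample $text is an assumption for illustration; the pattern is the one shown above, written in single quotes.

```php
<?php
// Hypothetical sample text containing one number of each type.
$text = "Call +370 655 54298 or (686) 51852. Office: 37069788505.";

$pattern = '#(\+?[0-9]{7,})|(\+[0-9]{2,6}(\s[0-9]{2,6})+)|(\([0-9]{2,6}\)(\s[0-9]{2,6})+)#';

preg_match_all($pattern, $text, $out);
$found = $out[0]; // full matches only; the other entries hold the sub-groups

print_r($found);
```

Only $out[0] is useful here, because the remaining entries contain the partial captures of each alternative.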
I would suggest stripping out '+', '(', ')' and ' ', then testing whether what remains passes ctype_digit.
This removes the formatting characters and checks that only digits are left. It assumes the input is supposed to be a phone number; if you run it on an email address, the result is false.
var_dump(ctype_digit(str_replace([' ', '+', '(', ')'], '', '(686) 51852')));
bool(true)
var_dump(ctype_digit(str_replace([' ', '+', '(', ')'], '', 'r#pm.mr')));
bool(false)
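Wrapped in a function, the same idea might look like the sketch below. The name looksLikePhoneNumber is hypothetical, and the check says nothing about length or prefix, so treat it as a rough filter only.

```php
<?php
// Hypothetical helper wrapping the strip-and-test idea.
function looksLikePhoneNumber(string $candidate): bool
{
    // Remove the formatting characters that may appear in a phone number.
    $digits = str_replace([' ', '+', '(', ')'], '', $candidate);

    // ctype_digit() is false for an empty string, so "" and "()" are rejected.
    return ctype_digit($digits);
}
```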

PHP detect variable length string contains any character other than 1

Using PHP I sometimes have strings that look like the following:
111
110
011
1111
0110012
What is the most efficient way (preferably without regex) to determine if a string contains any character other than the character 1?
Here's a one-line code solution that can be put into a conditional etc.:
strlen(str_replace('1','',$mystring))==0
It strips out the "1"s and sees if there's anything left.
User Don't Panic commented that str_replace could be replaced by trim:
strlen(trim($mystring, '1'))==0
which removes leading and trailing 1s and sees if there's anything left. This would work for the particular case in OP's request but the first option will also tell you how many non-"1" characters you have (if that information matters). Depending on implementation, trim might run slightly faster because PHP doesn't have to check any characters between the first and last non-"1" characters.
You could also use a string like a character array and iterate through from the beginning until you find a character which is not =='1' (in which case, return true) or reach the end of the array (in which case, return false).
Finally, though OP here said "preferably without regex," others open to regexes might use one:
preg_match("/[^1]/", $mystring)==1
Another way to do it:
if (base_convert($string, 2, 2) === $string) {
// $string has only 0 and 1 characters.
}
Since your $string is basically a binary number, you can check it with base_convert.
How it works:
var_dump(base_convert('110', 2, 2)); // 110
var_dump(base_convert('11503', 2, 2)); // 110
var_dump(base_convert('9111111111111111111110009', 2, 2)); // 11111111111111111111000
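If the returned value of base_convert is different from the input, there are characters other than 0 and 1 in it. Note that base_convert also drops leading zeros, so a string such as '011' fails this check, and that PHP 7.4 and later emit a deprecation notice when the input contains invalid characters.
If you want to check whether the string consists only of 1 characters: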
if(array_sum(str_split($string)) === strlen($string)) {
// $string has only 1 characters.
}
You retrieve the individual digits with str_split and sum them with array_sum. If the result isn't the same as the length of the string, then there is something other than 1 in the string. Be aware that this can give false positives: '02' sums to 2, which equals its length.
Another option is to treat the string like an array of characters and look for anything that is not 1. If you find one, break out of the for loop:
for ($i = 0; $i < strlen($mystring); $i++) {
    if ($mystring[$i] != '1') {
        echo 'FOUND!';
        break;
    }
}
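The same loop can be turned into a reusable function that returns a boolean. This is a sketch; the name hasNonOne is hypothetical, and the sample strings are the ones from the question.

```php
<?php
// Hypothetical helper: true if $s contains any character other than '1'.
function hasNonOne(string $s): bool
{
    $length = strlen($s); // computed once rather than on every iteration
    for ($i = 0; $i < $length; $i++) {
        if ($s[$i] !== '1') {
            return true; // stop at the first character that is not '1'
        }
    }
    return false; // every character was '1' (also the case for an empty string)
}

foreach (['111', '110', '011', '1111', '0110012'] as $sample) {
    echo $sample, ' => ', var_export(hasNonOne($sample), true), "\n";
}
```

Returning early means the function never looks past the first offending character, which is the main advantage over the str_replace version.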

PHP preg_match for dimension

I was searching for hours to get a regular expression for matching dimension in a string.
Consider following types of strings,
120x200
100' X 130'
4 acres
0.54
0.488 (90x223)/ GIS0.78
90x160
100x149.7
143.76 X 453.52
6.13 per tax bill
120x378 per tax roll
I want the output to contain only the dimensions, whether they are written with 'X' or 'x'.
From the strings above, the expected output is:
120x200,100' X 130',0,0,90x223,90x160,100x149.7,143.76 X 453.52,0,120x378
Is there any possible reg-ex? or any other alternative?
Thanks
This seems to work:
<?php
$str = <<<EOD
120x200
100' X 130'
4 acres
0.54
0.488 (90x223)/ GIS0.78
90x160
100x149.7
143.76 X 453.52
6.13 per tax bill
120x378 per tax roll
EOD;
$lines = explode("\n", $str);
foreach($lines as $line) {
if (preg_match('/-?(\d+(?:\.\d+)?(\'|ft|yd|m|")?)\s*x\s*-?(\d+(?:\.\d+)?(?:\\2)?)/i', $line, $match)) {
echo "{$match[1]}x{$match[3]}\n";
} else {
echo "0\n";
}
}
You can add more units of measurement into the 3rd parenthesized expression if you want to match more things, but this matches whole numbers, real numbers, and optional units of measurement after.
This should get you started:
\d+(\.\d+)?['"]?\s*[xX]\s*\d+(\.\d+)?['"]?
demo : http://regexr.com?386sf
However, it does not cover all of your cases, as they are too varied to be handled by a single regex.
I would recommend writing a custom method that parses the input and handles the different cases.
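One possible shape for such a method is sketched below. The function name extractDimension is hypothetical, and the pattern is a non-capturing, case-insensitive variant of the one above that only knows the ' and " unit marks.

```php
<?php
// Hypothetical helper: return the first "A x B" dimension in $line, or '0'.
function extractDimension(string $line): string
{
    // number, optional ' or " mark, x or X, number, optional ' or " mark
    $pattern = '/\d+(?:\.\d+)?[\'"]?\s*x\s*\d+(?:\.\d+)?[\'"]?/i';

    return preg_match($pattern, $line, $m) ? $m[0] : '0';
}

$lines = [
    '120x200', "100' X 130'", '4 acres', '0.54', '0.488 (90x223)/ GIS0.78',
    '90x160', '100x149.7', '143.76 X 453.52', '6.13 per tax bill',
    '120x378 per tax roll',
];
echo implode(',', array_map('extractDimension', $lines)), "\n";
```

The match is returned exactly as it appears in the input, so spacing and the case of the X are preserved.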

PHP conditional loop - string length

I am sorry if this is a very stupid question or an obvious newbie mistake, but as basic as it is, I have hardly ever used the do-while loop before. (I know, I can not comprehend it myself! How did I manage to avoid it all those years?)
So:
I want to select a number of words from the beginning of a text paragraph.
I used the following code :
$no_of_char = 70;
$string = $content;
$string = strip_tags(stripslashes($string)); // convert to plaintext
$string = substr($string, 0, strpos(wordwrap($string, $no_of_char), "\n"));
This kind of works, but the problem is that it sometimes gives EMPTY results.
I would think that is because the paragraph contains spaces, empty lines, and/or carriage returns.
So I am trying to make a loop condition that will keep trying until the length of the string is at least X characters:
$no_of_char = 70; // approximation - how many characters we want
$string = $content;
do {
$string = strip_tags(stripslashes($string)); // plaintext
$string = substr($string, 0, strpos(wordwrap($string, $no_of_char), "\n")); // do not crop words
}
while (strlen($string) > 8); // this would be X - and I am guessing here is my problem
Well, obviously it does not work (otherwise this question would not exist), and now it ALWAYS produces an empty string.
Try using str_word_count:
$words = str_word_count($string, 2);
With format 2 it returns an associative array, where the key is the numeric position of the word inside the string and the value is the actual word itself.
Then use array_slice:
$total_words = 70;
$selected_words = array_slice($words, 0, $total_words);
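Put together, the two calls might be wrapped like this. It is a sketch with a hypothetical function name, firstWords, and it uses format 1 (a plain list of words) because the positions are not needed. Keep in mind that the limit counts words, not characters, and that str_word_count drops digits and punctuation.

```php
<?php
// Hypothetical helper: return the first $limit words of $text.
function firstWords(string $text, int $limit): string
{
    // Format 1 returns a simple list of the words found in $text.
    $words = str_word_count($text, 1);

    // Keep the first $limit words and join them with single spaces.
    return implode(' ', array_slice($words, 0, $limit));
}

echo firstWords('The quick brown fox jumps over the lazy dog', 4), "\n";
```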
The most likely problem you have is that the string has blank lines at the start. You can easily get rid of them with ltrim(). Then use your original code to get the first actual newline.
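The reason your loop didn't work is that its condition keeps it running for as long as the string is longer than 8 characters. On the second pass the already shortened string has no newline left, so strpos() returns false, substr() returns an empty string, and only then does the loop stop.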
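A loop-free version of the original snippet with ltrim() added could look like the following sketch. The function name excerpt is hypothetical, and the extra check for a missing newline is an addition that covers text which already fits within the limit.

```php
<?php
// Hypothetical helper based on the original snippet plus ltrim().
function excerpt(string $content, int $maxChars = 70): string
{
    // Convert to plain text and drop leading blank lines and spaces.
    $text = ltrim(strip_tags(stripslashes($content)));

    // wordwrap() swaps a space for "\n" at each break, so positions line up.
    $breakAt = strpos(wordwrap($text, $maxChars), "\n");

    // No newline means the whole text already fits on one line.
    return $breakAt === false ? $text : substr($text, 0, $breakAt);
}
```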

Splitting text in PHP

I want to know is there any way to split text like this:
123456789 into 123-456-789
as to add "-" after every 3 characters?
Just wanted to know, as I know the reverse, but how to do this is over my head. ;)
and also if the text is
ABCDEFGHI OR A1B2C3D4E or any other format
without any space between the characters !
language: PHP only
<?php
$i = '123456789';
echo 'result: ', wordwrap($i, 3, '-', true);
prints
result: 123-456-789
see http://php.net/wordwrap
I'm not a big fan of regexes for simple string extraction (especially fixed length extractions), preferring them for slightly more complex stuff. Almost every language has a substring function so, presuming your input has already been validated, a simple (pseudo-code since you haven't specified a language):
s = substring (s,0,3) + "-" + substring (s,3,3) + "-" + substring (s,6,3)
If you want it every three characters for a variable length string (with odd size at the end):
t = ""
sep = ""
while s != "":
    if s.len <= 3:
        t = t + sep + s
        s = ""
    else:
        t = t + sep + substring (s,0,3)
        s = substring (s,3)
    sep = "-"
s = t
For any language:
Create an empty string variable called "result"
Create an integer counter variable, "i", which increments until the length of the original string (the one with the number)
Append each character from the original string to "result"
If i modulo 3 (usually % or mod) is zero, append a dash to "result"
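In PHP, the steps above can be sketched as follows. The function name dashEvery is hypothetical. Two details are assumptions added here: characters are counted from 1 rather than 0, and the dash is skipped after the final character, since following the steps literally would leave a trailing dash on a 9-character string.

```php
<?php
// Hypothetical helper following the counter-and-modulo steps.
function dashEvery(string $input, int $size = 3): string
{
    $result = '';
    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $result .= $input[$i];
        // Count characters from 1, and skip the dash after the last one.
        if (($i + 1) % $size === 0 && $i + 1 < $length) {
            $result .= '-';
        }
    }
    return $result;
}

echo dashEvery('123456789'), "\n";
```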
In the interest of completeness, here is a Python solution:
>>> a = "123456789"
>>> a[0:3] + "-" + a[3:6] + "-" + a[6:9]
'123-456-789'
Since you updated your question to specify a PHP solution, this should work:
substr($a, 0, 3) . "-" . substr($a, 3, 3) . "-" . substr($a, 6, 3)
See substr for more information on this function. This will work not only for digits, but for alphabetic characters too.
Yet another Python version:
>>> x = "123456789"
>>> out = [x[i:i+3] for i in range(0, len(x), 3)]
>>> print("-".join(out))
123-456-789
I think that this can be sanely done in a regex with lookahead:
s/(.{3})(?=.)/$1-/g
Since you mentioned PHP in a comment:
preg_replace ("/(.{3})(?=.)/", "$1-", $string);
edit: After VolkerK showed wordwrap, I found chunk_split in the documentation. Note that chunk_split also appends the separator after the last chunk, so trim it off:
$new_string = rtrim(chunk_split($string, 3, '-'), '-');
This has the advantage that it also works when there are spaces in the string (wordwrap would prefer to break at the spaces).
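Another option, not mentioned in the answers above, is to combine str_split and implode. A minimal sketch with a hypothetical function name, chunkWithDashes: it treats spaces like any other character and never produces a trailing dash.

```php
<?php
// Hypothetical helper: split into fixed-size pieces, then join with dashes.
function chunkWithDashes(string $input, int $size = 3): string
{
    // str_split() cuts purely by length; the last piece may be shorter.
    return implode('-', str_split($input, $size));
}

echo chunkWithDashes('123456789'), "\n";
echo chunkWithDashes('A1B2C3D4E'), "\n";
```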
In Perl:
#!/usr/bin/perl
use strict;
use warnings;
my $string = "123456789";
$string =~ /(\d{3})(\d{3})(\d+)/;
print "$1-$2-$3"
You can do it with (among other means) a regular expression match and replace. The exact syntax depends on the tool or programming language you are using. For instance, one way to do it in Perl would be
$a = "123456789";
$a =~ s/(\d{3})/$1-/g;
chop($a);
print $a;
Line 2 replaces every group of 3 digits with the same 3 digits followed by a dash. With chop() we delete the trailing dash.
There is another question here: what should happen when the number of digits in the string is not a multiple of 3? If such strings are allowed, the above snippet needs modification.
Also, depending on the specifics of the case, you might get away with simple substring replacement, or string slicing.
One more Perl example. This doesn't remove final groups smaller than three, and it leaves initial groups of less than three digits alone. It's based (pretty shamelessly) on the "money numbers" example in Learning Perl (page 212 of the 5th ed):
#!/usr/bin/env perl
use strict;
use warnings;
print "Gimme' a number: ";
chomp(my $number = <STDIN>);
1 while ($number =~ s/([0-9]{3})([0-9]+)/$1-$2/);
print "Now it's $number\n";
