Extract substring from a certain indexposition of (huge) string

Let's say I have a huge string from which I want to extract a certain value belonging to a name, for example the stock price of Apple.
Let's say the string looks like this (in reality it's HTML, but that does not matter here):
$output = "nsdfsdnfsnfdnsdfnueruherdfndsdndnjsdnasdnn Apple dndfjnfjdf647tgtgtgeq";
I want to extract the value 647.
The real string may be some hundred thousand characters long.
I can find the position of Apple with:
$str = "Apple";
$pos = strpos($output, $str);
Let's say the function returns 87310, which is the index position of the first letter of Apple.
Here is my question: is there an easy way to extract the value when I know the start position of Apple? I have looked for such a function but cannot find one right now.
I could solve this by looping ahead from the name Apple and extracting the relevant characters, but using a function for this would at least save keystrokes.
Thanks!

To pull out just the stock price, you could do something like this:
Search your string for "Apple" and save $position + 5 (the length of "Apple"). Starting right after $position, scan one character at a time for the first character that is_numeric and append it to a string, $stock_val. Keep appending the subsequent characters until you reach one that is not numeric. Here is my clunky code ($str is the full string being searched):
$position = stripos($str, "apple") + strlen("apple");
$temp_str = substr($str, $position);
$stock_val = "";
do {
    $char = substr($temp_str, 0, 1); // Take the first char of $temp_str
    $temp_str = substr($temp_str, 1); // Remove that char from $temp_str
    $is_acceptable = (is_numeric($char) || $char == "." || $char == ",");
    if ($is_acceptable) { // If the char is numeric (or "." / ","), add it to $stock_val
        $stock_val .= $char;
    }
    if (!$is_acceptable && $stock_val != "") {
        break; // If the char is NOT numeric AND $stock_val already has characters, stop
    }
} while (strlen($temp_str) > 0); // Repeat while there are still characters
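The same scan can be handed to preg_match, which accepts a start offset as its fifth argument. This is a minimal sketch, assuming the value always begins with a digit; the helper name extract_number_after is made up for illustration and is not part of PHP:

```php
<?php
// Hypothetical helper: returns the first number found after $name, or null.
function extract_number_after(string $haystack, string $name): ?string
{
    $pos = strpos($haystack, $name);
    if ($pos === false) {
        return null; // name not present
    }
    // Start matching right after the name: one digit, then any digits, "." or ",".
    if (preg_match('/\d[\d.,]*/', $haystack, $m, 0, $pos + strlen($name))) {
        return $m[0];
    }
    return null; // no digits after the name
}

$output = "nsdfsdnfsnfdnsdfnueruherdfndsdndnjsdnasdnn Apple dndfjnfjdf647tgtgtgeq";
echo extract_number_after($output, "Apple"); // 647
```

Because the pattern also accepts "." and ",", a number directly followed by punctuation (for example "647,") would keep that trailing character; trim it afterwards if that can occur in your data.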

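You know the start position, so work out the end position ($end) and then use substr to cut away the unwanted parts of the string.
Something like this, using substr:
$portion = substr(substr($string, 0, -(strlen($string) - $end)), $start);
This is equivalent to the simpler substr($string, $start, $end - $start), which also works when $end equals strlen($string) (the nested form returns an empty string in that case, because the length argument becomes 0).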

Related

PHP count word frequency with support for punctuation marks

I am trying to get a count of common phrases from a body of text. I don't just want single words, but rather every series of words between any two stop words. So for example, for https://en.wikipedia.org/wiki/Wuthering_Heights I would like the phrase "wuthering heights" to be counted rather than "wuthering" and "heights".
if (in_array($word, $this->stopwords))
{
$cleanPhrase = preg_replace("/[^A-Za-z ]/", '', $currentPhrase);
$cleanPhrase = trim($cleanPhrase);
if($cleanPhrase != "" && strlen($cleanPhrase) > 2)
{
$this->Phrases[$cleanPhrase] = substr_count($normalisedText, $cleanPhrase);
$currentPhrase = "";
}
continue;
}
else
$currentPhrase = $currentPhrase . $word . " ";
The problem I have is that "age" is counted whenever the word "stage" is used. The obvious fix is to add whitespace to either side of the $cleanPhrase variable, but that fails when the phrase is not surrounded by whitespace: there could be a comma, full stop, or some other punctuation character next to it. I want to count all of these. Is there a way to do this without having to do something like the following?
$terminate = array(".", " ", ",", "!", "?");
$count = 0;
foreach($terminate as $tpun)
{
$count += substr_count($normalisedText, $tpun . $cleanPhrase . $tpun);
}

By utilizing this answer with slight modification, you can do this:
$sentence = "Age: In this day and age, people of all age are on the stage.";
$word = 'age';
preg_match_all('/\b'.$word.'\b/i', $sentence, $matches);
\b represents a word boundary, so that pattern gives a count of 3 when searching for age (the i flag in the pattern means case insensitive; remove it if you want to match case as well).
If you're only going to match one phrase at a time, you'll find your count in count($matches[0]).
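Wrapped into a reusable function, the word-boundary count might look like the sketch below. The name count_phrase is hypothetical; preg_quote is added on the assumption that phrases may contain regex metacharacters, and the \b anchors assume the phrase starts and ends with a letter or digit:

```php
<?php
// Hypothetical helper: counts whole-word occurrences of $phrase in $text.
function count_phrase(string $text, string $phrase): int
{
    // preg_quote escapes regex metacharacters, including the "/" delimiter.
    $pattern = '/\b' . preg_quote($phrase, '/') . '\b/i';
    // preg_match_all returns the number of full matches (false on error).
    return (int) preg_match_all($pattern, $text);
}

$sentence = "Age: In this day and age, people of all age are on the stage.";
echo count_phrase($sentence, 'age'); // 3
```

Since preg_match_all already returns the number of matches, the $matches array is not needed when only the count matters.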

Can someone explain to me this 'counting sentences' php code?

I was given a task to count sentences without using str_word_count. My senior gave me the code below, but I am not able to understand it. Can someone explain it?
I need to understand the variables and how the code works.
<?php
$sentences = "this book are bigger than encyclopedia";
function countSentences($sentences) {
$y = "";
$numberOfSentences = 0;
$index = 0;
while($sentences != $y) {
$y .= $sentences[$index];
if ($sentences[$index] == " ") {
$numberOfSentences++;
}
$index++;
}
$numberOfSentences++;
return $numberOfSentences;
}
echo countSentences($sentences);
?>
The output is
6

It's something quite trivial, I'd say.
The task is to count the words in a sentence. A sentence is a string (a sequence of characters) made up of letters and white space (space, new line, etc.).
Now, what's a word of the sentence? It is a distinct group of letters that doesn't "touch" any other group of letters, meaning that words (groups of letters) are separated from each other by white space (let's say just a normal blank space).
So the simplest algorithm to count words consists of:
- $words_count_variable = 0
- go through all the characters, one-by-one
- each time you find a space, it means a new word just ended before that, and you have to increase your $words_count_variable
- lastly, you'll find the end of the string, and that means a word just ended before that, so you'll increase for the last time your $words_count_variable
Take "this is a sentence".
We set $words_count_variable = 0;
Your while cycle will analyze:
"t"
"h"
"i"
"s"
" " -> blank space: a word just ended -> $words_count_variable++ (becomes 1)
"i"
"s"
" " -> blank space: a word just ended -> $words_count_variable++ (becomes 2)
"a"
" " -> blank space: a word just ended -> $words_count_variable++ (becomes 3)
"s"
"e"
"n"
...
"n"
"c"
"e"
-> end reached: a word just ended -> $words_count_variable++ (becomes 4)
So, 4.
4 words counted.
Hope this was helpful.

Basically, it is just counting the number of spaces in a sentence.
<?php
$sentences = "this book are bigger than encyclopedia";
function countSentences($sentences) {
$y = ""; // Temporary variable used to reach all chars in $sentences during the loop
$numberOfSentences = 0; // Counter of words
$index = 0; // Array index used for $sentences
// Reach all chars from $sentences (char by char)
while($sentences != $y) {
$y .= $sentences[$index]; // Adding the current char in $y
// If current char is a space, we increase the counter of word
if ($sentences[$index] == " "){
$numberOfSentences++;
}
$index++; // Increment the index used with $sentences in order to reach the next char in the next loop round
}
$numberOfSentences++; // Additional incrementation to count the last word
return $numberOfSentences;
}
echo countSentences($sentences);
?>
Be aware that this function gives wrong results in several cases. For example, if there are two consecutive spaces, it counts an extra word for the second space.
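One way around the repeated-space problem is to split on runs of whitespace instead of counting individual spaces. This is only a sketch of that alternative, not the code from the task; count_words is a made-up name:

```php
<?php
// Hypothetical helper: counts words separated by any amount of whitespace.
function count_words(string $text): int
{
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces
    // caused by leading or trailing whitespace.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return count($words);
}

echo count_words("this book are bigger than encyclopedia"); // 6
```

With this version, "this  book" (two spaces) counts as 2 words and an empty string counts as 0.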

Is every letter in the alphabet in a string at least once?

Was wondering if there was a more efficient way to detect if a string contains every letter in the alphabet one or more times using regex?
I appreciate any suggestions
$str = str_split(strtolower('We promptly judged antique ivory buckles for the next prize'));
$az = str_split('abcdefghijklmnopqrstuvwxyz');
$count = 0;
foreach($az as $alph) {
foreach($str as $z) {
if($alph == $z) {
$count++;
break;
}
}
}

Just use array_diff. The string contains every letter when nothing in $az is missing from $str:
count(array_diff($az, $str)) === 0;

With regex you can do that, but it is neither optimal nor fast; @hjpotter's way is far faster:
var_dump(strlen(preg_replace('~[^a-z]|(.)(?=.*\1)~i', '', $str)) == 26);
It removes all non-letter characters and all duplicate letters (case insensitive), then compares the length of the remaining string with 26. Here $str is the original string, not the array produced by str_split.
[^a-z] matches any non letter character
(.) captures a letter in group 1
(?=.*\1) checks if the same letter is somewhere else (on the right)
the i modifier makes the pattern case insensitive

I don't have any regex answer. But without regex you can try using PHP's count_chars function.
For example:
$test_string = 'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz';
echo count(count_chars($test_string, 1));
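Gives you 26, which is the number of unique chars in $test_string with a frequency greater than zero. Note that this counts every distinct byte, so lowercase the string and strip anything other than a-z first if it may contain spaces, punctuation, or capitals.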

Your current program will print pangram for any string containing 26 or more letters, which means even aaa... counts as a pangram.
Instead, you can loop over a-z and return as soon as one letter is not found:
function is_pangram($str) {
    if (strlen($str) < 26) return false;
    $az = str_split('abcdefghijklmnopqrstuvwxyz');
    foreach ($az as $char) {
        if (stripos($str, $char) === false)
            return false; // this letter is missing
    }
    return true; // every letter was found
}
A regex is not optimal in this situation. An alternative approach would be using array_map and substr_count.

Make an array of Booleans with length 26. Then you can loop through your string just once. In pseudo code (because I don't know PHP):
Boolean b[26]; // Initialized to false
count = 0;
Loop for each char c in string
if (not b[c]) then
++count;
b[c] = true
end
if (count == 26)
break; // All present;
end
end
// If count < 26 then not all present
You need to figure out how to make the character index into the array, but that shouldn't be too hard.
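In PHP, the pseudocode above could be sketched as follows. It assumes plain ASCII text and uses ord() to turn each letter into an array index; the function name is_pangram_single_pass is made up for illustration:

```php
<?php
// Hypothetical helper: single pass over the string with a 26-slot "seen" table.
function is_pangram_single_pass(string $str): bool
{
    $seen = array_fill(0, 26, false); // one flag per letter a-z
    $count = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        // Map 'a'..'z' (case insensitive) to 0..25.
        $idx = ord(strtolower($str[$i])) - ord('a');
        if ($idx < 0 || $idx > 25) {
            continue; // not a letter
        }
        if (!$seen[$idx]) {
            $seen[$idx] = true;
            if (++$count === 26) {
                return true; // all letters present, stop early
            }
        }
    }
    return false; // fewer than 26 distinct letters
}

var_dump(is_pangram_single_pass('We promptly judged antique ivory buckles for the next prize')); // bool(true)
```

The early return means long inputs are only scanned until the last missing letter turns up.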

How to cut only the FIRST character in a string

I have a situation where I want to cut the first '1.' out of a string, but not any following '1.'s. I am wondering if this is even possible.
I am converting an int to a string, and I am wondering if there is a way to cut out ONLY the initial '1.' and not any that follow.
My script dynamically assigns a number, for example 1, 1.1, 1.2, 2, 3, 3.1, based on certain criteria. It is currently adding 1. to the beginning of everything, so 1 becomes 1.1, 2.1 becomes 1.2.1, and so on.
Is there a way to force it to take out ONLY the first one and not any following? Here is my source:
$str = (string)$i; $str = $i;
$prepend = $parentPrepend ?
$parentPrepend . '.' . $i
: $str = ltrim($str, '\1');
$i++;

The problem with your ltrim call is the '\1' argument, which is not the same as the character 1. In a single-quoted PHP string, '\1' is two characters, a backslash followed by 1; in a double-quoted string, "\1" is the character whose ASCII code is 1, whereas the digit 1 has ASCII code 49.
Modify your code like this:
ltrim($str, '1');
That should trim all 1s from the left of the string.
However, you should know that the ltrim will remove all matching characters from the left of the string, not just the first one!
If you want only the first, then you should use substr instead, with a test to make sure it is a 1.
if(substr($str, 0, 1) == '1')
$str = substr($str, 1);
And if you want to remove the period too, then simply modify the code to include that (and look at first 2 characters instead of only first character)
if (strlen($str) > 2 && substr($str, 0, 2) == '1.')
$str = substr($str, 2);

Use strpos to check whether 1. is at the beginning. If it is, use substr to return the string minus the 1.
$string = '1.1';
if (strpos($string, '1.') === 0) {
$string = substr($string, 2);
}
var_dump($string);

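You could also use preg_replace anchored to the start of the string (str_replace has no limit argument; its fourth parameter only receives the number of replacements made):
$new_string = preg_replace('/^1\./', '', $your_string);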
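The "remove it only if it is at the start" idea from these answers can be generalised to any prefix. A small sketch, with the hypothetical name strip_prefix_once, using strncmp so it does not depend on newer PHP string functions:

```php
<?php
// Hypothetical helper: removes $prefix from the start of $str, at most once.
function strip_prefix_once(string $str, string $prefix = '1.'): string
{
    $n = strlen($prefix);
    // strncmp compares only the first $n bytes of both strings.
    if ($n > 0 && strncmp($str, $prefix, $n) === 0) {
        return (string) substr($str, $n);
    }
    return $str; // prefix not at the start: leave the string alone
}

echo strip_prefix_once('1.2.1'); // 2.1
```

Later occurrences are untouched, so '1.1.1' becomes '1.1' rather than '1'.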

php trim a string

I'm trying to build a function to trim a string if it's too long per my specifications.
Here's what I have:
function trim_me($s,$max)
{
if (strlen($s) > $max)
{
$s = substr($s, 0, $max - 3) . '...';
}
return $s;
}
The above will trim a string if it's longer than $max and will add a continuation (...).
I want to expand that function to handle multiple words. Say I have the string How are you today?, which is 18 characters long. If I run trim_me($s,10) it will show as How are yo..., which is not aesthetically pleasing. How can I make it add the ... after the whole word? If I run trim_me($s,10), I want it to display How are you..., adding the continuation AFTER the word. Any ideas?
Basically, I don't want to add a continuation in the middle of a word. Only if the string consists of a single word may the continuation break that word.

So, here's what you want (pass " " as $break to cut at the next space rather than at the next full stop):
<?php
// Original PHP code by Chirp Internet: www.chirp.com.au
// Please acknowledge use of this code by including this header.
function myTruncate($string, $limit, $break=".", $pad="...") {
// is $break present between $limit and the end of the string?
if(false !== ($breakpoint = strpos($string, $break, $limit))) {
if($breakpoint < strlen($string) - 1) {
$string = substr($string, 0, $breakpoint) . $pad;
}
}
return $string;
}
?>
Also, you can read more at http://www.the-art-of-web.com/php/truncate/

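function trim_me($s, $max) {
    if (strlen($s) <= $max) return $s;
    // A negative offset makes strrpos find the last space at or before position $max - 3
    $cut = strrpos($s, " ", $max - 3 - strlen($s));
    if ($cut === false) $cut = $max - 3; // no space found: break inside the word
    return substr($s, 0, $cut) . "...";
}
strrpos is the function that does the magic. With a positive offset it would still return the last space in the whole string, which is why the offset here is negative.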

I've named the function str_trunc. If $strict is TRUE, the text part is never longer than the maximum size; otherwise the cut is extended to the end of the word that was in progress at the limit.
var_dump(str_trunc('How are you today?', 10)); // string(10) "How are..."
var_dump(str_trunc('How are you today? ', 10, FALSE)); // string(14) "How are you..."
// Returns a truncated version of $str up to $max chars, excluding $trunc.
// $strict = FALSE will allow longer strings to fit the last word.
function str_trunc($str, $max, $strict = TRUE, $trunc = '...') {
if ( strlen($str) <= $max ) {
return $str;
} else {
if ($strict) {
return substr( $str, 0, strrposlimit($str, ' ', 0, $max + 1) ) . $trunc;
} else {
return substr( $str, 0, strpos($str, ' ', $max) ) . $trunc;
}
}
}
// Works like strrpos, but allows a limit
function strrposlimit($haystack, $needle, $offset = 0, $limit = NULL) {
if ($limit === NULL) {
return strrpos($haystack, $needle, $offset);
} else {
$search = substr($haystack, $offset, $limit);
return strrpos($search, $needle, 0);
}
}

It's actually fairly simple, and I am adding this answer because the suggested duplicate does not match your needs (though it does give some pointers).
What you want is to cut a string at a maximum length but preserve the last word. So you need to find the position at which to cut the string (and whether it is necessary to cut it at all).
As getting the length (strlen) and cutting a string (substr) are not your problem (you already use them), the problem to solve is how to obtain the position of the last word that is within the limit.
This involves analyzing the string and finding the offsets of each word. String processing can be done with regular expressions. While writing this, I am reminded of some more similar questions where this has already been solved:
Extract a fixed number of chars from an array, just full words (with regex)
How to get first x chars from a string, without cutting off the last word? (with wordwrap)
It does exactly this: obtaining the "full words" string by using a regular expression. The only difference is that it removes the last word (instead of extending it). As you want to extend the last word instead, this needs a different regular expression pattern.
In a regular expression, \b matches a word boundary, that is, the position before or after a word. You now want to pick at least $length characters up to the next word boundary.
As this could include spaces before the next word, you might want to trim the result to remove these trailing spaces.
You could then extend your function as follows, using the regular expression pattern (preg_replace) and trim:
/**
* Cut a string at length while preserving the last word.
*
* @param string $str
* @param int $length
* @param string $suffix (optional)
*/
function trim_word($str, $length, $suffix = '...')
{
$len = strlen($str);
if ($len < $length) return $str;
$pattern = sprintf('/^(.{%d,}?)\b.*$/', $length);
$str = preg_replace($pattern, '$1', $str);
$str = trim($str);
$str .= $suffix;
return $str;
}
Usage:
$str = 'How are you today?';
echo trim_word($str, 10); # How are you...
You can further on extend this by reducing the minimum length in the pattern by the length of the suffix (as it's somehow suggested in your question, however the results you gave in your question did not match with your code).
I hope this is helpful. Also please use the search function on this site, it's not perfect but many gems are hidden in existing questions for alternative approaches.
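For the opposite policy, cutting before the limit so the result never exceeds it, the wordwrap approach mentioned in the linked question can be sketched like this. The name truncate_words is hypothetical, and the sketch assumes single-byte text with no line breaks already in it:

```php
<?php
// Hypothetical helper: result (including $suffix) is at most $max bytes long.
function truncate_words(string $str, int $max, string $suffix = '...'): string
{
    if (strlen($str) <= $max) {
        return $str; // short enough, nothing to cut
    }
    // Leave room for the suffix; width must be at least 1 for wordwrap.
    $width = max(1, $max - strlen($suffix));
    // wordwrap breaks at spaces; the final "true" also splits words longer than $width.
    $lines = explode("\n", wordwrap($str, $width, "\n", true));
    return rtrim($lines[0]) . $suffix;
}

echo truncate_words('How are you today?', 10); // How are...
```

A single word longer than the limit is broken inside the word, which matches the fallback asked for in the question.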
