compare array values - php

I have a string like this, which I need to extract the address from:
$string="xyz company 7 th floor hotel yyyy 88 main Road mumbai 400000 this is sample comapny address 9456 and some other";
$word=str_word_count($string,1,'0...9');
Now $word holds each word: $word[0]=xyz, $word[1]=company, $word[2]=7, etc.
I need to compare each value. If the word is a number, I want to start saving words in a temp variable until I reach another number.
For example, $word[2] is 7, so I need to save everything from there up to 88 in the temp variable. The temp should then contain "7 th floor hotel yyyy 88".
If the temp variable has fewer than 25 characters, we keep going until we reach another number. So here we need to continue from 88 to 400000 and append that to the temp variable.
The temp should finally look like this: "7 th floor hotel yyyy 88 main Road mumbai 400000"
Any help please?

The question was already asked here, where I responded. Although preg_match does not follow your thought process, it accomplishes the result you're looking for. The only change you've made between that question and this one is the 25 character restriction. This can easily be resolved by accepting 25 characters of any type before checking for the terminating number:
preg_match('/[0-9]+.{0,25}[^0-9]*[0-9]+\s/',$string,$matches);
return $matches[0];
There is no need to use str_word_count. If you insist on using it, say so in the comments and we can try to accommodate a solution using your thought process. However, preg_match is likely the most efficient way of accomplishing the whole task.

Try using preg_match_all():
if (preg_match_all('!\b\d+\b.*\b\d+\b!', $string, $matches)) {
    echo $matches[0][0];
}
This tests for a sequence of digits, followed by any number of characters, followed by another sequence of digits. The quantifiers are greedy, so the middle pattern (.*) grabs as much as possible, meaning you'll be grabbing from the first to the last set of digits.
The \b assertions check that the numbers sit on word boundaries. You may or may not need them, and you may need to tweak them depending on your exact requirements.
The above works on the whole string.
If you must (or just prefer) to operate on the words:
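// $words is the array returned by str_word_count()
$start = false;
$last = false;
foreach ($words as $i => $word) {
    if (is_numeric($word)) {
        if ($start === false) {
            $start = $i; // index of the first numeric word
        }
        $last = $i; // index of the most recent numeric word
    }
}
// array_slice() returns the words from the first to the last number;
// array_splice() would instead remove them from the array.
$word_range = ($start === false) ? array() : array_slice($words, $start, $last - $start + 1);
$substring = implode(' ', $word_range);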
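The loop above finds the first and last numeric words but does not apply the 25 character rule from the question. Here is a minimal sketch of that rule, working word by word as the asker describes. The function name extractAddress and the $minLength parameter are hypothetical, and it splits on whitespace with preg_split() instead of str_word_count(); treat it as one possible reading of the requirement, not a definitive implementation.

```php
<?php
// Hypothetical helper: start collecting at the first number and stop at the
// next number once the collected text has reached $minLength characters.
function extractAddress(string $string, int $minLength = 25): ?string
{
    $words = preg_split('/\s+/', trim($string));
    $temp = array();

    foreach ($words as $word) {
        $isNumber = ctype_digit($word);

        if (!$temp) {
            // Nothing collected yet: wait for the first number.
            if ($isNumber) {
                $temp[] = $word;
            }
            continue;
        }

        $temp[] = $word;

        // A later number ends the address only if the text is long enough.
        if ($isNumber && strlen(implode(' ', $temp)) >= $minLength) {
            return implode(' ', $temp);
        }
    }

    return null; // no closing number was found
}
```

With the sample string from the question, "7 th floor hotel yyyy 88" is only 24 characters long, so the loop carries on to 400000 before it stops.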

Related

Extract house numbers from address string

I am importing user data from a foreign database on demand. While I keep house numbers separate from the street names, the other database does not.
I use
preg_match_all('!\d+!')
to pull out the numbers. This works fine for an address line like this:
streetname 60
But it does not work for an address line like this:
streetname 60/2/3
In that case I end up extracting 60, and /2/3 stays in the name of the street.
Unfortunately I am not a regex expert, quite the contrary. My problem is that I need to detect not only digits, but also slashes and hyphens.
Can someone help me out here?
Try:
preg_match_all('![0-9/-]+!', 'streetname 60/2/3', $matches);
To give a definite answer we would have to know the patterns in your data.
For example, in Germany we sometimes have house numbers like 13a or 23-42, which could also be written as 23 - 42.
One possible solution would be to match everything after a whitespace that starts with a digit:
preg_match_all('!\s(\d.*)!', 'streetname 60/2/3', $matches);
This would produce false positives, though, if you have American data with streets like 13street.
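To make the patterns above concrete, here is a rough sketch that splits a line into street name and house number, accepting slashes, hyphens, a letter suffix and a spaced range such as 23 - 42. The function name splitStreetAndNumber and the returned array keys are made up for illustration, and the pattern assumes the house number comes last in the line.

```php
<?php
// Hypothetical helper: assumes the house number is the last token(s) of the line.
function splitStreetAndNumber(string $line): array
{
    $line = trim($line);

    // Street name (lazy), whitespace, then a token starting with a digit that
    // may contain digits, letters, slashes and hyphens, optionally followed
    // by a spaced range such as " - 42".
    $pattern = '!^(.*?)\s+(\d[0-9a-zA-Z/-]*(?:\s*-\s*\d+)?)$!';

    if (preg_match($pattern, $line, $m)) {
        return array('street' => $m[1], 'number' => $m[2]);
    }

    // No trailing house number found: return the line unchanged.
    return array('street' => $line, 'number' => null);
}
```

A line such as 13street is returned unchanged with a null number, because the pattern requires whitespace before the digits.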
This approach does not use regex. It splits the address on spaces and returns the first piece that starts with a digit. Ideal for addresses like 12 Street Road or Street Name 1234B.
function getStreetNumberFromStreetAddress($streetAddress){
    $array = explode(' ', $streetAddress);
    foreach ($array as $a) {
        // Skip empty pieces (caused by double spaces) before reading the first character.
        if ($a !== '' && is_numeric($a[0])) {
            return $a;
        }
    }
    return null;
}

PHP phone number parser

Building an application for UK & Ireland only, but potentially it might extend to other countries. We have built an API and I'm trying to decide A) how to store phone numbers and B) how to write a parser to understand all formats for entry and comparison.
e.g.
Say a user is in Ireland they add a phone number in these formats
0871231234
087 123 1234
087-1231234
+353871231234
Or any other valid way of writing the number. We want to allow for this so that a new number is added to our database in a consistent way.
So all the numbers above would potentially be stored as 00353871231234
The problem is I will need to do parsing for all UK numbers as well. Are there any classes out there that can help with this process?
Use regular expressions. An info page can be found here. They should not be too hard to learn, and will be extremely useful to you.
Here is a regular expression for validating phone numbers in the United Kingdom:
^((\(?0\d{4}\)?\s?\d{3}\s?\d{3})|(\(?0\d{3}\)?\s?\d{3}\s?\d{4})|(\(?0\d{2}\)?\s?\d{4}\s?\d{4}))(\s?\#(\d{4}|\d{3}))?$
It allows 3, 4 or 5 digit regional prefix, with 8, 7 or 6 digit phone number respectively, plus optional 3 or 4 digit extension number prefixed with a # symbol. Also allows optional brackets surrounding the regional prefix and optional spaces between appropriate groups of numbers. More can be found here.
This Stack Overflow link should help you see how regular expressions can be used with phone numbers internationally.
<?php
$array = array
(
    '0871231234',
    '087 123 1234',
    '087-1231234',
    '+353871231234'
);
foreach($array as $a)
{
    if(preg_match("/(^[0-9]{10}$)|(^[0-9]{3}\ [0-9]{3}\ [0-9]{4}$)|(^[0-9]{3}\-[0-9]{7}$)|(^\+353[0-9]{9}$)/", $a))
    {
        // removes the + and the first two digits of the country code (+35 of +353)
        $a = preg_replace("/^\+[0-9]{2}/", '', $a);
        // removes the first remaining digit: the last 3 of +353, or the leading 0 of a national number
        $a = preg_replace("/^[0-9]{1}/", '', $a);
        // removes spaces and hyphens
        $a = preg_replace("/(\s+)|(\-)/", '', $a);
        $a = "00353".$a;
        echo $a."\n";
    }
}
?>
Try http://www.braemoor.co.uk/software/telnumbers.shtml
Design the basic functionality for the UK first and add on to it later if needed. You can separate the logic for each country at a later stage. I would err on the side of cautious optimism: presumably you want to accept as many numbers as possible?
Strip out spaces, opening and closing brackets and -
If number starts with a single 0 replace with 00
If number starts with a + replace with a 00
If it is numeric and has a total length of between 9 and 11 characters we are 'good'
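The steps above could be sketched roughly as follows. Note two departures, both assumptions on my part: to reach the 00353... form from the question, a single leading 0 is replaced by 00 plus a country code (the $countryCode parameter is hypothetical and defaults to Ireland), and the length check is left out because the acceptable range depends on the country. The function name normalisePhoneNumber is also made up.

```php
<?php
// Hypothetical helper; $countryCode is an assumption, not part of the listed steps.
function normalisePhoneNumber(string $input, string $countryCode = '353'): ?string
{
    // Strip spaces, opening and closing brackets, and hyphens.
    $number = str_replace(array(' ', '(', ')', '-'), '', $input);

    if (strpos($number, '+') === 0) {
        // A leading + becomes the 00 international prefix.
        $number = '00' . substr($number, 1);
    } elseif (strpos($number, '00') !== 0 && strpos($number, '0') === 0) {
        // A single leading 0 marks a national number: swap it for 00 + country code.
        $number = '00' . $countryCode . substr($number, 1);
    }

    // Accept the result only if nothing but digits is left.
    return ctype_digit($number) ? $number : null;
}
```

All four sample formats from the question come out as 00353871231234 with the default country code; passing '44' would handle UK national numbers the same way.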
As for storage, you could store it as a string, or as an integer with a second field that holds the number of leading zeros.
Use this for reference
http://en.wikipedia.org/wiki/Telephone_numbers_in_the_United_Kingdom

Read corresponding value in PHP and add to running sum

I would like to have each word in a string cross-referenced in a file.
So, if I was given the string: Jumping jacks wake me up in the morning.
I use some regex to strip out the period. Also, the entire string is made lowercase.
I then go on to have the words separated into an array by using PHP's nifty explode() function.
Now, what I'm left with, is an array with the words used in the string.
From there I need to look up each value in the array and get a value for it and add it to a running sum. for() loop it is. Okay, this is where I get stuck...
The list ($wordlist) is structured like so:
wake#4 waking#3 0.125
morning#2 -0.125
There are \ts in between the word and the number. There can be more than one word per value.
What I need the PHP to do now is look up the number to each word in the array then pull that corresponding number back to add it to a running sum. What's the best way for me to go about this?
The answer should be easy enough, just finding the location of the string in the wordlist and then finding the tab and from there reading the int... I just need some guidance.
Thanks in advance.
EDIT: to clarify -- I don't want the sum of the values of the wordlist, rather, I'd like to look up my individual values as they correspond to the words in the sentence and THEN look them up in the list and add just those values; not all of them.
Edited answer based on your comment and question edit. The running sum is stored in an array called $sum, keyed by word, e.g. $sum['wake'] stores the running sum for the word wake, and so on.
$sum = array();
$word_values = array();
foreach ($wordlist as $line) // Loop through each line of the word list
{
    // Each line looks like "wake#4 waking#3<TAB>0.125": one or more word#n entries,
    // then whitespace, then the numeric value at the end of the line.
    // The first parenthesis captures the word entries, the second the number value.
    if (!preg_match('/^(.+?)\s+(-?\d+(?:\.\d+)?)\s*$/', $line, $match)) continue;
    // Every word that precedes a '#' on this line gets the line's value.
    // $word_values maps each word to its number value.
    preg_match_all('/(\w+)#\d+/', $match[1], $entries);
    foreach ($entries[1] as $word_ref)
    {
        $word_values[$word_ref] = (float) $match[2];
    }
}
// Assuming $sentence_array holds the words used in your string, e.g. {"jumping", "jacks", "wake", "me", "up", "in", "the", "morning"}
foreach ($sentence_array as $word)
{
    if (!isset($word_values[$word])) continue; // word is not in the word list
    if (!array_key_exists($word, $sum)) $sum[$word] = 0;
    $sum[$word] += $word_values[$word];
}
// array_sum($sum) gives the overall total across all words.
I'm assuming you will take care of case sensitivity, since you mentioned that you make the entire string lowercase, so I'm not including that here.
$sentence = 'Jumping jacks wake me up in the morning';
$words = array();
foreach (explode(' ', $sentence) as $w) {
    if (array_key_exists($w, $words)) {
        $words[$w]++;
    } else {
        $words[$w] = 1;
    }
}
Explode by space and check whether each word is already a key in the $words array; if so, increment its count; if not, set its count to 1. Loop this for each of your sentences without redeclaring $words = array().
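For the single running total the question asks for, a compact end-to-end sketch might look like this. It assumes the word list is available as one string with a tab before the value on each line, as described in the question; the function names buildWordValues and sentenceScore are hypothetical.

```php
<?php
// Hypothetical helper: turn the word list text into a word => value map.
function buildWordValues(string $wordlist): array
{
    $values = array();
    foreach (preg_split('/\R/', trim($wordlist)) as $line) {
        $fields = explode("\t", $line);
        if (count($fields) < 2) {
            continue; // no tab-separated value on this line
        }
        // The value is the last tab-separated field.
        $value = (float) array_pop($fields);
        // Every "word#n" entry before it shares that value.
        preg_match_all('/(\w+)#\d+/', implode(' ', $fields), $m);
        foreach ($m[1] as $word) {
            $values[$word] = $value;
        }
    }
    return $values;
}

// Hypothetical helper: add up the values of the words that appear in the sentence.
function sentenceScore(string $sentence, array $values): float
{
    // Strip punctuation, lowercase, and split on whitespace.
    $clean = strtolower(preg_replace('/[^\w\s]/', '', $sentence));
    $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);

    $total = 0.0;
    foreach ($words as $word) {
        if (isset($values[$word])) {
            $total += $values[$word];
        }
    }
    return $total;
}
```

Only words that appear in both the sentence and the list contribute, which is what the EDIT in the question asks for.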

How to count sentences in <textarea>?

I have a textarea on a page with UTF-8 encoding.
How can I count all the sentences with PHP?
Update:
A sentence starts with a capital letter and ends with a dot, question mark or exclamation mark.
From PHP's point of view, a <textarea> is simply another <input>, so it will be available through $_GET or $_POST as normal when the form is submitted.
Sentence counting in itself is quite complicated - you could count the number of sentences by the number of periods (.) in the text, but this would fail with abbreviations e.g. e.g.. You could do so by counting the number of periods followed by a space and then a capital letter, but this would fail for abbreviations followed by common nouns, and also for people who don't use capital letters at the beginning of their sentences. You could decide an average sentence length (say 70 characters) and approximate sentences = characters/70. None of these solutions are perfect (or even good, in my opinion).
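UPDATE: Following your updated question, the following should be helpful:
<?php
// The /u modifier and \p{Lu} match capital letters in any UTF-8 script, not just A-Z.
// preg_match_all() returns the number of full matches; count($matches) would count the capture groups instead.
$count = preg_match_all('/(^|[.!?])\s*\p{Lu}/u', $_POST['textarea'], $matches);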
As Nobody was saying already, it depends on how you define a sentence. Is it a dot? Is it a linebreak? Is it a capital?
I think it's really hard to define "a sentence", because for every definition you can think of 100 exceptions to that rule.
Anyway, if you come up with a definition, you can count the occurrences of that in your textarea, such as the number of linebreaks, the number of dots or the number of capital letters, or combine all of those into one definition. So basically, just take the contents of your textarea and run some function on it. :-)
That's the best that can be answered to this question imo.
Edit: after your edit, my answer is:
function starts_with_upper($str) {
    $chr = mb_substr($str, 0, 1, "UTF-8");
    return mb_strtolower($chr, "UTF-8") != $chr;
}
// Get sentences split by a dot and starting with a capital letter.
$total = 0;
$sentences = explode('.', rtrim($text, '.'));
for ($i = 0; $i < count($sentences); $i++) {
    // Drop the whitespace that follows the previous dot before testing the first letter.
    $sentence = ltrim($sentences[$i]);
    if (starts_with_upper($sentence)) {
        $total++;
    }
}
echo "You have " . $total . " sentences ending in a dot.";
If you treat a sentence as a group of words with a dot at the end, you can count the dots in your text.
If you use new lines, count the \n's.
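Putting the asker's definition into a single pattern (starts with a capital letter, ends with a dot, question mark or exclamation mark) could look like the sketch below. The function name countSentences is hypothetical, and the abbreviation problem mentioned earlier still applies, so treat the result as an approximation.

```php
<?php
// Hypothetical helper: counts runs that begin with an uppercase letter at the
// start of the text or right after . ! ? and end at the next . ! or ?
function countSentences(string $text): int
{
    // /u makes \p{Lu} match uppercase letters in any UTF-8 script.
    $pattern = '/(?:^|(?<=[.!?]))\s*\p{Lu}[^.!?]*[.!?]/u';

    // preg_match_all() returns the number of matches, or false on error.
    return (int) preg_match_all($pattern, $text);
}
```

A fragment that starts with a lowercase letter is skipped, so 'Hello world. how are you? Fine!' counts as two sentences under this definition.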

PHP & word counting from string

I'm trying to take numbers from a text file and see how many times they occur.
I've gotten to the point where I can print all of them out, but I want to display each number just once, followed by its number of occurrences (e.g. Key | Amount; 317 | 42).
Not looking for an Answer per se, all learning is good, but if you figure one out for me, that would be awesome as well!
preg_match_all will return the number of matches against a string.
$count = preg_match_all("#$key#", $string);
print "{$key} - {$count}";
So if you're already extracting the data you need, you can do this using a (fairly) simple array:
$counts = array();
foreach ($keysFoundFromFile as $key) {
    if (!isset($counts[$key])) $counts[$key] = 0; // isset() avoids an undefined index notice
    $counts[$key]++;
}
print_r($counts);
If you're already looping to extract the keys from the file, then you can simply assign them directly to the $counts array without making a second loop.
I think you're looking for the function substr_count().
Just a heads up though, if you're looking for "123" and it finds "4512367" it will match it as part of it. The alternative would be using RegEx and using word boundaries:
$count = preg_match_all('|\b'. preg_quote($num) .'\b|', $text);
(preg_quote() for good practice, \b for word boundaries so we can be assured that it's not a number embedded in another number.)
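As a shortcut for the counting loop, PHP's built-in array_count_values() does the tallying in one call. Here is a small sketch that pulls whole numbers out of a text with word boundaries and counts them; the function name countNumbers is made up, and it assumes the file contents are already in a string.

```php
<?php
// Hypothetical helper: count how often each whole number occurs in $text.
function countNumbers(string $text): array
{
    // \b keeps 317 from matching inside 3170.
    preg_match_all('/\b\d+\b/', $text, $matches);

    // array_count_values() maps each distinct value to its number of occurrences.
    return array_count_values($matches[0]);
}
```

The result can then be printed in the Key | Amount form with a loop such as foreach (countNumbers($text) as $key => $amount) { echo "$key | $amount\n"; }.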
