Using preg_replace to reformat money amounts in text with PHP

I'm struggling with some regular expressions. What I want to do is find money amounts in a string, remove the €, $ or £ but keep the number, and then check whether there is a 'b' or an 'm' after it. If there is, I want to write 'million platinum coins' or 'million gold coins' respectively; otherwise just 'gold coins'.
I have most of that as a hack (see below), with the small problem that my regex does not seem to work: the money amount comes out unchanged.
Desired behaviour examples
I intend to leave the decimal places and thousands separators as is.
$12.6m ==> 12.6 million gold coins
£2b ==> 2 million platinum coins
€99 ==> 99 gold coins
My code
Here is my non-working code (I suspect my regex might be wrong).
protected function funnymoney($text){
$text = preg_replace('/[€$£]*([0-9\.,]+)([mb])/i','\0 %\1%',$text);
$text = str_replace('%b%','million platinum coins',$text);
$text = str_replace('%m%','million gold coins',$text);
$text = str_replace('%%','gold coins',$text);
return $text;
}
I would greatly appreciate it if someone could explain to me what I am missing or getting wrong and guide me to the right answer. You may safely assume I know very little about regular expressions. I would like to understand why the solution works too if I can.

Using preg_replace_callback, you can do this in a single function call:
define ("re", '/[€$£]*(\.\d+|\d+(?:[.,]\d+)?)([mb]|)/i');
function funnymoney($text) {
return preg_replace_callback(re, function($m) {
return $m[1] .
($m[2] != "" ? " million" : "") . ($m[2] == "b" ? " platinum" : " gold") .
" coins";
}, $text);
}
// now call this function
echo funnymoney('$12.6m');
//=> "12.6 million gold coins"
echo funnymoney('£2b');
//=> "2 million platinum coins"
echo funnymoney('€99');
//=> "99 gold coins"

I am not sure how you intend to handle decimal places and thousands separators, so that part of my pattern may require adjustment. Beyond that: match the leading currency symbol (so that it is consumed and removed), then capture the numeric substring, then capture the optional trailing unit (b or m).
Use a lookup array to translate the units to English. When the unit character is missing, apply the fallback value from the lookup array.
A lookup array will make your task easier to read and maintain.
Code:
$str = '$1.1m
Foo
£2,2b
Bar
€99.9';
$lookup = [
'b' => 'million platinum coins',
'm' => 'million gold coins',
'' => 'gold coins',
];
echo preg_replace_callback(
'~[$£€](\d+(?:[.,]\d+)?)([bm]?)~iu',
function($m) use ($lookup) {
return "$m[1] " . $lookup[strtolower($m[2])];
},
$str
);
Output:
1.1 million gold coins
Foo
2,2 million platinum coins
Bar
99.9 gold coins
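The question says thousands separators should be kept as they are, but the pattern above allows at most one separator, so an amount such as $1,250,000 would be cut off after 1,250. A minimal sketch, assuming the only separators are `.` and `,` and each is followed by at least one digit, is to repeat the separator group. The function name `funnyMoney` and the sample strings are illustrative only:

```php
<?php
// Hypothetical helper: the repeated (?:[.,]\d+)* group lets the amount
// contain any number of thousands/decimal separators, e.g. 1,250,000
function funnyMoney(string $text): string
{
    $lookup = [
        'b' => 'million platinum coins',
        'm' => 'million gold coins',
        ''  => 'gold coins',
    ];

    return preg_replace_callback(
        '~[$£€](\d+(?:[.,]\d+)*)([bm]?)~iu',
        function (array $m) use ($lookup): string {
            // $m[1] is the number exactly as written, $m[2] the optional unit
            return $m[1] . ' ' . $lookup[strtolower($m[2])];
        },
        $text
    );
}

echo funnyMoney('$1,250,000 and £2.5B'), "\n";
// 1,250,000 gold coins and 2.5 million platinum coins
```

A separator that is not followed by a digit, such as the full stop in "€99.", is not consumed, so sentence punctuation after an amount stays in place.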

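In your replacement string, \0 refers to the whole match (currency symbol, number and unit), not to the first capturing group; the capturing groups are numbered from \1. So use \1 for the number and \2 for the unit:
$text = preg_replace('/[€$£]*([0-9\.,]+)([mb])/i','\1 %\2%',$text);
Note that ([mb]) is mandatory in this pattern, so an amount without a unit such as €99 is still left unchanged. Also, your str_replace() calls are case-sensitive even though the pattern uses /i, so an amount like 12.6M produces %M%, which none of them replaces.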
Funny question, btw!
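To see why the original code left the amount intact, it helps to run both replacement strings on the same input. In a preg_replace() replacement, \0 stands for the whole match and \1, \2 for the capturing groups. The original '\0 %\1%' therefore re-inserts the full amount, currency symbol included, and appends the number between percent signs. A small sketch; the function name `showReplacements` is hypothetical:

```php
<?php
// Hypothetical demo: apply the question's pattern with both replacement strings
function showReplacements(string $text): array
{
    $pattern = '/[€$£]*([0-9\.,]+)([mb])/i';
    return [
        // \0 = whole match, \1 = number: the amount is kept and "%12.6%" is appended
        'original' => preg_replace($pattern, '\0 %\1%', $text),
        // \1 = number, \2 = unit letter: gives "12.6 %m%", which str_replace can handle
        'fixed'    => preg_replace($pattern, '\1 %\2%', $text),
    ];
}

print_r(showReplacements('$12.6m'));
```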

Is this what you want?
<?php
/**
$12.6m ==> 12.6 million gold coins
£2b ==> 2 million platinum coins
€99 ==> 99 gold coins
*/
$str = <<<EOD
$12.6m
£2b
€99
EOD;
preg_match('/\$(.*?)m/', $str, $string1);
echo $string1[1] . " million gold coins \n";
preg_match('/\£(.*?)b/', $str, $string2);
echo $string2[1] . " million platinum coins \n";
preg_match('/\€([0-9]+)/', $str, $string3); // [0-9]+ captures all the digits, not just the first one
echo $string3[1] . " gold coins \n";
// output:
// 12.6 million gold coins
// 2 million platinum coins
// 99 gold coins

Related

PHP str_replace with an offset

I have the following output:
Item
Length : 130
Depth : 25
Total Area (sq cm): 3250
Wood Finish: Beech
Etc: etc
I want to remove "Total Area (sq cm):" and the four digits after it from the string. Currently I am trying to use str_replace like so:
$tidy_str = str_replace( $totalarea, "", $tidy_str);
Is this the correct function to use, and if so, how can I include the four (varying) digits after this text? Please also note that the output is not fixed, so the position of this text within the string will change.
You can practice php regex at http://www.phpliveregex.com/
<?php
$str = '
Item
Length : 130
Depth : 25
Total Area (sq cm): 3250
Wood Finish: Beech
Etc: etc
';
echo preg_replace("/Total Area \(sq cm\): [0-9]*\\n/", "", $str);
Item
Length : 130
Depth : 25
Wood Finish: Beech
Etc: etc
This will do it:
$exp = '/Total Area \(sq cm\): \d+/';
echo preg_replace($exp, '', $tidy_str);
Try this:
$tidy_str = preg_replace('/(Total Area \(sq cm\): )([0-9\.,]*)/' , '', $tidy_str);
You are looking for substr_replace:
$strToSearch = "Total Area (sq cm):";
$totalAreaIndex = strpos($tidy_str, $strToSearch);
echo substr_replace($tidy_str, '', $totalAreaIndex, strlen($strToSearch) + 5); // 5 = space plus 4 numbers.
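If you want to remove the newline too, check whether it is \n or \r\n: for \n add one more to the length, for \r\n add two, i.e. strlen($strToSearch) + 7 for \r\n. Also check that strpos() did not return false (label not found) before calling substr_replace().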
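If you would rather not count newline characters by hand, a regular expression can remove the label, the digits and the line break that follows in one step. In PCRE, \R matches \r\n, \n or \r. A minimal sketch, assuming the label text is exactly as in the question and the area is a run of digits. The sample string, with Windows line endings, is made up for illustration:

```php
<?php
// Sample input with Windows line endings (illustrative only)
$tidy_str = "Item\r\nLength : 130\r\nDepth : 25\r\nTotal Area (sq cm): 3250\r\nWood Finish: Beech\r\n";

// \(sq cm\) escapes the parentheses, " *" allows any number of spaces after the colon,
// \d+ takes all the digits (not exactly four) and \R? removes the line break if present
$tidy_str = preg_replace('/Total Area \(sq cm\): *\d+\R?/', '', $tidy_str);

echo $tidy_str;
```

Because the match is found by content rather than by offset, it does not matter where the line appears in the output.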

Solving 140 characters Twitter status limit with PHP regex

The text I want to post on Twitter is sometimes longer than 140 characters, so I need to check its length. If it is under 140 characters, it goes out unchanged. Otherwise, I want to split it into two pieces (the text and the link), cut the text part down to, e.g., 100 characters and drop the rest.
Then I want to put the now 100-character text back together with the URL.
How do I do that?
My code so far:
if (strlen($status) < 140) {
// continue
} else {
// 1. slice the $status into $text and $url (every message has url so
// checking is not important right now
// 2. shorten the text to 100 char
// something like $text = substr($text, 0, 100); ?
// 3. put them back together
$status = $text . ' ' . $url;
}
How should I change my code? My biggest problem is the first part: separating the URL from the text.
By the way, each $status contains only one URL, so there is no need to check for multiple URLs.
Example of a text that is longer than it should be:
What is now Ecuador was home to a variety of indigenous groups that were gradually incorporated into the Inca Empire during the fifteenth century. The territory was colonized by Spain during the sixteenth century, achieving independence in 1820 as part of Gran Colombia, from which it emerged as its own sovereign state in 1830. The legacy of both empires is reflected in Ecuador's ethnically diverse population, with most of its 15.2 million people being mestizos, followed by large minorities of European, Amerindian, and African descendant. https://en.wikipedia.org/wiki/Ecuador
should end up like this:
What is now Ecuador was home to a variety of indigenous groups that were gradually incorporated int https://en.wikipedia.org/wiki/Ecuador
If you can be sure that the URL does not contain any spaces (no well-formed URL should) and that it is always present, try this:
preg_match('/^(.*)\s(\S+)$/s', $status, $matches); // greedy (.*) backs off to the last whitespace, so (\S+) is the URL
$text = $matches[1];
$url = $matches[2];
$text = substr($text, 0, 100);
Alternatively, you may want to adapt the length of the text to the length of the URL, in which case use:
$text = substr($text, 0, 140-strlen($url)-1);
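Putting those pieces together, the whole check could look like the following minimal sketch. It assumes, as the question states, that the status ends with exactly one URL preceded by whitespace. The function name `shortenStatus` and the 140 default are illustrative. Note that strlen() and substr() count bytes, so for non-ASCII text you may want mb_strlen() and mb_substr() (mbstring extension) instead. Twitter's own counting rules, for example for links, may also differ.

```php
<?php
// Hypothetical helper: shorten the text part so "text URL" fits in $limit bytes
function shortenStatus(string $status, int $limit = 140): string
{
    if (strlen($status) <= $limit) {
        return $status; // already short enough, leave unchanged
    }
    // Greedy (.*) backtracks to the LAST whitespace, so group 2 is the final
    // whitespace-free token (the URL) and group 1 is everything before it
    if (!preg_match('/^(.*)\s(\S+)$/s', $status, $m)) {
        return substr($status, 0, $limit); // no URL found: plain truncation
    }
    [, $text, $url] = $m;
    // Leave room for the URL and one separating space
    $text = substr($text, 0, $limit - strlen($url) - 1);
    return $text . ' ' . $url;
}

$status = str_repeat('lorem ipsum ', 20) . 'https://en.wikipedia.org/wiki/Ecuador';
echo shortenStatus($status), "\n";
```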
$reg = '/\b(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[-A-Z0-9+&@#\/%=~_|$?!:,.]*[A-Z0-9+&@#\/%=~_|$]/i';
$string = "What is now Ecuador was home to a variety of indigenous groups that were gradually incorporated into the Inca Empire during the fifteenth century. The territory was colonized by Spain during the sixteenth century, achieving independence in 1820 as part of Gran Colombia, from which it emerged as its own sovereign state in 1830. The legacy of both empires is reflected in Ecuador's ethnically diverse population, with most of its 15.2 million people being mestizos, followed by large minorities of European, Amerindian, and African descendant. https://en.wikipedia.org/wiki/Ecuador";
preg_match_all($reg, $string, $matches, PREG_PATTERN_ORDER);
$cut_string = substr($string, 0, (140-strlen($matches[0][0])-1));
$your_twitt = $cut_string . " " . $matches[0][0];
echo $your_twitt;
// outputs: "What is now Ecuador was home to a variety of indigenous groups that were gradually incorporated into t https://en.wikipedia.org/wiki/Ecuador"
This might be what you want:
$status = 'What is now Ecuador was home to a variety of indigenous groups that were gradually incorporated into the Inca Empire during the fifteenth century. The territory was colonized by Spain during the sixteenth century, achieving independence in 1820 as part of Gran Colombia, from which it emerged as its own sovereign state in 1830. The legacy of both empires is reflected in Ecuador\'s ethnically diverse population, with most of its 15.2 million people being mestizos, followed by large minorities of European, Amerindian, and African descendant. https://en.wikipedia.org/wiki/Ecuador';
if (strlen($status) < 140) {
echo 'Length OK';
} else {
$totalPart = (int) ceil(strlen($status) / 100); // number of 100-character pieces
$fulltweet = array();
for ($i = 0; $i < $totalPart; $i++) {
// each piece starts at $i * 100 and is at most 100 characters long
$fulltweet[$i] = substr($status, $i * 100, 100);
}
}
If the string is longer than 140 characters, this splits it into an array of 100-character pieces.
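If fixed-size pieces are all you need, PHP's built-in str_split() already does this chunking. Here is a short sketch: the 100 chunk size comes from the question and the sample text is made up. Like substr(), str_split() works on bytes. mb_str_split() (PHP 7.4+) is the multibyte-aware equivalent.

```php
<?php
$status = str_repeat('a', 250);

// Split into consecutive 100-byte chunks; the last chunk holds the remainder
$fulltweet = str_split($status, 100);

print_r(array_map('strlen', $fulltweet));
```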

Regex PHP, Find characters in specific position

Here is my problem: I'm working with different kinds of addresses:
" 25 Down Street 15000 London "
" 25 B Down Street 15000 London "
" Building A 25 Down Street 15000 London "
I found a way to determine the street number in all cases with this regex:
`^([1-9][0-9]{0,2}(?:\s*[A-Z])?)\b`
But now I have a problem I can't solve: when there is a prefix, I need to find the characters that come before the street number.
Example: " Building 2 25 Down Street 15000 London ", where I need to find only "Building 2".
I understand that I have to find the characters before the first number in this string.
I'm still searching on my own, but it would be great if someone had a solution for me.
Thank you.
Edit: my code is now:
preg_match('/^(.*?)\d+\s+\D+/', $cleanAdressNode, $result, PREG_OFFSET_CAPTURE,0);
print $result[0][0];
return $result[0][0];
and the result is now "Résidence Les Thermes 1 15 boulevard Jean Jaurès" instead of only "Résidence Les Thermes 1".
How about:
preg_match('/^(\D*)/', $str, $match);
You will find in $match[1] everything that is not a digit at the beginning of the string.
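Based on your example, your pattern already captures the prefix in group 1. Read $match[1] (with PREG_OFFSET_CAPTURE, $result[1][0]) instead of the full match in $match[0]; trim() removes the trailing space:
preg_match('/^(.*?)\d+\s+\D+/', $str, $match);
echo trim($match[1]);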
If you only want to match the first non-numeric characters, ^([^0-9]*) should do the trick. It uses class negation to grab every non-numeric character at the start of the string.
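In the "Building 2 25 Down Street ..." case from the question, the prefix itself contains a number. One option is to let a lazy group run until the first number that is followed by whitespace and then a non-digit, which is the street number, and to read that group rather than the full match. A sketch under that assumption (addresses shaped like the examples in the question); the function name `addressPrefix` is hypothetical:

```php
<?php
// Hypothetical helper: return the text before the street number, or '' if none.
// The street number is taken to be the first number followed by whitespace and
// a non-digit, so in "2 25 Down" the "2" is skipped because "25" follows it.
function addressPrefix(string $address): string
{
    if (preg_match('/^\s*(.*?)\s*\d+\s+\D/u', $address, $m)) {
        return $m[1]; // group 1 only, not the full match in $m[0]
    }
    return '';
}

echo addressPrefix(' Building 2 25 Down Street 15000 London '), "\n"; // Building 2
```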

Find a pattern within two or more sets of text

I have lots of data that I need to search through for certain patterns.
The problem is that I don't know in advance what patterns I'm looking for.
In other words, I have two paragraphs, each on a similar topic. I need to compare both paragraphs and find patterns: phrases that appear in both paragraphs, and how many times they appear.
I can't find a solution, because preg_match and similar functions require you to supply what you're looking for.
Example paragraphs
Paragraph 1:
Bee Pollen is made by honeybees, and is the food of the young bee. It
is considered one of nature's most completely nourishing foods as it
contains nearly all nutrients required by humans. Bee-gathered pollens
are rich in proteins (approximately 40% protein), free amino acids,
vitamins, including B-complex, and folic acid.
Paragraph 2:
Bee Pollen is made by honeybees. It is required for the fertilization
of the plant. The tiny particles consist of 50/1,000-millimeter
corpuscles, formed at the free end of the stamen in the heart of the
blossom, nature's most completely nourishing foods. Every variety of
flower in the universe puts forth a dusting of pollen. Many orchard
fruits and agricultural food crops do, too.
From those examples, these patterns should be found:
Bee Pollen is made by honeybees
and:
nature's most completely nourishing foods
Both phrases are found in both paragraphs.
This is potentially a complex question depending on whether you're looking for similar phrases or phrases that match word for word.
Finding exact word-for-word matches is quite simple: all you need to do is split on common breaks like punctuation marks (e.g. .,;:) and perhaps on conjunctions as well (e.g. and, or). The problem comes with, for example, adjectives: two phrases might be exactly the same except for one word, like so:
The world is spinning around its axis at a tremendous speed.
The world is spinning around its axis at a magnificent speed.
These won't match because tremendous and magnificent are used in place of one another. You could potentially work around this, but that would be a more complex question.
Answer
If we stick to the simple side of things we can achieve phrase matching with just a few lines of code (4 in this example; not including the formatting for comments/readability).
$wordSplits = 'and or on of as'; //List of words to split on
preg_match_all('/(?<m1>.*?)([.,;:\-]| '.str_replace(' ', ' | ', trim($wordSplits)).' )/i', $para1, $matches1);
preg_match_all('/(?<m2>.*?)([.,;:\-]| '.str_replace(' ', ' | ', trim($wordSplits)).' )/i', $para2, $matches2);
$commonPhrases = array_filter( //Removes blank $key=>$value pairs
array_intersect( //Finds matching patterns
array_map(function($item){
return(strtolower(trim($item))); //Cleans array for $para1 values - removes leading and following spaces
}, $matches1['m1']),
array_map(function($item){
return(strtolower(trim($item))); //Cleans array for $para2 values - removes leading and following spaces
}, $matches2['m2'])
)
);
var_dump($commonPhrases);
/**
OUTPUT:
array(2) {
[0]=>
string(31) "bee pollen is made by honeybees"
[5]=>
string(41) "nature's most completely nourishing foods"
}
*/
The code above splits both on punctuation (the characters inside [...] in the preg_match_all pattern) and on the words in the word list. The words are concatenated into an alternation, so each one only matches when it has a space before and after it.
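To check what the concatenation produces, you can build the pattern on its own and print it. With the default word list, it becomes a single alternation of punctuation marks and space-delimited words. A small sketch, where the variable `$pattern` is just for illustration:

```php
<?php
$wordSplits = 'and or on of as'; // list of words to split on

// Each space in the list becomes " | ", so every word ends up wrapped in spaces
$pattern = '/(?<m1>.*?)([.,;:\-]| ' . str_replace(' ', ' | ', trim($wordSplits)) . ' )/i';

echo $pattern, "\n";
// /(?<m1>.*?)([.,;:\-]| and | or | on | of | as )/i
```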
Wordlist
You can change the word list to include any breaks you like, editing the list until you get the phrases you are after, examples:
$wordSplits = 'and or';
$wordSplits = 'and but if or';
$wordSplits = 'a an as and by but because if in is it of off on or';
Punctuation
You can add any punctuation marks you like into the list (between [ and ]), however remember that some characters do have special meanings and may need to be escaped (or placed appropriately): - and ^ should become \- and \^ or be placed where their special meaning doesn't come into play.
You may consider changing:
([.,;:\-]|
To:
([.,;:\-] | //Adding a space before the pipe
This way you only split on punctuation marks that are followed by a space; for example, a number like 50,000 won't be split.
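The difference is easy to see on a short made-up sentence containing a number with a thousands separator. This sketch uses preg_split() with only the punctuation part of the pattern:

```php
<?php
$sentence = 'The tiny particles number 50,000, more or less.';

// Splits on every punctuation mark, including the comma inside 50,000
print_r(preg_split('/[.,;:\-]/', $sentence, -1, PREG_SPLIT_NO_EMPTY));

// Splits only on punctuation followed by a space, so 50,000 stays intact
print_r(preg_split('/[.,;:\-] /', $sentence, -1, PREG_SPLIT_NO_EMPTY));
```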
Spaces and breaks
You may also consider changing the spaces to \s so that tabs, newlines, etc. are included and not just spaces, like so:
'/(?<m1>.*?)([.,;:\-]|\s'.str_replace(' ', '\s|\s', trim($wordSplits)).'\s)/i'
This would also apply to:
([.,;:\-]\s|
If you decide to go down that route.
I've been working on this code; I don't know if it suits your needs, so feel free to expand it!
$p1 = "Bee Pollen is made by honeybees, and is the food of the young bee. It is considered one of nature's most completely nourishing foods as it contains nearly all nutrients required by humans. Bee-gathered pollens are rich in proteins (approximately 40% protein), free amino acids, vitamins, including B-complex, and folic acid.";
$p2 = "Bee Pollen is made by honeybees. It is required for the fertilization of the plant. The tiny particles consist of 50/1,000-millimeter corpuscles, formed at the free end of the stamen in the heart of the blossom, nature's most completely nourishing foods. Every variety of flower in the universe puts forth a dusting of pollen. Many orchard fruits and agricultural food crops do, too.";
// Strip strings of periods etc.
$p1 = strtolower(str_replace(array('.', ',', '(', ')'), '', $p1));
$p2 = strtolower(str_replace(array('.', ',', '(', ')'), '', $p2));
// Extract words from first paragraph
$w1 = explode(" ", $p1);
// Build search string
$search = '';
$old_search = '';
$num_occured = 0;
$add = FALSE;
$found = array();
foreach ($w1 as $word) {
//echo 'Word: ' . $word . "<br />";
$search .= ' ' . $word;
$search = trim($search);
//echo '. . Search string: '. $search . "<br /><br />";
if (substr_count($p2, $search)) {
$old_search = $search;
$num_occured = substr_count($p2, $search);
//echo " . . . found!" . "<br /><br /><br />";
$add = TRUE;
} else {
//echo " . . . not found! Generating new search string: " . $word . '<br />';
if ($add) {
$found[] = array('pattern' => $old_search, 'occurences' => $num_occured);
$add = FALSE;
}
$old_search = '';
$search = $word;
}
}
// Store a match that was still in progress when the words ran out
if ($add) {
$found[] = array('pattern' => $old_search, 'occurences' => $num_occured);
}
print_r($found);
The code above finds occurrences of patterns from the first string in the second one.
I'm sure it can be written better, but since it's past midnight (local time), I'm not as "fresh" as I'd like to be...

Regular Expression - PHP

I am using the following PHP code:
<?php
$data = file_get_contents('http://www.kitco.com/texten/texten.html');
preg_match_all('/([A-Z]{3,5}\s+[0-9]{1,2},[0-9]{4}\s+([0-9.NA]{2,10}\s+){1,7})/si',$data,$result);
$records = array();
foreach($result[1] as $date) {
$temp = preg_split('/\s+/',$date);
$index = array_shift($temp);
$index.= array_shift($temp);
$records[$index] = implode(',',$temp);
}
print_R($records);
?>
to read the following data:
--------------------------------------------------------------------------------
London Fix GOLD SILVER PLATINUM PALLADIUM
AM PM AM PM AM PM
--------------------------------------------------------------------------------
Jun 03,2013 1396.75 1402.50 22.4300 1466.00 1487.00 749.00 755.00
May 31,2013 1410.25 1394.50 22.5700 1471.00 1459.00 755.00 744.00
--------------------------------------------------------------------------------
What I want to do is read the gold bid and ask prices from the table below. Can anyone help with the regular expression changes?
New York Spot Price
MARKET IS CLOSED
Will open in
----------------------------------------------------------------------
Metals Bid Ask Change Low High
----------------------------------------------------------------------
Gold 1411.20 1412.20 +22.90 +1.65% 1390.10 1418.00
Silver 22.74 22.84 +0.48 +2.13% 22.26 23.08
Platinum 1495.00 1501.00 +41.00 +2.82% 1470.00 1511.00
Palladium 756.00 761.00 +7.00 +0.93% 750.00 766.00
----------------------------------------------------------------------
Last Update on Jun 03, 2013 at 17:14.58
----------------------------------------------------------------------
I'm not sure you could modify your existing regex to match both tables easily, but if you had the second table in a string, you could use:
$string = "PLAIN TEXT TABLE DATA HERE";
preg_match('/Gold\s+(\d+\.\d{2})\s+(\d+\.\d{2})/',$string,$matches);
$goldBid = $matches[1];
$goldAsk = $matches[2];
Here I'm only matching digits and the period character.
The following code should return the numbers you're looking for. It uses the $data string from your example:
<?php
preg_match_all('!Gold\s+([0-9.]+)\s+([0-9.]+)!i',$data,$matches);
//New York
$ny_bid = $matches[1][0];
$ny_ask = $matches[2][0];
print("NY\nbid: $ny_bid\n");
print("ask: $ny_ask\n\n");
//Asia
$asia_bid = $matches[1][1];
$asia_ask = $matches[2][1];
print("Asia\nbid: $asia_bid\n");
print("ask: $asia_ask\n");
?>
Output
NY
bid: 1411.20
ask: 1412.20
Asia
bid: 1406.80
ask: 1407.80
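If you need the bid and ask for every metal rather than only gold, preg_match_all() with PREG_SET_ORDER can turn each row of the table into one array entry. This is a sketch, assuming each row starts with the metal name followed by the bid and ask columns, as in the question's sample. It runs on a copy of those sample rows. Since the live page can contain more than one table with a Gold row (see the output above), it keeps only the first row found for each metal.

```php
<?php
$data = <<<EOD
Metals Bid Ask Change Low High
Gold 1411.20 1412.20 +22.90 +1.65% 1390.10 1418.00
Silver 22.74 22.84 +0.48 +2.13% 22.26 23.08
Platinum 1495.00 1501.00 +41.00 +2.82% 1470.00 1511.00
Palladium 756.00 761.00 +7.00 +0.93% 750.00 766.00
EOD;

// One set per row: [full match, metal name, bid, ask]; /m lets ^ match at each line start
preg_match_all('/^[ \t]*(Gold|Silver|Platinum|Palladium)\s+([0-9.]+)\s+([0-9.]+)/mi',
    $data, $rows, PREG_SET_ORDER);

$prices = [];
foreach ($rows as $row) {
    // Keep the first row found for each metal (e.g. New York before Asia)
    if (!isset($prices[$row[1]])) {
        $prices[$row[1]] = ['bid' => $row[2], 'ask' => $row[3]];
    }
}

print_r($prices);
```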
You can also use the T-Regx library:
<?php
pattern('Gold\s+([0-9.]+)\s+([0-9.]+)', 'i')->match($data)->forEach(function ($m) {
print 'bid: ' . $m->group(1);
print 'ask: ' . $m->group(2);
});
