Preg_replace not working as wanted - php

I have the following text stored in the $text variable:
$text = 'An airplane accelerates down a runway at 3.20 m/s2 for 32.8 s until is finally lifts off the ground. Determine the distance traveled before takeoff';
I have a function that replaces some keywords in the text, using an array named $replacements (this is its var_dump):
'm' => string 'meter' (length=5)
'meters' => string 'meter' (length=5)
's' => string 'second' (length=6)
'seconds' => string 'second' (length=6)
'n' => string 'newton' (length=6)
'newtons' => string 'newton' (length=6)
'v' => string 'volt' (length=4)
'speed' => string 'velocity' (length=8)
'\/' => string 'per' (length=3)
's2' => string 'secondsquare' (length=12)
The text goes through the following function :
$toreplace = array_keys($replacements);
foreach ($toreplace as $r) {
    $text = preg_replace("/\b$r\b/u", $replacements[$r], $text);
}
However, there is a difference between what I expect and the output :
Expected Output : an airplane accelerates down runway at 3.20 meterpersecondsquare for 32.8 second until finally lifts off ground determine distance traveled before takeoff
Function Output : an airplane accelerates down runway at 3.20 meterpers2 for 32.8 second until finally lifts off ground determine distance traveled before takeoff
Notice that I expect 'meterpersecondsquare' and I get 'meterpers2' (the 's2' isn't replaced) while the 'm' and '/' were replaced with their values.
I noticed that when I put m/s instead of m/s2 it works fine and gives :
an airplane accelerates down runway at 3.20 meterpersecond for 32.8 second until finally lifts off ground determine distance traveled before takeoff
So the problem is that the s2 is never matched. Any thoughts on why that is the case?

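Move the s2 replacement ahead of the '\/' replacement; putting it before s, as in the list below, does that.
Since you are doing the replacements one at a time, an earlier replacement destroys the s2 before it gets a chance to be replaced. \bs\b never touches s2, because the 2 after the s is a word character, so there is no word boundary between them. The '/' replacement is what breaks it: once 'per' is glued onto s2 there is no word boundary in front of the s any more, so \bs2\b cannot match.
3.20 m/s2 is transformed like this:
[m] 3.20 meter/s2
[s] 3.20 meter/s2 (no match)
[/] 3.20 meterpers2
[s2] 3.20 meterpers2 (no match)
Which results in meterpers2, the output you are seeing.
Here is the proper order: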
'm' => string 'meter' (length=5)
'meters' => string 'meter' (length=5)
's2' => string 'secondsquare' (length=12)
's' => string 'second' (length=6)
'seconds' => string 'second' (length=6)
'n' => string 'newton' (length=6)
'newtons' => string 'newton' (length=6)
'v' => string 'volt' (length=4)
'speed' => string 'velocity' (length=8)
'\/' => string 'per' (length=3)
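A minimal sketch of the reordered replacement, assuming a cut-down $replacements array. The replaceUnits() helper is a hypothetical name, not something from the question. It also runs each key through preg_quote(), so the slash can be stored as '/' rather than pre-escaped.

```php
<?php
// Hypothetical helper: applies the replacements one at a time, in array order.
function replaceUnits(string $text, array $replacements): string
{
    foreach ($replacements as $search => $replace) {
        // preg_quote() escapes the delimiter, so '/' needs no manual escaping.
        $pattern = '/\b' . preg_quote($search, '/') . '\b/u';
        $text = preg_replace($pattern, $replace, $text);
    }
    return $text;
}

// Assumed, shortened replacement table; 's2' is listed before '/'.
$replacements = [
    'm'  => 'meter',
    's2' => 'secondsquare',
    's'  => 'second',
    '/'  => 'per',
];

echo replaceUnits('3.20 m/s2 for 32.8 s', $replacements), "\n";
// prints: 3.20 meterpersecondsquare for 32.8 second
```

If 's2' is moved to the end of the array, the same call gives meterpers2 again, which is the behaviour described in the question.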

More complicated range generation

Producing a range with PHP is easy when the range is something like 1 to 100 or A to Z. But I need to be able to produce ranges like 101A to 101Z or A1 to A100.
I thought that maybe PHP has a function to compare two strings, strip what's common between them and return the rest to form the range boundaries. However I can not find such a function. How would I achieve this?
EDIT: I don't have control over the format, I can only set the guidelines. The end user determines the pattern by entering something like A1-A100 into an input field.
101A to 101Z is like "101" + range("A", "Z")
and
A1 to A100 is like "A" + range(1, 100)
If you're looking for A1 to Z100 that's when things get a bit more complicated.
You can look at the way conversion functions such as dechex and base64_encode work.
If you're converting from the decimal system to your own notation, you can do this kind of conversion:
1 => A1
2 => A2
101 => B1
102 => B2
2501 => Z1
2600 => Z100
This just describes the outline. If you want code you'll have to make your question more clear.
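As a rough sketch of that outline, assuming 100 numbers per letter and the letters A to Z only: Z then covers 2501 to 2600. The indexToLabel() function and its $perLetter parameter are made-up names for illustration.

```php
<?php
// Hypothetical helper: maps 1..2600 onto A1..Z100 (with the default of 100 per letter).
function indexToLabel(int $n, int $perLetter = 100): string
{
    if ($n < 1 || $n > 26 * $perLetter) {
        throw new InvalidArgumentException("Index $n is out of range");
    }
    $zeroBased = $n - 1;
    // The quotient picks the letter, the remainder picks the number within it.
    $letter = chr(ord('A') + intdiv($zeroBased, $perLetter));
    $number = $zeroBased % $perLetter + 1;
    return $letter . $number;
}

echo indexToLabel(1), ' ', indexToLabel(101), ' ', indexToLabel(2501), ' ', indexToLabel(2600), "\n";
// prints: A1 B1 Z1 Z100
```

Generating A1 to Z100 is then a matter of mapping range(1, 2600) through this function.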
Arbitrary ranges are very hard. I don't know of a solution. A1-A100 has a clear solution, but what about A1-100Z? How do you even begin? What about small-large or Boston-New York?
You can try with:
function rangeFix($from, $to, $prefix = null, $suffix = null) {
    return array_map(function($item) use ($prefix, $suffix) {
        return $prefix . $item . $suffix;
    }, range($from, $to));
}
rangeFix(0, 10, 'A', 'Z');
Output:
array (size=11)
0 => string 'A0Z' (length=3)
1 => string 'A1Z' (length=3)
2 => string 'A2Z' (length=3)
3 => string 'A3Z' (length=3)
4 => string 'A4Z' (length=3)
5 => string 'A5Z' (length=3)
6 => string 'A6Z' (length=3)
7 => string 'A7Z' (length=3)
8 => string 'A8Z' (length=3)
9 => string 'A9Z' (length=3)
10 => string 'A10Z' (length=4)
or:
rangeFix('A', 'Z', 101);
Output:
array (size=26)
0 => string '101A' (length=4)
1 => string '101B' (length=4)
2 => string '101C' (length=4)
3 => string '101D' (length=4)
4 => string '101E' (length=4)
5 => string '101F' (length=4)
6 => string '101G' (length=4)
7 => string '101H' (length=4)
8 => string '101I' (length=4)
9 => string '101J' (length=4)
10 => string '101K' (length=4)
11 => string '101L' (length=4)
12 => string '101M' (length=4)
13 => string '101N' (length=4)
14 => string '101O' (length=4)
15 => string '101P' (length=4)
16 => string '101Q' (length=4)
17 => string '101R' (length=4)
18 => string '101S' (length=4)
19 => string '101T' (length=4)
20 => string '101U' (length=4)
21 => string '101V' (length=4)
22 => string '101W' (length=4)
23 => string '101X' (length=4)
24 => string '101Y' (length=4)
25 => string '101Z' (length=4)
This is my own, not so elegant solution. It works with the following logic:
Range boundaries have two parts, one of which is digits and the other non-digits, such as A1 or 1A. One-part strings work too, such as A or 1. I did not test strings with more than two parts, such as A1B; the script probably fails there.
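$s1 = '101A';
$s2 = '101Z';
// Split each boundary once on digit runs and once on non-digit runs.
$s1_d = preg_split('/\d+/', $s1);
$s1_D = preg_split('/\D+/', $s1);
$s2_d = preg_split('/\d+/', $s2);
$s2_D = preg_split('/\D+/', $s2);
// Merge the two splits so that each boundary becomes array(first part, second part).
if ($s1_d[0] == '') $s1_d[0] = $s1_D[0];
else $s1_d[1] = $s1_D[1];
$s1 = $s1_d;
if ($s2_d[0] == '') $s2_d[0] = $s2_D[0];
else $s2_d[1] = $s2_D[1];
$s2 = $s2_d;
$prefix = false;
$postfix = false;
if ($s1[0] == $s2[1]) die(); // Can't do it.
if ($s1[0] == $s2[0]) {
    // The first parts are equal: they are the prefix, and the second parts form the range.
    $prefix = $s1[0];
    $start = $s1[1];
    $end = $s2[1];
}
else {
    // Otherwise the first parts form the range, and the second part is the postfix.
    $postfix = $s1[1];
    $start = $s1[0];
    $end = $s2[0];
}
$range = range($start, $end);
foreach ($range as &$r) {
    $r = $prefix . $r . $postfix;
}
unset($r); // Break the reference left over from the loop.
var_dump($range);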
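The same idea can be sketched with one regular expression that splits each boundary into an optional non-digit prefix, a number, and an optional non-digit suffix. This is a sketch under stated assumptions, not a general solution: expandRange() is a hypothetical name, exactly one of the three parts may differ between the boundaries, and a differing letter part must be a single character because range() only steps through single characters.

```php
<?php
// Hypothetical helper: expands e.g. A1..A100 or 101A..101Z into a list of labels.
function expandRange(string $from, string $to): array
{
    // Parts: 1 = non-digit prefix, 2 = digits, 3 = non-digit suffix (each may be empty).
    $pattern = '/^(\D*)(\d*)(\D*)$/';
    if (!preg_match($pattern, $from, $a) || !preg_match($pattern, $to, $b)) {
        throw new InvalidArgumentException('Boundaries must look like A1, 1A or A1B');
    }
    // Find the parts that differ between the two boundaries.
    $differing = [];
    for ($i = 1; $i <= 3; $i++) {
        if ($a[$i] !== $b[$i]) {
            $differing[] = $i;
        }
    }
    if (count($differing) !== 1) {
        throw new InvalidArgumentException('Exactly one part may differ');
    }
    $i = $differing[0];
    if ($a[$i] === '' || $b[$i] === '') {
        throw new InvalidArgumentException('The differing part is missing on one side');
    }
    if ($i !== 2 && (strlen($a[$i]) !== 1 || strlen($b[$i]) !== 1)) {
        throw new InvalidArgumentException('A differing letter part must be one character');
    }
    $result = [];
    foreach (range($a[$i], $b[$i]) as $value) {
        $parts = $a;
        $parts[$i] = $value; // swap in the current step, keep the shared parts
        $result[] = $parts[1] . $parts[2] . $parts[3];
    }
    return $result;
}

print_r(expandRange('101A', '101C')); // 101A, 101B, 101C
```

For input typed as A1-A100, the two boundaries could be obtained with explode('-', $input, 2) first. Numbers with leading zeros lose them, since range() returns integers for numeric strings.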

What array function should I use for creating an index?

Hello, I am trying to create an index of all the words on each HTML page that my crawler parses.
So far I have managed to break the HTML page down into an array of words and filter out all the stop words.
At this stage I have a few problems.
The array of words from the parsed HTML page contains repeated words. I want to keep them, because I still have to record how many times each word appeared on the page.
The array looks like this.
$wordsFromHTML =
array (size=119)
0 => string 'web' (length=3)
1 => string 'giants' (length=6)
2 => string 'vryheid' (length=7)
3 => string 'news' (length=4)
4 => string 'access' (length=6)
5 => string 'mails' (length=5)
6 => string 'mobile' (length=6)
7 => string 'february' (length=8)
8 => string 'access' (length=6)
9 => string 'mails' (length=5)
10 => string 'web' (length=3)
11 => string 'february' (length=8)
12 => string 'access' (length=6)
13 => string 'mails' (length=5)
14 => string 'desktop' (length=7)
15 => string 'february' (length=8)
16 => string 'hosting' (length=7)
17 => string 'web' (length=3)
18 => string 'giants' (length=6)
19 => string 'vryheid' (length=7)
20 => string 'february' (length=8)
22 => string 'us' (length=2)
Now I want to save all the words from $wordsFromHTML to $indexArray, which is my final index.
It should look like this.
$indexArray = array('web'=>array('url'=>array(0,10,17)))
The problem is how to keep adding the position (the $wordsFromHTML key) to the final index array each time a word from $wordsFromHTML is repeated.
The index array should only contain unique words. If a word that already exists comes in again, we reuse the existing entry for the same URL and add the new position to it.
Hope you understand my question.
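One possible approach, sketched under assumptions: PHP creates missing nested array levels automatically when you append with [], so a single loop can both add new words and extend existing ones. The addToIndex() function is a hypothetical name, and the page URL is used as the second-level key where the example above shows the literal 'url'.

```php
<?php
// Hypothetical helper: records every position of every word under the given URL.
function addToIndex(array $index, array $words, string $url): array
{
    foreach ($words as $position => $word) {
        // Creates $index[$word][$url] the first time, appends to it afterwards.
        $index[$word][$url][] = $position;
    }
    return $index;
}

// Shortened sample input; the URL is a placeholder.
$wordsFromHTML = ['web', 'giants', 'vryheid', 'news', 'access', 'web'];
$indexArray = addToIndex([], $wordsFromHTML, 'http://example.com/');

print_r($indexArray['web']); // positions 0 and 5 under the URL key
```

The number of occurrences on a page is then count($indexArray[$word][$url]), and calling addToIndex() again with another page's words and URL extends the same index.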

SimpleHTMLDom iteration through table

I am using SimpleHTMLDOM to get information from my school roster. The problem is that the table structure is pretty hard to parse and I am looking for some help.
The table looks like this:
http://pastebin.com/xg3mRAHw
The code looks like this:
http://pastebin.com/gWW7WyDA
The result looks like this (also included how I want the result to look like!):
Current format:
array
3 =>
array
'28-11-2011' =>
array
0 => string '08.45-10.30 ' (length=12)
1 => string 'CMD-1 HC interaction design' (length=27)
2 => string 'CMD-1vt-p2.01 - CMD-1vt-p2.18 ' (length=30)
3 => string 'OVk45' (length=5)
4 => string 'J.P. van Leeuwen' (length=16)
5 => string '10.30-12.15 ' (length=12)
6 => string 'CMD-1 Training samenwerken' (length=26)
7 => string 'CMD-1vt-p2.09 - CMD-1vt-p2.10 ' (length=30)
8 => string 'SL433' (length=5)
9 => string 'B. Hartman' (length=10)
Wanted format:
array
3 =>
array
'28-11-2011' =>
array
0 =>
array
'time' => string '08.45-10.30 ' (length=12)
'name' => string 'CMD-1 HC interaction design' (length=27)
'group' => string 'CMD-1vt-p2.01 - CMD-1vt-p2.18 ' (length=30)
'place' => string 'OVk45' (length=5)
'teacher' => string 'J.P. van Leeuwen' (length=16)
1 =>
array
'time' => string '10.30-12.15 ' (length=12)
'name' => string 'CMD-1 Training samenwerken' (length=26)
'group' => string 'CMD-1vt-p2.09 - CMD-1vt-p2.10 ' (length=30)
'place' => string 'SL433' (length=5)
'teacher' => string 'B. Hartman' (length=10)
The problem is that I do not understand how I can get to this result using (only) SimpleHTMLDOM. I am sure that I'm missing something here, because I'm close to the final structure of the array. The last step, getting it to look like the wanted format above, is something I cannot get to work.
Could someone give me a few tips on how to proceed and get the array like the way I want it to? I have been looking at XSL too but that is far too complicated for me at this point.
You need to segment the tr array as well.
$count = 0;
foreach ($table as $tr) {
...
$output[$info['week']][$info['date']][$count] = array();
$count++;
...
$output[$info['week']][$info['date']][$count][] = $td->innertext;
Now as for the 'time', 'name', 'group' etc. values, I don't see those anywhere in the xml, so I guess you will just have to maintain an inner count when appending td->innertext.
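If every lesson always produces exactly five cells in the same order (an assumption based on the sample output, not on the actual table), the flat per-date array can also be regrouped after parsing, without touching the SimpleHTMLDOM loop. The groupLessons() function is a hypothetical name.

```php
<?php
// Hypothetical helper: turns a flat list of cells into one keyed array per lesson.
function groupLessons(array $cells): array
{
    // Assumed column order, taken from the wanted format in the question.
    $keys = ['time', 'name', 'group', 'place', 'teacher'];
    $lessons = [];
    foreach (array_chunk($cells, count($keys)) as $chunk) {
        // Skip an incomplete trailing chunk instead of failing in array_combine().
        if (count($chunk) === count($keys)) {
            $lessons[] = array_combine($keys, $chunk);
        }
    }
    return $lessons;
}

$cells = [
    '08.45-10.30 ', 'CMD-1 HC interaction design', 'CMD-1vt-p2.01 - CMD-1vt-p2.18 ', 'OVk45', 'J.P. van Leeuwen',
    '10.30-12.15 ', 'CMD-1 Training samenwerken', 'CMD-1vt-p2.09 - CMD-1vt-p2.10 ', 'SL433', 'B. Hartman',
];
print_r(groupLessons($cells));
```

This only works while the five-cells-per-lesson assumption holds; rows with merged or missing cells would need the inner count described above.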

How to filter a dirty array into a clean stream of data

I have what I would call a dirty array. It needs to be filtered into a clean array.
Below is the array.
array
0 => string '1' (length=1)
1 => string 'FIRSTNAME A' (length=7)
2 => string 'LASTNAME A' (length=10)
3 => string '2011-12-08 16:15:37' (length=19)
4 => string '2' (length=1)
5 => string 'FIRSTNAME B' (length=7)
6 => string 'LASTNAME B' (length=10)
7 => string '2011-12-08 16:15:43' (length=19)
8 => string '3' (length=1)
9 => string 'FIRSTNAME C' (length=7)
10 => string 'LASTNAME C' (length=10)
11 => string '2011-12-08 16:15:48' (length=19)
12 => string '4' (length=1)
13 => string 'FIRSTNAME D' (length=7)
14 => string 'LASTNAME D' (length=10)
15 => string '2011-12-08 16:15:55' (length=19)
16 => string '6' (length=1)
17 => string 'FIRSTNAME E' (length=7)
18 => string 'LASTNAME E' (length=10)
19 => string '2011-12-08 16:16:08' (length=19)
I want the final output to look like
array[0]= 1, FIRSTNAME A, LASTNAME A, DATE
array[1]= 2, FIRSTNAME B, LASTNAME B, DATE
array[2]= 3, FIRSTNAME C, LASTNAME C, DATE
array[3]= 4, FIRSTNAME D, LASTNAME D, DATE
array[4]= 6, FIRSTNAME E, LASTNAME E, DATE
This should work
$clean = array_chunk($dirty, 4);
more about array_chunk
I'm going to take a stab at this. I'm not completely sure what you're asking, but hopefully this gets us moving in the right direction:
$cleanArray = array_chunk($dirtyArray, 4);
$finalArray = array();
foreach ($cleanArray as $value) {
    $finalArray[] = implode(", ", $value);
}
print_r($finalArray);

Get text that is within brackets with single or double quotes

I am trying to find the strings passed to the i18n functions in all my PHP files. Here is an example:
$string = '__("String 2"); __("String 3", __("String 4"));' . "__('String 5'); __('String 6', __('String 7'));";
var_dump(preg_match_all('#__\((\'|")([^\'"]+)(\'|")\)#', $string, $match));
var_dump($match);
I want to get this result:
array
0 => array
0 => string 'String 2' (length=8)
1 => string 'String 3' (length=8)
2 => string 'String 4' (length=8)
3 => string 'String 5' (length=8)
4 => string 'String 6' (length=8)
5 => string 'String 7' (length=8)
But unfortunately I get this result
array
0 => array
0 => string '__("esto es una prueba")' (length=24)
1 => string '__("esto es una prueba 2")' (length=26)
2 => string '__("prueba 4")' (length=14)
3 => string '__('caca')' (length=10)
4 => string '__('asdsnasdad')' (length=16)
1 => array
0 => string '"' (length=1)
1 => string '"' (length=1)
2 => string '"' (length=1)
3 => string ''' (length=1)
4 => string ''' (length=1)
2 => array
0 => string 'esto es una prueba' (length=18)
1 => string 'esto es una prueba 2' (length=20)
2 => string 'prueba 4' (length=8)
3 => string 'caca' (length=4)
4 => string 'asdsnasdad' (length=10)
3 => array
0 => string '"' (length=1)
1 => string '"' (length=1)
2 => string '"' (length=1)
3 => string ''' (length=1)
4 => string ''' (length=1)
Thanks in advance.
preg_match_all('/(?<=\(["\']).*?(?=[\'"])/', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
Simple.
Note that I am using ( as an entry point to the match. If you have more exotic input you should provide it.
Don't capture the quotes.
preg_match_all('#__\([\'"]([^\'"]+)[\'"]\)#', $string, $match);
Also take a look at the flags parameter for preg_match_all() for different output formats.
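Note that a pattern ending in \) only matches calls where the closing parenthesis directly follows the closing quote, so the first argument of __("String 3", __("String 4")) is skipped. A possible variant, as a sketch: drop the trailing \) and use a backreference so the closing quote must be the same character as the opening one. The extractI18nStrings() name is hypothetical, and escaped quotes inside the strings are not handled.

```php
<?php
// Hypothetical helper: returns the first string argument of every __() call.
function extractI18nStrings(string $source): array
{
    // Group 1 is the opening quote; group 2 is the text up to the same quote (\1).
    preg_match_all('/__\(\s*([\'"])(.*?)\1/s', $source, $matches);
    return $matches[2];
}

$string = '__("String 2"); __("String 3", __("String 4"));' . "__('String 5'); __('String 6', __('String 7'));";
print_r(extractI18nStrings($string));
// String 2, String 3, String 4, String 5, String 6, String 7
```

Because the closing quote must match the opening one, a string such as __("it's here") is returned whole rather than cut at the apostrophe.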
