I want to remove duplicates, but only if the duplicated text is more than 4 characters long.
How can we achieve that? My current code also removes the duplicate - values.
CODE:
$seoproducttitle = 'HP Chromebook Chromebook 11 G5 EE - 11.6 inch - Intel® Celeron® - 4LT18EA#ABH';
$productnamestring = $seoproducttitle;
$findseo = array('/\h+inch (?:(i[357])-\w+|\h+\w+)?/', '/(\w+)#\w+/');
$replaceseo = array('" $1', '$1');
$productnamingseo = preg_replace($findseo, $replaceseo, $productnamestring);
echo implode(' ', array_unique(explode(' ', $productnamingseo)));
This outputs: HP Chromebook 11 G5 EE - 11.6" Intel® Celeron® 4LT18EA
It should output: HP Chromebook 11 G5 EE - 11.6" - Intel® Celeron® - 4LT18EA
Or: Apple MacBook Air MacBook Air - 13.3 inch - Intel Core i5-8e - MRE82N/A
Should be: Apple MacBook Air - 13.3 inch - Intel Core i5-8e - MRE82N/A
EXAMPLE: http://sandbox.onlinephpfunctions.com/code/5bcaaf47ca97d6dee359802f2d71c2d889c0d091
Update
Based on comments from OP, the required regex is
/(^| )(.{4,}) (.*)\2/
This looks for a group of 4 or more characters preceded by either a space or the start of the line and followed by a space, some number of other characters and then the group repeated again. The regex is replaced by $1$2 $3 which effectively removes the duplicate string. A couple of examples:
$seoproducttitle = 'Apple MacBook Air MacBook Air - 13.3 inch - Intel Core i5-8e - MRE82N/A';
echo preg_replace('/(^| )(.{4,}) (.*)\2/', "$1$2 $3", $seoproducttitle) . "\n";
$seoproducttitle = 'HP Chromebook 11 G5 EE Chromebook - 11.6 inch - Intel® Intel® Celeron® - 4LT18EA#ABH 4LT18EA#ABH';
echo preg_replace('/(^| )(.{4,}) (.*)\2/', "$1$2 $3", $seoproducttitle) . "\n";
Output:
Apple MacBook Air - 13.3 inch - Intel Core i5-8e - MRE82N/A
HP Chromebook 11 G5 EE - 11.6 inch - Intel® Celeron® - 4LT18EA#ABH
Updated demo on 3v4l.org
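As a minimal sketch (the helper name remove_repeated_phrase is hypothetical, not part of the answer), the same pattern can be wrapped in a function. Because the replacement keeps the space that preceded the removed repeat, the raw result can contain two adjacent spaces, so this version collapses them afterwards:

```php
<?php
// Hypothetical helper: remove a repeated run of 4+ characters, then tidy spacing.
function remove_repeated_phrase(string $title): string
{
    // Same pattern as in the answer; single quotes keep $1, $2 and $3
    // literal so that preg_replace sees them as group references.
    $deduped = preg_replace('/(^| )(.{4,}) (.*)\2/', '$1$2 $3', $title);

    // The replacement can leave two adjacent spaces where the repeat used to be.
    return trim(preg_replace('/ {2,}/', ' ', $deduped));
}

echo remove_repeated_phrase('Apple MacBook Air MacBook Air - 13.3 inch - Intel Core i5-8e - MRE82N/A'), "\n";
// Apple MacBook Air - 13.3 inch - Intel Core i5-8e - MRE82N/A
```

A title without a qualifying repeat is returned unchanged.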
Original Answer
You could use this regex:
\b([^ ]{4,})( |$)(.*)\1
It looks for a group of 4 or more non-blank characters, followed by a space or end-of-string, followed by some number of other characters and then the first group repeated. The regex is replaced by $1$3 which effectively removes the duplicate string. e.g.
$seoproducttitle = 'HP Chromebook 11 G5 EE Chromebook - 11.6 inch - Intel® Intel® Celeron® - 4LT18EA#ABH 4LT18EA#ABH';
echo preg_replace('/\b([^ ]{4,})( |$)(.*)\1/', "$1$3", $seoproducttitle);
Output:
HP Chromebook11 G5 EE - 11.6 inch - Intel® Celeron® - 4LT18EA#ABH
Demo on 3v4l.org
Computers only do what we tell them, so you first need to explain the process to yourself in plain language. Then translate that to code. Then, if you're having trouble doing that, you've at least got a proper description of the problem to post on Stack Overflow.
$words = explode(' ', $productnamingseo);
// start with an empty list of words we've seen
$output = [];
// for every word
foreach($words as $word) {
    // if it's 4 or more chars long and we've already seen it
    if( mb_strlen($word) >= 4 && in_array($word, $output) ) {
        // debug: show omitted words
        // $output[] = str_repeat('X', mb_strlen($word));
        // skip it
        continue;
    }
    // otherwise, add it to the list of words we've already seen
    $output[] = $word;
}
var_dump(
    $productnamingseo,
    implode(' ', $output)
);
I'm unpacking binary data for the first time. It's going fine so far, but there is a part where I need to unpack two bytes described as (LSB = version 0.01); that is a hint written by someone I can't reach. Could someone please explain what it means and how I should handle it in PHP? I googled it first but couldn't find anything useful.
<info>: 10 Bytes
offset 0 / 1 byte: SV = payload structure subversion (0x01 -> version 1.00)
offset 1 / 2 bytes: HW = HW version (LSB = version 0.01)
offset 3 / 2 bytes: FW = FW version (LSB = version 0.01)
offset 5 / 1 byte: DS = device status
offset 6 / 4 bytes: SN = 32-bit device serial number
// My solution
$sensor->hw_version = unpack('v', substr($binary_data, 5, 2))[1]; // this giving me numbers like 110
Input is a base64 string, otykgAFuAGUAAEwBQAMfCqMI6g3zA+UDBQR8AXEBiQEyAiQCPQKh/nb+SwBKAAA=
expected output something similar to this:
CR = 0xEA69
MN = 0x19
SI = 0x80
SV = 0x01
HW = 1.10 (0x006E)
FW = 1.00 (0x0064)
DS = 0x05 (bit0 = 1: Request for response with settings, bit2-bit1 = 0b10: Watchdog reset occurred before this transmission)
SN = 03401234
Sensor data:
S1 <battery> = 2450 mV
S2 <solar> = 1825 mV
S3 <precipitation> = 360 (accumulative value in [0.1 mm])
S4 <air_temperature> average = 15.62 degree C.
S5 <air_temperature> min = 14.50 degree C.
S6 <air_temperature> max = 16.02 degree C.
S7 <relative_humidity> average = 72.3 %
S8 <relative_humidity> min = 71.0 %
S9 <relative_humidity> max = 73.1 %
S10 <deltaT> average = 2.55 degree C.
S11 <deltaT> min = 2.12 degree C.
S12 <deltaT> max = 2.89 degree C.
S13 <dewPoint> average = 0.59 degree C.
S14 <dewPoint> min = -1.21 degree C.
S15 <vpd> average = 1.17 kPa
S16 <vpd> min = 0.95 kPa
S17 <leaf_wetness> = 15 min.
What else should I do?
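0x6E is 110 in decimal, so you already have the right number; "LSB = version 0.01" means each unit of the integer is worth 0.01, so divide by 100 to get 1.10. Which unpack format yields 0x006E depends on the byte order: for bytes stored as 00 6E, as in the example below, you want big-endian (the n format); if your data stores them as 6E 00, the little-endian v format you are already using gives the same value.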
$enc_version = "\x00\x6e";
$unpacked = unpack('n', $enc_version)[1];
$dec_version = sprintf("%.2f", $unpacked/100);
var_dump(
bin2hex($enc_version),
$unpacked,
$dec_version
);
Output:
string(4) "006e"
int(110)
string(4) "1.10"
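A minimal sketch against the sample base64 string from the question. Decoding it by hand suggests the bytes at offsets 5-6 are 6E 00, i.e. little-endian, which is why v is used here. The assumption that the <info> block starts at byte 4 (after a 2-byte CR, MN and SI) is inferred from the expected output listing, not from a spec, and the array keys are made up for illustration:

```php
<?php
$payload = base64_decode('otykgAFuAGUAAEwBQAMfCqMI6g3zA+UDBQR8AXEBiQEyAiQCPQKh/nb+SwBKAAA=');

// Assumption: <info> starts at byte 4. Read SV (1 byte), HW and FW
// (2 bytes each, little-endian) and DS (1 byte) with named keys.
$info = unpack('Csv/vhw/vfw/Cds', substr($payload, 4, 6));

// "LSB = version 0.01": one unit of the integer is 0.01, so divide by 100.
$hwVersion = sprintf('%.2f', $info['hw'] / 100);
$fwVersion = sprintf('%.2f', $info['fw'] / 100);

printf("SV = 0x%02X\n", $info['sv']);
printf("HW = %s (0x%04X)\n", $hwVersion, $info['hw']);
printf("FW = %s (0x%04X)\n", $fwVersion, $info['fw']);
```

The 4-byte serial number is left out because the question does not say which byte order it uses.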
If a decimal separator (point or comma) and a thousand separator (comma or point) are present, the incoming value should be validated against a pattern.
My current code is
if (!preg_match('/^((?:(?:\-?[\d' . $thousandSeparator . ']+(?:' . $decimalSeparator . '\d+)?)|\s*))\s*(.*)$/', $value, $matches))
{
throw new InvalidArgumentException('Invalid number format');
}
CASE 1 -
$decimalSeparator = '.';
$thousandSeparator = ',';
Allowed cases -
45,789.45
45,789.45 cm
789 cm
789.45 cm
1,789 cm
78,789,756.45
Not allowed cases -
45.789,78
45.789,78 cm
78.7.78,78
7.8,5
7.8 cm
Case 2 -
$decimalSeparator = ',';
$thousandSeparator = '.';
Allowed cases -
45.789,78
45.789,78 cm
789,45
1.789 cm
789
Not allowed cases -
45,789.45
45,789.45 cm
789 cm
789.45 cm
1,789 cm
78,789,756.45
78,78,78 cm
Note: 'cm' (centimeter) is just an example unit; it could also be inch, mm, km, etc. The unit may or may not be present, but if it is there it needs to be handled. I have placed the units randomly in the examples, so please don't read anything into where they appear.
Thanks. :)
You can build your pattern like this:
$units = ['[mck]m', 'inch']; // complete it
$pattern = sprintf('~^-?\d{1,3}(?:[%s]\d{3})*(?:[%s]\d\d)?(?: (?:%s))?$~', $thousandSeparator, $decimalSeparator, implode('|', $units));
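A minimal sketch of that pattern wrapped in a function and run against a few of the listed cases. The function name build_number_pattern is hypothetical, and preg_quote is used instead of a character class so that any separator character is escaped safely; the unit entries stay regex fragments as in the answer:

```php
<?php
// Hypothetical helper; $units entries are regex fragments, not literal strings.
function build_number_pattern(string $thousandSep, string $decimalSep, array $units): string
{
    return sprintf(
        '~^-?\d{1,3}(?:%s\d{3})*(?:%s\d\d)?(?: (?:%s))?$~',
        preg_quote($thousandSep, '~'),
        preg_quote($decimalSep, '~'),
        implode('|', $units)
    );
}

// Case 1: comma as thousand separator, point as decimal separator.
$pattern = build_number_pattern(',', '.', ['[mck]m', 'inch']);
foreach (['45,789.45 cm', '78,789,756.45', '45.789,78', '7.8 cm'] as $value) {
    printf("%-15s %s\n", $value, preg_match($pattern, $value) ? 'allowed' : 'not allowed');
}
```

Note that this pattern insists on exactly two decimal digits and on groups of exactly three digits between thousand separators; loosen the quantifiers if your data needs it.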
Here is a problem that I would like to formulate:
The painter intends to frame his rectangular canvases of different sizes (in centimeters):
25cm x 35cm -- 20 pcs,
50 x 30 -- 30 pcs,
90 x 50 -- 40 pcs,
110 x 60 -- 25 pcs,
The painter will purchase wooden stretcher bars of 200cm and cut them accordingly.
Condition is "each frame edge should be single continuous bar. No gluing".
Unlimited wooden stretcher bars available in length 200 cm.
How many 200 cm bars should the painter buy?
How can the optimal number of bars be calculated, with the least wastage?
Is this problem related to optimization (mathematical programming) or AI?
PHP, Perl or VBScript code is welcome.
==============
For clarification, here are the exact lengths to be produced from the 200 cm bars.
LENGTH     PIECES     TOTAL LENGTH
110 cm      50 pcs     5500 cm
 90 cm      80 pcs     7200 cm
 60 cm      50 pcs     3000 cm
 50 cm     140 pcs     7000 cm
 35 cm      40 pcs     1400 cm
 30 cm      60 pcs     1800 cm
 25 cm      40 pcs     1000 cm
===========================================
ALL TOTAL:            26900 cm
This is equal to 134.5 bars, if we were allowed to glue the small remaining pieces together.
It would also be practical to tell the painter which lengths should be cut from each bar.
Otherwise he will not know what to do with the bars supplied.
You'll need the width of the stretcher bars to calculate the length lost to the corner angles (an additional 2*$stretcher_width for each side of the canvas).
use strict;
use warnings;
my $stretcher_length = 200;
my $stretcher_width = 0;
my $wasted_per_side = 2*$stretcher_width;
my @sc = (
  {w=> 25, h=> 35, pcs=> 20},
  {w=> 50, h=> 30, pcs=> 30},
  {w=> 90, h=> 50, pcs=> 40},
  {w=> 110, h=> 60, pcs=> 25},
);
# all possible bars needed from longest to shortest
my @all = sort { $b <=> $a } map {
  (
    ($_->{w}+$wasted_per_side) x 2, ($_->{h}+$wasted_per_side) x 2
  ) x $_->{pcs};
}
@sc;
# lets cut from 200cm bars
my @rest;
for my $len (@all) {
  my $cut_from;
  # do we already have bar which can be used?
  for my $len_have (@rest) {
    # yes, we have
    if ($len_have >= $len) { $cut_from = \$len_have; last; }
  }
  # no, we need another 200cm bar
  if (!$cut_from) {
    print "Taking new $stretcher_length cm bar\n";
    push @rest, $stretcher_length;
    $cut_from = \$rest[-1];
  }
  # cut it
  print "Now you have at least one bar $$cut_from long and cut away $len\n";
  $$cut_from -= $len;
  # keep @rest bars sorted from shortest to longest
  @rest = sort { $a <=> $b } @rest;
}
print scalar @rest;
# print "@rest\n"; # left overs
This is actually the cutting stock problem (see the Wikipedia article).
There is a C implementation, CPSOL, which solves the above problem with 135 sticks.
Unfortunately, I failed to find a Perl implementation.
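For readers who want a PHP starting point, here is a minimal sketch of a greedy best-fit-decreasing heuristic, the same idea as the Perl script above. It is not an exact cutting stock solver, so it may use more bars than the 135 reported for CPSOL. The function name plan_cuts and the array layout are made up for illustration, and the bar width is assumed to be zero:

```php
<?php
// Greedy best-fit decreasing: a heuristic, not guaranteed optimal.
// $demand maps piece length (cm) => number of pieces needed.
function plan_cuts(array $demand, int $barLength): array
{
    $pieces = [];
    foreach ($demand as $length => $count) {
        if ($length > $barLength) {
            throw new InvalidArgumentException("Piece of $length cm does not fit a bar");
        }
        for ($i = 0; $i < $count; $i++) {
            $pieces[] = $length;
        }
    }
    rsort($pieces); // longest pieces first

    $bars = []; // each bar: ['rest' => remaining cm, 'cuts' => lengths cut from it]
    foreach ($pieces as $len) {
        // Pick the opened bar with the smallest remainder that still fits.
        $best = null;
        foreach ($bars as $i => $bar) {
            if ($bar['rest'] >= $len && ($best === null || $bar['rest'] < $bars[$best]['rest'])) {
                $best = $i;
            }
        }
        if ($best === null) { // nothing fits: take a new bar
            $bars[] = ['rest' => $barLength, 'cuts' => []];
            $best = count($bars) - 1;
        }
        $bars[$best]['rest'] -= $len;
        $bars[$best]['cuts'][] = $len;
    }
    return $bars;
}

$demand = [110 => 50, 90 => 80, 60 => 50, 50 => 140, 35 => 40, 30 => 60, 25 => 40];
$bars = plan_cuts($demand, 200);
echo count($bars), " bars\n";
echo 'First bar: ', implode(' + ', $bars[0]['cuts']), "\n";
```

Each entry of the result lists what to cut from one bar, which is the per-bar guidance the painter asked for.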
I have this code to wrap every leading number (and its dot) in <b> tags:
<?
function reem2($cadena) {
$buscarRegex = array('/^([0-9]{1}|[.])$/i');
$reemplazo = array('<b>$i</b>');
$mag = preg_replace($buscarRegex, $reemplazo, $cadena);
return $cadena;
}
$string = "1. Krewella - Can't Control Myself
2. Kdrew - Circles
3. Korn Feat. Skrillex & Kill The Noise - Narcissistic Cannibal
4. Netsky - Love Has Gone
5. Example - Midnight Run (Flux Pavilion Remix)
6. Madeon - Finale (Radio Version)
7. Feed Me Vs. Knife Party Vs. Skrillex - My Pink Reptile Party (Maluu's Slice'n'diced Mashup)
8. Krewella & Pegboard Nerds - This Is Not The End
9. Skrillex - Bangarang
10. The Prototypes - Suffocate
11. Ayah Marar - Mind Controller (Cutline Remix)
12. Skrillex Feat. Krewella - Breathe (Vocal Edit)
13. Utah Saints Vs. Drumsound & Bassline Smith - What Can You Do For Me (Tantrum Desire Remix)
14. Nero - Promises (Skrillex & Nero Remix)
15. 20 Florence & The Machine - Cosmic Love (Seven Lions Remix)";
echo reem2(nl2br($string));
?>
But it doesn't work; it doesn't change anything.
The output in HTML would be:
1. Krewella - Can't Control Myself
2. Kdrew - Circles
3. Korn Feat. Skrillex & Kill The Noise - Narcissistic Cannibal
4. Netsky - Love Has Gone
5. Example - Midnight Run (Flux Pavilion Remix)
6. Madeon - Finale (Radio Version)
7. Feed Me Vs. Knife Party Vs. Skrillex - My Pink Reptile Party (Maluu's Slice'n'diced Mashup)
8. Krewella & Pegboard Nerds - This Is Not The End
9. Skrillex - Bangarang
10. The Prototypes - Suffocate
11. Ayah Marar - Mind Controller (Cutline Remix)
12. Skrillex Feat. Krewella - Breathe (Vocal Edit)
13. Utah Saints Vs. Drumsound & Bassline Smith - What Can You Do For Me (Tantrum Desire Remix)
14. Nero - Promises (Skrillex & Nero Remix)
15. 20 Florence & The Machine - Cosmic Love (Seven Lions Remix)
What can I do?
Your regex is broken:
/^([0-9]{1}|[.])$/i
 ^-- start of line
                ^--- end of line
You are allowing only one SINGLE character (a digit or a dot) as the entire string, so the regex can never match your text.
You probably want something more like this:
/^([\d]+)\./
which will match any number of digits at the start of the string that are followed by a single dot (.). Add the m modifier if it should match at the start of every line.
You may use the following code:
function reem2($cadena) {
$buscarRegex = array('/^(\d+\.)/mi'); // This means match any digit(s) followed by a dot at the beginning of each line. Note the m modifier
$reemplazo = array('<b>$1</b>'); // replace should be with group 1, not some vague $i
$mag = preg_replace($buscarRegex, $reemplazo, $cadena);
return $mag; // return value: fixed
}
return $cadena;
is your problem. You're doing the replace, then throwing the result away and returning the input.
return $mag;
is probably what you meant.
In fact, your regex is also wrong:
function reem2($cadena) {
$buscarRegex = array('/^([0-9]{1,2}\.)(.*)$/m');
$reemplazo = array('<b>\1</b>\2');
$mag = preg_replace($buscarRegex, $reemplazo, $cadena);
return $mag;
}
seems to be what you want.
So from what I see, your regex is incorrect and you're returning the incorrect variable in function reem2, so try replacing your function with something like this
function reem2($cadena) {
return preg_replace("/([0-9]+\.)/", "<b>$1</b>", $cadena);
}
function reem2($cadena) {
$buscarRegex = array('/^([0-9]+\.)/m'); // changed modifier to multiline
$reemplazo = array('<b>$1</b>'); // changed replacement to a capture offset
return preg_replace($buscarRegex, $reemplazo, $cadena);
}
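A minimal sketch that puts the two fixes from the answers together (multiline anchor plus returning the result) on a shortened track list. The function name bold_line_numbers is hypothetical:

```php
<?php
// Hypothetical helper: wrap the "N." at the start of every line in <b> tags.
function bold_line_numbers(string $text): string
{
    // The m modifier makes ^ match at the start of each line, not just the string.
    return preg_replace('/^(\d+\.)/m', '<b>$1</b>', $text);
}

$list = "1. Krewella - Can't Control Myself\n"
      . "3. Korn Feat. Skrillex & Kill The Noise - Narcissistic Cannibal\n"
      . "10. The Prototypes - Suffocate";

echo nl2br(bold_line_numbers($list));
```

Because the pattern is anchored, the dot in "Feat." is left alone; only the leading numbers are wrapped.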
My test case is as follows:
echo crypt('string', '_....salt');//error
echo crypt('string', '_A...salt');//fast
echo crypt('string', '_AAAAsalt');//slow
Explanation as stated at http://www.php.net/manual/en/function.crypt.php:
CRYPT_EXT_DES - Extended DES-based hash. The "salt" is a 9-character
string consisting of an underscore followed by 4 bytes of iteration
count and 4 bytes of salt. These are encoded as printable characters,
6 bits per character, least significant character first. The values 0
to 63 are encoded as "./0-9A-Za-z". Using invalid characters in the
salt will cause crypt() to fail.
A dot is a printable character, so why does it return an error? And what ordering applies to the characters, such that "AAAA" results in more iterations than "A..."?
It's all in the quoted paragraph:
- least significant character first
- The values 0 to 63 are encoded as "./0-9A-Za-z"
So in your example "_....salt" would mean 0 rounds, which obviously can't work.
"_A...salt" is less than "_AAAAsalt" because the least significant character comes first.
"_...Asalt" would also be more than "_A...salt".
This question is a bit old, but I found it while trying to work out how to create a hashing class for internal use, and I came up with this little function, which base64-encodes an integer with the appropriate characters/significance for use as the 4-character 'iteration count'. Possible values are from 1 to 16,777,215.
private function base64_int_encode($num){
$alphabet_raw = "./0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
$alphabet = str_split($alphabet_raw);
$arr = array();
$base = sizeof($alphabet);
while($num){
$rem = $num % $base;
$num = (int)($num / $base);
$arr[]=$alphabet[$rem];
}
$arr = array_reverse($arr);
$string = implode($arr);
return str_pad($string, 4, '.', STR_PAD_LEFT);
}
Hope it helps someone!
The code of Klathmon is nice but has some mistakes:
First - alphabet
It is:
./0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
Should be:
./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
Second - order of characters/digits
It generates for example: ...z
But it should generate: z...
The improved code:
function base64_int_encode($num) {
$alphabet_raw='./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
$alphabet=str_split($alphabet_raw);
$arr=array();
$base=sizeof($alphabet);
while($num) {
$rem=$num % $base;
$num=(int)($num / $base);
$arr[]=$alphabet[$rem];
}
$string=implode($arr);
return str_pad($string, 4, '.', STR_PAD_RIGHT);
}
A number system used in Extended DES:
.... - 0 (Extended DES error)
/... - 1
0... - 2
1... - 3
2... - 4
3... - 5
4... - 6
5... - 7
6... - 8
7... - 9
8... - 10
z... - 63
./.. - 64
//.. - 65
0/.. - 66
1/.. - 67
Y/.. - 100
61.. - 200
g2.. - 300
E4.. - 400
o5.. - 500
M7.. - 600
w8.. - 700
UA.. - 800
2C.. - 900
cD.. - 1000
zz.. - 4095
../. - 4096
/./. - 4097
0./. - 4098
1./. - 4099
xzzz - 16 777 213
yzzz - 16 777 214
zzzz - 16 777 215
And in connection with salt:
_/...salt - 1
_0...salt - 2
_1...salt - 3
_2...salt - 4
_3...salt - 5
_4...salt - 6
_5...salt - 7
_6...salt - 8
_7...salt - 9
_8...salt - 10
_z...salt - 63
_./..salt - 64
_//..salt - 65
_0/..salt - 66
_1/..salt - 67
_Y/..salt - 100
_61..salt - 200
_g2..salt - 300
_E4..salt - 400
_o5..salt - 500
_M7..salt - 600
_w8..salt - 700
_UA..salt - 800
_2C..salt - 900
_cD..salt - 1000
_zz..salt - 4095
_../.salt - 4096
_/./.salt - 4097
_0./.salt - 4098
_1./.salt - 4099
_xzzzsalt - 16 777 213
_yzzzsalt - 16 777 214
_zzzzsalt - 16 777 215
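The tables above can be cross-checked by going the other way. This is a minimal sketch of a decoder for the 4-character iteration count; the function name ext_des_rounds is hypothetical, and it only reads the count, it does not validate the 4 salt characters:

```php
<?php
// Hypothetical helper: decode the iteration count from a CRYPT_EXT_DES setting
// such as "_cD..salt" (underscore, 4 count characters, 4 salt characters).
function ext_des_rounds(string $setting): int
{
    $alphabet = './0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    if (strlen($setting) < 5 || $setting[0] !== '_') {
        throw new InvalidArgumentException('Expected "_" followed by at least 4 characters');
    }
    $rounds = 0;
    for ($i = 0; $i < 4; $i++) {
        $value = strpos($alphabet, $setting[$i + 1]);
        if ($value === false) {
            throw new InvalidArgumentException('Invalid character in iteration count');
        }
        // Least significant character first: position $i is worth 64 ** $i.
        $rounds += $value << (6 * $i);
    }
    return $rounds;
}

echo ext_des_rounds('_A...salt'), "\n"; // 12
echo ext_des_rounds('_AAAAsalt'), "\n"; // 3195660
echo ext_des_rounds('_cD..salt'), "\n"; // 1000
```

This also answers the original question numerically: "_A...salt" asks for 12 rounds, while "_AAAAsalt" asks for more than three million.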