PHP range for hebrew alphabets


PHP has a function range('a','z') which returns the English alphabet as an array: a, b, c, d, etc.
Is there a similar function for the Hebrew alphabet?

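You can do something like this. It treats the UTF-8 bytes of each character as one integer, which works here because every letter from א (0xD790) to ת (0xD7AA) shares the same lead byte, 0xD7: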
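// Read the raw UTF-8 bytes of $c as one big-endian integer (e.g. 'א' => 0xD790)
function utfOrd($c) {
    $hex = unpack('H*', $c);   // array(1 => 'd790'); avoids passing a function result to array_pop() by reference
    return hexdec($hex[1]);
}
// Reverse: turn the integer back into its raw UTF-8 bytes
function utfChr($n) {
    return pack('H*', dechex($n));
}
var_dump(array_map('utfChr', range(utfOrd('א'), utfOrd('ת'))));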
Prints:
array
0 => string 'א' (length=2)
1 => string 'ב' (length=2)
2 => string 'ג' (length=2)
3 => string 'ד' (length=2)
4 => string 'ה' (length=2)
5 => string 'ו' (length=2)
6 => string 'ז' (length=2)
7 => string 'ח' (length=2)
8 => string 'ט' (length=2)
9 => string 'י' (length=2)
10 => string 'ך' (length=2)
11 => string 'כ' (length=2)
12 => string 'ל' (length=2)
13 => string 'ם' (length=2)
14 => string 'מ' (length=2)
15 => string 'ן' (length=2)
16 => string 'נ' (length=2)
17 => string 'ס' (length=2)
18 => string 'ע' (length=2)
19 => string 'ף' (length=2)
20 => string 'פ' (length=2)
21 => string 'ץ' (length=2)
22 => string 'צ' (length=2)
23 => string 'ק' (length=2)
24 => string 'ר' (length=2)
25 => string 'ש' (length=2)
26 => string 'ת' (length=2)
If you need more characters, you can use this to generate a hardcoded array, or merge a few ranges.

range() works with the standard Western alphabet because the characters A through Z have consecutive values in ASCII (and therefore in UTF-8).
Hebrew letters are not ASCII characters, but they do form a contiguous Unicode block (U+05D0 to U+05EA), so you can build a range of their numeric values and then array_map it back to characters.
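On PHP 7.2 or later, mb_ord() and mb_chr() do this conversion between characters and Unicode code points for you. A minimal sketch, assuming the mbstring extension is enabled and the source file is saved as UTF-8:

```php
<?php
// Assumes PHP 7.2+ with the mbstring extension (mb_ord() / mb_chr()).
// Hebrew letters occupy U+05D0 (alef) to U+05EA (tav), final forms included.
$first = mb_ord('א', 'UTF-8'); // 1488 (0x05D0)
$last  = mb_ord('ת', 'UTF-8'); // 1514 (0x05EA)

// range() works on the integer code points; mb_chr() turns each back into UTF-8
$letters = array_map(function ($cp) {
    return mb_chr($cp, 'UTF-8');
}, range($first, $last));

echo implode(' ', $letters), PHP_EOL;
echo count($letters), PHP_EOL; // 27
```

Unlike the byte-based trick, this does not depend on the letters sharing a UTF-8 lead byte, so the same pattern works for any contiguous Unicode block; only the two endpoints change.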

Related

PHP str_split on string with decoded html_entity

If I run this code:
<?php
$string = 'My string &lsquo;to parse&rsquo;';
$string_decoded = html_entity_decode($string, ENT_QUOTES, 'utf-8');
$string_array = str_split($string_decoded);
var_dump($string_array);
?>
I get this result:
array (size=28)
0 => string 'M' (length=1)
1 => string 'y' (length=1)
2 => string ' ' (length=1)
3 => string 's' (length=1)
4 => string 't' (length=1)
5 => string 'r' (length=1)
6 => string 'i' (length=1)
7 => string 'n' (length=1)
8 => string 'g' (length=1)
9 => string ' ' (length=1)
10 => string '�' (length=1)
11 => string '�' (length=1)
12 => string '�' (length=1)
13 => string 't' (length=1)
14 => string 'o' (length=1)
15 => string ' ' (length=1)
16 => string 'p' (length=1)
17 => string 'a' (length=1)
18 => string 'r' (length=1)
19 => string 's' (length=1)
20 => string 'e' (length=1)
21 => string '�' (length=1)
22 => string '�' (length=1)
23 => string '�' (length=1)
As you can see, instead of the decoded single quotes (left/right), I'm getting these three characters for each quote...
I noticed that this happens with some entities, but not others. A few that present this issue are &lsquo; &rdquo; &copy;. Some that don't present the same problem are &amp; &gt;.
I tried different charsets but couldn't find one that would work for all.
What am I doing wrong? Is there a way to make it work for all entities? Or at least all the "common" ones?
Thanks.

This should do well:
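// PHP 7.4+ ships a native mb_str_split(); redeclaring it is a fatal error,
// so only define this fallback on older versions
if (!function_exists('mb_str_split')) {
    function mb_str_split($string) {
        // Split between every character; the /u modifier makes the regex UTF-8 aware
        return preg_split('/(?<!^)(?!$)/u', $string);
    }
}
$string = 'My string &lsquo;to parse&rsquo;';
$string_decoded = html_entity_decode($string, ENT_QUOTES, 'UTF-8');
$string_array = mb_str_split($string_decoded);
var_dump($string_array);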
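As mentioned in the comments: str_split() splits on bytes, not characters. Entities such as &lsquo;, &rdquo; and &copy; decode to multibyte UTF-8 characters (‘ alone is three bytes), so each one gets cut into separate bytes, while &amp; and &gt; decode to single-byte ASCII characters and survive. Split the string with a multibyte-aware function instead: mb_str_split() on PHP 7.4+, or a regex with the /u modifier.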
Proof: https://3v4l.org/3FRmG
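A small sketch of the byte-versus-character difference, assuming the mbstring extension is available for mb_strlen() and mb_substr():

```php
<?php
// Decode one multibyte entity and two ASCII ones
$decoded = html_entity_decode('&lsquo;&amp;&gt;', ENT_QUOTES, 'UTF-8');

// strlen() counts bytes: ‘ (U+2018) takes 3 bytes in UTF-8, & and > take 1 each
echo strlen($decoded), PHP_EOL;                              // 5
// mb_strlen() counts characters
echo mb_strlen($decoded, 'UTF-8'), PHP_EOL;                  // 3
// The three raw bytes behind ‘
echo bin2hex(mb_substr($decoded, 0, 1, 'UTF-8')), PHP_EOL;   // e28098

// str_split() cuts between bytes; a /u regex split keeps ‘ whole
$bytes = str_split($decoded);
$chars = preg_split('//u', $decoded, -1, PREG_SPLIT_NO_EMPTY);
echo count($bytes), ' vs ', count($chars), PHP_EOL;          // 5 vs 3
```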

Utf8 Sort Array

I have a problem sorting this array:
0 => string 'Australien' (length=10)
1 => string 'Belgien' (length=7)
2 => string 'Botswana' (length=8)
3 => string 'Brasilien' (length=9)
4 => string 'Bulgarien' (length=9)
5 => string 'Burma' (length=5)
6 => string 'China' (length=5)
7 => string 'Costa Rica' (length=10)
73 => string 'Ägypten' (length=8)
But Ägypten should be after Australien.
I already tried the Collator class, but our client won't install the intl extension.

You can use setlocale() with LC_COLLATE as the first parameter and a UTF-8 locale such as en_US.utf8 as the second, and then simply sort with usort() using strcoll as the comparison function:
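// The locale must be installed on the server; setlocale() returns false otherwise
setlocale(LC_COLLATE, 'en_US.utf8');
$array = array('Australien','Belgien','Botswana','Brasilien','Bulgarien','Burma','China','Costa Rica','Ägypten');
// strcoll() compares strings according to the LC_COLLATE locale
usort($array, 'strcoll');
print_r($array);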
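If the server has no suitable locale either (setlocale() returning false), a crude fallback that needs no extension is to sort on an ASCII "folded" key. A minimal sketch; the fold_german() helper and its map are hypothetical and only cover German umlauts and ß, which is an assumption about the data:

```php
<?php
// Hypothetical helper: map German umlauts and ß to ASCII for comparison only
function fold_german($s) {
    static $map = [
        'Ä' => 'A', 'Ö' => 'O', 'Ü' => 'U',
        'ä' => 'a', 'ö' => 'o', 'ü' => 'u',
        'ß' => 'ss',
    ];
    return strtr($s, $map);
}

$array = ['Australien', 'Belgien', 'Botswana', 'Brasilien', 'Bulgarien',
          'Burma', 'China', 'Costa Rica', 'Ägypten'];

usort($array, function ($a, $b) {
    // Compare the folded keys case-insensitively...
    $cmp = strcasecmp(fold_german($a), fold_german($b));
    // ...and fall back to a byte comparison so ties have a deterministic order
    return $cmp !== 0 ? $cmp : strcmp($a, $b);
});

print_r($array);
```

Here Ä is compared as A, so Ägypten is grouped with the A entries instead of being pushed to the end of the list. For anything beyond a handful of known characters, Collator or a proper locale is the more reliable choice.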

What array function should I use for creating an index?

Hello guys, I am trying to create an index of all the words on an HTML page that my crawler parses.
So far I have managed to break the HTML page down into an array of words, and I have filtered out all the stop words.
At this stage I have a few problems.
The array of words from the parsed HTML page contains repeated words. I want that, because I still have to record how many times each word appears on the page.
The array looks like this.
$wordsFromHTML =
array (size=119)
0 => string 'web' (length=3)
1 => string 'giants' (length=6)
2 => string 'vryheid' (length=7)
3 => string 'news' (length=4)
4 => string 'access' (length=6)
5 => string 'mails' (length=5)
6 => string 'mobile' (length=6)
7 => string 'february' (length=8)
8 => string 'access' (length=6)
9 => string 'mails' (length=5)
10 => string 'web' (length=3)
11 => string 'february' (length=8)
12 => string 'access' (length=6)
13 => string 'mails' (length=5)
14 => string 'desktop' (length=7)
15 => string 'february' (length=8)
16 => string 'hosting' (length=7)
17 => string 'web' (length=3)
18 => string 'giants' (length=6)
19 => string 'vryheid' (length=7)
20 => string 'february' (length=8)
22 => string 'us' (length=2)
Now I want to save all the words from $wordsFromHTML to $indexArray, which is my final index.
It should look like this.
$indexArray = array('web'=>array('url'=>array(0,10,17)))
The problem is how to keep adding the position (the $wordsFromHTML key) of each repeated word to the final index array.
The index array should contain only unique words: if a word that already exists comes in again, the existing entry with the same URL should be reused and the new position added to it.
Hope you understand my question.
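One way to build this is a single foreach that appends each position under the word and URL. A minimal sketch, assuming the 'url' key in the target structure stands for the page's actual URL; the addToIndex() function and the example.com URL are hypothetical, and the sample words and keys are taken from the $wordsFromHTML dump above:

```php
<?php
// Hypothetical helper: merge one page's words into an existing index
function addToIndex(array $index, $url, array $wordsFromHTML)
{
    foreach ($wordsFromHTML as $position => $word) {
        // Writing through missing keys auto-creates the nested arrays, so a new
        // word gets its own entry and a repeated word just gains another position
        $index[$word][$url][] = $position;
    }
    return $index;
}

$wordsFromHTML = array(0 => 'web', 1 => 'giants', 10 => 'web', 17 => 'web', 18 => 'giants');
$indexArray = addToIndex(array(), 'http://example.com/', $wordsFromHTML);

print_r($indexArray);
// Number of occurrences of a word on that page = number of recorded positions
echo count($indexArray['web']['http://example.com/']), PHP_EOL;
```

Because the function takes and returns the whole index, you can call it once per crawled page and the per-URL position lists stay separate.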

How to group array but not create new object?

I have an array that looks like this:
array (size=21)
0 => string '2' (length=1)
1 => string '' (length=0)
2 => string '20' (length=2)
3 => string '19' (length=2)
4 => string '14' (length=2)
5 => string '13' (length=2)
6 => string '' (length=0)
7 => null
8 => null
9 => string '20' (length=2)
10 => null
11 => string '10' (length=2)
12 => string '' (length=0)
13 => null
14 => string '13' (length=2)
15 => null
16 => string '' (length=0)
17 => null
18 => null
19 => string '' (length=0)
20 => string '20' (length=2)
And I would like to create a new array from this array by grouping rows with the same string. e.g.
2 => string '20' (length=2) with 20 => string '20' (length=2) and with 9 => string '20' (length=2)
and
5 => string '13' (length=2) with 14 => string '13' (length=2)
etc.
and order the rows of the new array by how many times each string occurs.
The order needs to be descending, from the most occurrences to the fewest, like a classic top-N chart (the most frequent strings first, the least frequent last).
So, the modified array will look like this:
array (size=21)
0 => string '20' (length=2)
1 => string '13' (length=2)
...
I also need to handle null values, e.g. 17 => null, so that they are not included in the final result at all.

This should do the trick:
// Filter out the null values first (a closure replaces create_function(),
// which is deprecated since PHP 7.2 and removed in PHP 8)
$myarray = array_filter($myarray, function ($arg) {
    return !is_null($arg);
});
$occurrences = array_count_values($myarray);
// arsort() sorts by count, descending, and preserves the key => value association
arsort($occurrences, SORT_NUMERIC);
var_dump(array_keys($occurrences));

Try this.
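// array_count_values() only counts string and integer values and warns about
// null entries, so drop those first
$result = array_count_values(array_filter($a, function ($v) { return $v !== null; }));
arsort($result);
$result = array_keys($result);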
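Putting the steps together on the sample data from the question gives a runnable check. This sketch also drops empty strings, which is an assumption about what you want: left in, '' would actually top the chart with five occurrences. Note that array_count_values() turns numeric-string values such as '20' into integer keys, so the keys are converted back to strings at the end:

```php
<?php
// Sample data copied from the question (indices 0-20)
$myarray = array('2', '', '20', '19', '14', '13', '', null, null, '20', null,
                 '10', '', null, '13', null, '', null, null, '', '20');

// Drop nulls and (assumption) empty strings
$filtered = array_filter($myarray, function ($v) {
    return $v !== null && $v !== '';
});

// value => number of occurrences; numeric strings become integer keys here
$counts = array_count_values($filtered);
arsort($counts, SORT_NUMERIC);   // most frequent first

// Convert the keys back to strings to match the input type
$top = array_map('strval', array_keys($counts));
print_r($top);
```

Values with the same count (here '2', '19', '14' and '10', each seen once) have no guaranteed relative order on PHP versions before 8.0, whose sort is not stable.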

Get text that is within brackets with single or double quotes

I am trying to find, across all my PHP files, the strings passed to the i18n function __(). Here is an example:
$string = '__("String 2"); __("String 3", __("String 4"));' . "__('String 5'); __('String 6', __('String 7'));";
var_dump(preg_match_all('#__\((\'|")([^\'"]+)(\'|")\)#', $string, $match));
var_dump($match);
I want to get this result:
array
0 => array
0 => string 'String 2' (length=8)
1 => string 'String 3' (length=8)
2 => string 'String 4' (length=8)
3 => string 'String 5' (length=8)
4 => string 'String 6' (length=8)
5 => string 'String 7' (length=8)
But unfortunately I get this result:
array
0 => array
0 => string '__("esto es una prueba")' (length=24)
1 => string '__("esto es una prueba 2")' (length=26)
2 => string '__("prueba 4")' (length=14)
3 => string '__('caca')' (length=10)
4 => string '__('asdsnasdad')' (length=16)
1 => array
0 => string '"' (length=1)
1 => string '"' (length=1)
2 => string '"' (length=1)
3 => string ''' (length=1)
4 => string ''' (length=1)
2 => array
0 => string 'esto es una prueba' (length=18)
1 => string 'esto es una prueba 2' (length=20)
2 => string 'prueba 4' (length=8)
3 => string 'caca' (length=4)
4 => string 'asdsnasdad' (length=10)
3 => array
0 => string '"' (length=1)
1 => string '"' (length=1)
2 => string '"' (length=1)
3 => string ''' (length=1)
4 => string ''' (length=1)
Thanks in advance.

preg_match_all('/(?<=\(["\']).*?(?=[\'"])/', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
Simple.
Note that I am using ( as an entry point to the match. If you have more exotic input you should provide it.

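Don't capture the quotes, and don't require the closing parenthesis right after the closing quote; otherwise 'String 3' and 'String 6', which are followed by a comma, are skipped.
preg_match_all('#__\([\'"]([^\'"]+)[\'"]#', $string, $match);
The strings are then in $match[1]. Also take a look at the flags parameter of preg_match_all() for different output formats.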
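A variant worth considering captures the opening quote and uses the backreference \1 to require the same quote at the end, so an apostrophe inside a double-quoted string (the hypothetical "Don't panic" sample below) does not cut the match short. A sketch, assuming the translatable strings contain no escaped quotes:

```php
<?php
$string = '__("String 2"); __("String 3", __("String 4"));'
        . "__('String 5'); __('String 6', __('String 7'));"
        . '__("Don\'t panic");';

// ([\'"]) captures the opening quote and \1 requires the same quote to close.
// Limitation: escaped quotes such as "say \"hi\"" are not handled.
preg_match_all('/__\(\s*([\'"])(.*?)\1/', $string, $match);

// Group 1 holds the quote character, group 2 the translatable text
print_r($match[2]);
```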

