PHP regex not matching Polish characters

This code doesn't return any matches when I type "szukaj/zwierzęta" in the URL path, but when I type "szukaj/zwierzeta" it works.
<?php
$url = "http://{$_SERVER['HTTP_HOST']}{$_SERVER['REQUEST_URI']}";
$uri = $_SERVER['REQUEST_URI']; // assumed: the request path the rules are matched against
$rules = array(
    'film'   => "/film/(?'film_slug'[^/]+)-(?'film_id'\d+)",
    'szukaj' => "/szukaj/(?'query'[\w\-]+)",
);
foreach ( $rules as $action => $rule ) {
    if ( preg_match( '~^'.$rule.'$~i', $uri, $params ) ) {
        switch ( $action ) {
            case 'szukaj':
                // doesn't work with ęąćźż, works with others
                break;
        }
    }
}
I also tried 'szukaj' => "/szukaj/(?'query'[\pL|\pN-]+)", but that didn't work either.

You can use [a-zA-ZąćęłńóśźżĄĆĘŁŃÓŚŹŻ-] to match any of a-z, A-Z, a hyphen, and any Polish letter with a diacritic.
Here is the usage inside your code:
<?php
$url = "http://{$_SERVER['HTTP_HOST']}{$_SERVER['REQUEST_URI']}";
$uri = $_SERVER['REQUEST_URI']; // assumed: the request path the rules are matched against
$rules = array(
    'film'   => "/film/(?'film_slug'[^/]+)-(?'film_id'\d+)",
    'szukaj' => "/szukaj/(?'query'[+a-zA-ZąćęłńóśźżĄĆĘŁŃÓŚŹŻ-]+)",
);
foreach ( $rules as $action => $rule ) {
    if ( preg_match( '~^'.$rule.'$~i', $uri, $params ) ) {
        switch ( $action ) {
            case 'szukaj':
                // $params['query'] holds the matched search term
                break;
        }
    }
}
And regex101 example:
https://regex101.com/r/F0uiDE/1
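One thing the answer leaves implicit: browsers percent-encode non-ASCII characters, so $_SERVER['REQUEST_URI'] usually carries "zwierz%C4%99ta" rather than "zwierzęta". The following is a minimal sketch, assuming the path is UTF-8 once decoded. The helper name matchSearchQuery is hypothetical and not part of the original code. It decodes the path first and adds the u modifier, so that \p{L} matches any Unicode letter instead of listing the Polish letters by hand:

```php
<?php
// Hypothetical helper (not from the original post): returns the search term or null.
function matchSearchQuery(string $requestUri): ?string
{
    // Keep only the path, then turn "%C4%99" back into the UTF-8 bytes of "ę".
    $path = rawurldecode((string) parse_url($requestUri, PHP_URL_PATH));

    // \p{L} = any letter, \p{N} = any number; the "u" modifier makes PCRE
    // treat both the pattern and the subject as UTF-8.
    if (preg_match('~^/szukaj/(?<query>[\p{L}\p{N}-]+)$~iu', $path, $m)) {
        return $m['query'];
    }
    return null;
}

// Usage: the percent-encoded request path is decoded before matching.
$term = matchSearchQuery('/szukaj/zwierz%C4%99ta');
```

If the character class should stay restricted to Polish letters only, the explicit list from the answer can be kept; the decoding step and the u modifier are still likely to be needed.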

Related

Ignore parts of URL

I'm working on a simple script to scrape the channel ID of a YouTube URL.
For example, to get the channel ID on this URL:
$url = 'https://youtube.com/channel/UCBLAoqCQyz6a0OvwXWzKZag';
I use regex:
preg_match( '/\/channel\/(([^\/])+?)$/', $url, $matches );
Works fine. But if the URL has any extra parameters or anything else after the channel ID, it doesn't work. Example:
https://youtube.com/channel/UCBLAoqCQyz6a0OvwXWzKZag?PARAMETER=HELLO
https://youtube.com/channel/UCBLAoqCQyz6a0OvwXWzKZag/RANDOMFOLDER
etc...
My question is: how can I adjust my regex so it works with those URLs? The match should not include the extra parameters or path segments.
Feel free to test my ideone code.

You can fix the regexps in the following way:
$preg_entities = [
'channel_id' => '\/channel\/([^\/?#]+)', //match YouTube channel ID from url
'user' => '\/user\/([^\/?#]+)', //match YouTube user from url
];
See the PHP demo.
Because [^\/?#]+ stops at the first /, ? or #, the match ends before any further path segment, query string or fragment in a URL, so the captured value is just the ID.
Full code snippet:
function getYouTubeXMLUrl( $url ) {
    $xml_youtube_url_base = 'h'.'ttps://youtube.com/feeds/videos.xml';
    $preg_entities = [
        'channel_id' => '\/channel\/([^\/?#]+)', //match YouTube channel ID from url
        'user'       => '\/user\/([^\/?#]+)', //match YouTube user from url
    ];
    foreach ( $preg_entities as $key => $preg_entity ) {
        if ( preg_match( '/' . $preg_entity . '/', $url, $matches ) ) {
            if ( isset( $matches[1] ) ) {
                return [
                    'rss'  => $xml_youtube_url_base . '?' . $key . '=' . $matches[1],
                    'id'   => $matches[1],
                    'type' => $key,
                ];
            }
        }
    }
    return null; // no channel or user found in the URL
}
Test:
$url = 'https://youtube.com/channel/UCBLAoqCQyz6a0OvwXWzKZag?PARAMETER=HELLO';
print_r(getYouTubeXMLUrl($url));
// => Array( [rss] => https://youtube.com/feeds/videos.xml?channel_id=UCBLAoqCQyz6a0OvwXWzKZag [id] => UCBLAoqCQyz6a0OvwXWzKZag [type] => channel_id )
$url = 'https://youtube.com/channel/UCBLAoqCQyz6a0OvwXWzKZag/RANDOMFOLDER';
print_r(getYouTubeXMLUrl($url));
// => Array( [rss] => https://youtube.com/feeds/videos.xml?channel_id=UCBLAoqCQyz6a0OvwXWzKZag [id] => UCBLAoqCQyz6a0OvwXWzKZag [type] => channel_id )
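As an alternative to excluding ? and # in the character class, the URL can be split first so the regex only ever sees the path. This is a rough sketch, not the answerer's method; extractChannelId is a hypothetical name, and it relies on PHP's built-in parse_url():

```php
<?php
// Hypothetical helper: read the channel ID from the URL path only.
function extractChannelId(string $url): ?string
{
    // parse_url() drops the query string and fragment for us.
    $path = parse_url($url, PHP_URL_PATH);

    // The ID is the path segment that follows "/channel/".
    if (is_string($path) && preg_match('~^/channel/([^/]+)~', $path, $m)) {
        return $m[1];
    }
    return null;
}

$id = extractChannelId('https://youtube.com/channel/UCBLAoqCQyz6a0OvwXWzKZag?PARAMETER=HELLO');
```

The trade-off is an extra function call in exchange for a simpler pattern.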

Create shortcode with parameter in PHP Joomla

I've created a simple shortcode plugin on Joomla.
Actually, I am trying to integrate Cleeng Video with Joomla, and I hope to connect its users in the future.
I'm stuck on creating the shortcode's parameters: I don't know how to parse a parameter and its value.
My shortcode is here (no parameters):
{cleengvideo}<iframe class="wistia_embed" src="http://fast.wistia.net/embed/iframe/5r8r9ib6di" name="wistia_embed" width="640" height="360" frameborder="0" scrolling="no" allowfullscreen=""></iframe>{/cleengvideo}
My code is here
public function onContentPrepare($content, $article, $params, $limit) {
    preg_match_all('/{cleengvideo}(.*?){\/cleengvideo}/is', $article->text, $matches);
    // $matches[0] holds the full shortcodes, $matches[1] the text between the tags
    foreach ($matches[0] as $i => $match) {
        $videoCode = $matches[1][$i];
        $article->text = str_replace($match, $videoCode, $article->text);
    }
}
At a minimum, I want to set the height, the width, and the video code (5r8r9ib6di) from the shortcode.
Can anyone please help me with adding and parsing its parameters?

To get a parameter, you can simply use the following code:
$params->get('param_name', 'default_value');
So for example, in your XML file, if you had a field like so:
<field name="width" type="text" label="Width" default="60px" />
you would call the parameter like so:
$params->get('width', '60px');
Note that the default value (the second argument) is optional, but I find it good practice to always pass it.
Hope this helps

I think I have found a solution.
It's here: https://github.com/Cleeng/cleeng-wp-plugin/blob/master/php/classes/Frontend.php
The code is:
$expr = '/\[cleeng_content(.*?[^\\\])\](.*?[^\\\])\[\/cleeng_content\]/is';
preg_match_all( $expr, $post->post_content, $m );
foreach ( $m[0] as $key => $content ) {
$paramLine = $m[1][$key];
$expr = '/(\w+)\s*=\s*(?:\"|")(.*?)(?<!\\\)(?:\"|")/si';
preg_match_all( $expr, $paramLine, $mm );
if ( ! isset( $mm[0] ) || ! count( $mm[0] ) ) {
continue;
}
$params = array( );
foreach ( $mm[1] as $key => $paramName ) {
$params[$paramName] = $mm[2][$key];
}
if ( ! isset( $params['id'] ) ) {
continue;
}
$content = array(
'contentId' => $params['id'],
'shortDescription' => @$params['description'],
'price' => @$params['price'],
'itemType' => 'article',
'purchased' => false,
'shortUrl' => '',
'referred' => false,
'referralProgramEnabled' => false,
'referralRate' => 0,
'rated' => false,
'publisherId' => '000000000',
'publisherName' => '',
'averageRating' => 4,
'canVote' => false,
'currencySymbol' => '',
'sync' => false
);
if ( isset( $params['referral'] ) ) {
$content['referralProgramEnabled'] = true;
$content['referralRate'] = $params['referral'];
}
if ( isset( $params['ls'] ) && isset( $params['le'] ) ) {
$content['hasLayerDates'] = true;
$content['layerStartDate'] = $params['ls'];
$content['layerEndDate'] = $params['le'];
}
$this->cleeng_content[$params['id']] = $content;
}

I hope this helps someone searching for shortcode parameters. To capture the parameters of a shortcode, we can use preg_match_all like this:
preg_match_all('/{cleengvideo(.*?)}(.*?){\/cleengvideo}/is', $article->text, $matches);
This gives an array with 3 elements; the second one ($matches[1]) holds the parameter strings, which you can then process in code.
Hope this helps.
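To make that last step concrete, here is a small sketch of turning the captured parameter string into an associative array. It assumes the parameters are written as name="value" pairs with double quotes; the helper name parseShortcodeAttributes and the sample text are made up for illustration:

```php
<?php
// Hypothetical helper: split ' width="640" height="360"' into ['width' => '640', ...].
function parseShortcodeAttributes(string $attributeText): array
{
    $attributes = [];
    // Each pair is a word, an equals sign and a double-quoted value.
    preg_match_all('/(\w+)\s*=\s*"([^"]*)"/', $attributeText, $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        $attributes[$pair[1]] = $pair[2];
    }
    return $attributes;
}

// Sample article text (an assumption, not taken from the question).
$text = 'Intro {cleengvideo width="640" height="360" id="5r8r9ib6di"}fallback{/cleengvideo} outro';

preg_match_all('/{cleengvideo(.*?)}(.*?){\/cleengvideo}/is', $text, $matches, PREG_SET_ORDER);
$parsed = [];
foreach ($matches as $match) {
    // $match[1] is the raw parameter string, $match[2] the enclosed content.
    $parsed[] = parseShortcodeAttributes($match[1]);
}
```

Inside onContentPrepare the same loop would run on $article->text, and the resulting width, height and id values could be used to build the iframe markup.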

Magento: improving search engine (inflections, irrelevant words removal, etc.)

I'm interested in knowing if I can detect inflections (e.g. dogs/dog), remove unimportant words ("made in the usa" -> "in" and "the" are not important), etc. in the search string entered by the user for the Magento search engine, without hard-coding so many scenarios in one big PHP code block. I can process this search string to a certain degree, but the code will look messy and ugly.
Any suggestions or pointers for making it an "intelligent" search engine?

Use this class:
class Inflection
{
static $plural = array(
'/(quiz)$/i' => "$1zes",
'/^(ox)$/i' => "$1en",
'/([ml])ouse$/i' => "$1ice",
'/(matr|vert|ind)(?:ix|ex)$/i' => "$1ices",
'/(x|ch|ss|sh)$/i' => "$1es",
'/([^aeiouy]|qu)y$/i' => "$1ies",
'/(hive)$/i' => "$1s",
'/(?:([^f])fe|([lr])f)$/i' => "$1$2ves",
'/(shea|lea|loa|thie)f$/i' => "$1ves",
'/sis$/i' => "ses",
'/([ti])um$/i' => "$1a",
'/(tomat|potat|ech|her|vet)o$/i'=> "$1oes",
'/(bu)s$/i' => "$1ses",
'/(alias)$/i' => "$1es",
'/(octop)us$/i' => "$1i",
'/(ax|test)is$/i' => "$1es",
'/(us)$/i' => "$1es",
'/s$/i' => "s",
'/$/' => "s"
);
static $singular = array(
'/(quiz)zes$/i' => "$1",
'/(matr)ices$/i' => "$1ix",
'/(vert|ind)ices$/i' => "$1ex",
'/^(ox)en$/i' => "$1",
'/(alias)es$/i' => "$1",
'/(octop|vir)i$/i' => "$1us",
'/(cris|ax|test)es$/i' => "$1is",
'/(shoe)s$/i' => "$1",
'/(o)es$/i' => "$1",
'/(bus)es$/i' => "$1",
'/([m|l])ice$/i' => "$1ouse",
'/(x|ch|ss|sh)es$/i' => "$1",
'/(m)ovies$/i' => "$1ovie",
'/(s)eries$/i' => "$1eries",
'/([^aeiouy]|qu)ies$/i' => "$1y",
'/([lr])ves$/i' => "$1f",
'/(tive)s$/i' => "$1",
'/(hive)s$/i' => "$1",
'/(li|wi|kni)ves$/i' => "$1fe",
'/(shea|loa|lea|thie)ves$/i'=> "$1f",
'/(^analy)ses$/i' => "$1sis",
'/((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$/i' => "$1sis",
'/([ti])a$/i' => "$1um",
'/(n)ews$/i' => "$1ews",
'/(h|bl)ouses$/i' => "$1ouse",
'/(corpse)s$/i' => "$1",
'/(us)es$/i' => "$1",
'/s$/i' => ""
);
static $irregular = array(
'move' => 'moves',
'foot' => 'feet',
'goose' => 'geese',
'sex' => 'sexes',
'child' => 'children',
'man' => 'men',
'tooth' => 'teeth',
'person' => 'people',
'admin' => 'admin'
);
static $uncountable = array(
'sheep',
'fish',
'deer',
'series',
'species',
'money',
'rice',
'information',
'equipment'
);
public static function pluralize( $string )
{
// save some time in the case that singular and plural are the same
if ( in_array( strtolower( $string ), self::$uncountable ) )
return $string;
// check for irregular singular forms
foreach ( self::$irregular as $pattern => $result )
{
$pattern = '/' . $pattern . '$/i';
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string);
}
// check for matches using regular expressions
foreach ( self::$plural as $pattern => $result )
{
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string );
}
return $string;
}
public static function singularize( $string )
{
// save some time in the case that singular and plural are the same
if ( in_array( strtolower( $string ), self::$uncountable ) )
return $string;
// check for irregular plural forms
foreach ( self::$irregular as $result => $pattern )
{
$pattern = '/' . $pattern . '$/i';
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string);
}
// check for matches using regular expressions
foreach ( self::$singular as $pattern => $result )
{
if ( preg_match( $pattern, $string ) )
return preg_replace( $pattern, $result, $string );
}
return $string;
}
public static function pluralize_if($count, $string)
{
if ($count == 1)
return "1 $string";
else
return $count . " " . self::pluralize($string);
}
}
If you have time, read up on how inflection works in general: http://en.wikipedia.org/wiki/Inflection
You can also keep the inflection data in arrays or in XML; for a friendly example, look at CodeIgniter's inflector helper: http://ellislab.com/codeigniter/user-guide/helpers/inflector_helper.html
Many frameworks have built-in inflection support, but it mostly covers English only. For other languages you will have to write your own rules, or use the inflection data for other languages from unicode.org if you need it.
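For the "irrelevant words" part of the question, a stop-word list is one common approach. The sketch below is only an illustration: filterSearchTerms is a hypothetical helper, the stop-word list is a tiny made-up sample, and strtolower() is used for brevity, so it only lower-cases ASCII letters:

```php
<?php
// Hypothetical helper: lower-case the query, split it into words and drop stop words.
function filterSearchTerms(string $query, array $stopWords): array
{
    // Split on anything that is not a letter or a digit; skip empty pieces.
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);

    // array_diff() keeps the words that are not in the stop-word list;
    // array_values() re-indexes the result from 0.
    return array_values(array_diff($words, $stopWords));
}

// Sample stop-word list (an assumption; a real list would be longer).
$terms = filterSearchTerms('Made in the USA', ['in', 'the', 'a', 'an', 'of']);
```

Each remaining term could then be passed through Inflection::singularize() from the class above before it is handed to the search engine.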

parseExpression in Twig

I am integrating Twig in an existing project.
I am writing a token parser to parse a custom tag that is similar to the {% render %} tag.
My tag looks like:
{% mytag 'somestring' with { 'name': name, 'color': 'green' } %}
where name is defined by {% set name = 'foo' %}
I am able to parse somestring without any issues.
This is the code used to parse the stuff in the with{ }:
$Stream = $this->parser->getStream();
if ( $Stream->test( \Twig_Token::NAME_TYPE, 'with' ) )
{
    $Stream->next();
    $parsedParameters = $this->parser->getExpressionParser()->parseExpression();
    $parameters = $this->parser->getEnvironment()->compile( $parsedParameters );
    var_dump( $parameters ); //string 'array( "name" => $this->getContext( $context, "name" ), "color" => "green" )' (length=72)
    foreach ( $parsedParameters->getIterator() as $parameter )
    {
        //var_dump( $parameter->getAttribute('value') );
    }
}
My goal is to turn 'name': name, 'color': 'green' into an associative array within the token parser:
array(
    'name' => 'foo',
    'color' => 'green',
)
As the documentation is quite sparse and the code in the library is uncommented, I am not sure how to do this. If I loop through $parsedParameters, I get 4 elements consisting of the array key and an array value. However, as name is a variable with a type Twig_Node_Expression_Name, I am unsure as to how I can compile it to get the compiled value. Currently, I have found a way to compile that node, but all it gives me is a string containing a PHP expression which I can't use.
How can I turn the parsed expression into an associative array?

OK, I think I was able to solve that. It's not too pretty, but it should work.
$Stream = $this->parser->getStream();
if ( $Stream->test( \Twig_Token::NAME_TYPE, 'with' ) )
{
    $Stream->next();
    $parsedParameters = $this->parser->getExpressionParser()->parseExpression();
    $parameters = $this->parser->getEnvironment()->compile( $parsedParameters );
    var_dump( $parameters ); //string 'array( "name" => $this->getContext( $context, "name" ), "color" => "green" )' (length=72)
    $params = array();
    $index = null;
    $value = null;
    // key nodes carry a 'value' attribute, variable (name) nodes carry a 'name' attribute
    foreach ( $parsedParameters->getIterator() as $parameter )
    {
        if ( $parameter->hasAttribute( 'value' ) )
        {
            $index = $parameter->getAttribute( 'value' );
        }
        elseif ( $parameter->hasAttribute( 'name' ) )
        {
            $value = $parameter->getAttribute( 'name' );
        }
        if ( isset( $index, $value ) )
        {
            $params[ $index ] = $value;
            $index = null;
            $value = null;
        }
    }
}
So now I have the $params array, which I can pass to the custom node.
$params = var_export( $properties['params'], true );
unset( $fieldProperties['params'] );
Then I just did the following:
$Compiler
->write( "\$params = array();\n" )
->write( "foreach ( {$params} as \$searchFor => \$replaceWith )\n" )
->write( "{\n" )
->write( "\t\$params[ \$searchFor ] = str_replace( \$replaceWith, \$context[\$replaceWith], \$replaceWith );\n" )
->write( "}\n" )
->write( "var_dump( \$params );\n" );
That should be it.
I also see that you were talking about the TokenParser, but sadly I haven't found a way to do this there.

Parse RFC 822 compliant addresses in a TO header

I would like to parse an email address list (like the one in a TO header) with preg_match_all to get the user name (if exists) and the E-mail. Something similar to mailparse_rfc822_parse_addresses or Mail_RFC822::parseAddressList() from Pear, but in plain PHP.
Input :
"DOE, John \(ACME\)" <john.doe@somewhere.com>, "DOE, Jane" <jane.doe@somewhere.com>
Output :
array(
    array(
        'name' => 'DOE, John (ACME)',
        'email' => 'john.doe@somewhere.com'
    ),
    array(
        'name' => 'DOE, Jane',
        'email' => 'jane.doe@somewhere.com'
    )
)
I don't need to support unusual e-mail formats (/[a-z0-9._%-]+@[a-z0-9.-]+.[a-z]{2,4}/i for the email part is OK).
I can't use explode because the comma can appear in the name. str_getcsv doesn't work, because I can have:
DOE, John \(ACME\) <john.doe@somewhere.com>
as input.
Update:
For the moment, I've got this :
public static function parseAddressList($addressList)
{
    $pattern = '/^(?:"?([^<"]+)"?\s)?<?([^>]+@[^>]+)>?$/';
    if (preg_match($pattern, $addressList, $matches)) {
        return array(
            array(
                'name' => stripcslashes($matches[1]),
                'email' => $matches[2]
            )
        );
    } else {
        $parts = str_getcsv($addressList);
        $result = array();
        foreach($parts as $part) {
            if (preg_match($pattern, $part, $matches)) {
                $result[] = array(
                    'name' => stripcslashes($matches[1]),
                    'email' => $matches[2]
                );
            }
        }
        return $result;
    }
}
but it fails on:
"DOE, \"John\"" <john.doe@somewhere.com>
I need to test on back reference the \" but I don't remember how to do this.

Finally I did it:
public static function parseAddressList($addressList)
{
    $pattern = '/^(?:"?((?:[^"\\\\]|\\\\.)+)"?\s)?<?([a-z0-9._%-]+@[a-z0-9.-]+\\.[a-z]{2,4})>?$/i';
    if (($addressList[0] != '<') and preg_match($pattern, $addressList, $matches)) {
        return array(
            array(
                'name' => stripcslashes($matches[1]),
                'email' => $matches[2]
            )
        );
    } else {
        $parts = str_getcsv($addressList);
        $result = array();
        foreach($parts as $part) {
            if (preg_match($pattern, $part, $matches)) {
                $item = array();
                if ($matches[1] != '') $item['name'] = stripcslashes($matches[1]);
                $item['email'] = $matches[2];
                $result[] = $item;
            }
        }
        return $result;
    }
}
But I'm not sure it works for all cases.

I don't know that RFC, but if the format is always as you showed then you can try something like:
preg_match_all("/\"([^\"]*)\"\\s+<([^<>]*)>/", $string, $matches);
print_r($matches);
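The pattern above stops at the first double quote, so it misses names that contain an escaped \" inside the quotes. A possible variation is sketched below, assuming addresses use the usual @ separator and always appear in angle brackets. parseAddressListSketch is a hypothetical name, and this is not a full RFC 822 parser: an unquoted name that itself contains a comma is not handled.

```php
<?php
// Hypothetical helper: a sketch for the inputs shown in this thread only.
function parseAddressListSketch(string $addressList): array
{
    // Group 1: quoted display name (backslash escapes allowed),
    // group 2: unquoted display name without commas,
    // group 3: the address inside angle brackets.
    $pattern = '/(?:"((?:[^"\\\\]|\\\\.)*)"|([^<>,"]*))\s*<([^<>\s]+@[^<>\s]+)>/';
    preg_match_all($pattern, $addressList, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        // stripslashes() turns \" into " and \( into (.
        $name = $m[1] !== '' ? stripslashes($m[1]) : trim($m[2]);
        $result[] = ['name' => $name, 'email' => $m[3]];
    }
    return $result;
}

$list = parseAddressListSketch(
    '"DOE, John \(ACME\)" <john.doe@somewhere.com>, "DOE, Jane" <jane.doe@somewhere.com>'
);
```

Because preg_match_all walks the whole string, there is no need to split on commas first, which is what made explode and str_getcsv awkward here.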
