php regular expression concatenation unwanted backslashes


I'm trying to build a regular expression in PHP. I tested it on https://regex101.com/ and it works fine, but that was before I knew I'd have to implement it in PHP, which adds backslashes where they aren't needed.
Here's my code:
$datePattern = "\[((19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])\]";
$tag = "[a-z]+(?:-[a-z]+)*";
$regroupmentPattern = "\[($tag)?\]";
$taglistPattern = "\[((?:$tag)?(?:;(?:$tag))*)\]";
$countryPattern = "\[([a-z]{2})\]";
$freePattern = "\[([^\[\]]*)\]";
$extensionPattern = "\.(jpg|png)";
$repetitionPattern = "(?:\(\d+\))?";
$fullPattern = "/^$datePattern$regroupmentPattern$taglistPattern$countryPattern$freePattern$freePattern$extensionPattern$repetitionPattern$/";
Here is what I want:
^\[((19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])\]\[([a-z]+(?:-[a-z]+)*)?\]\[((?:[a-z]+(?:-[a-z]+)*)?(?:;(?:[a-z]+(?:-[a-z]+)*))*)\]\[([a-z]{2})\]\[([^\[\]]*)\]\[([^\[\]]*)\](?:\(\d+\))?\.(jpg|png)$
And here's what I get:
"\"\\/^\\\\[((19|20)\\\\d\\\\d)-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])\\\\]\\\\[([a-z]+(?:-[a-z]+)*)?\\\\]\\\\[((?:[a-z]+(?:-[a-z]+)*)?(?:;(?:[a-z]+(?:-[a-z]+)*))*)\\\\]\\\\[([a-z]{2})\\\\]\\\\[([^\\\\[\\\\]]*)\\\\]\\\\[([^\\\\[\\\\]]*)\\\\]\\\\.(jpg|png)(?:\\\\(\\\\d+\\\\))?$\\/\""
I assume there must be some sort of escape function. I tried preg_quote, but it added even more backslashes.
By the way, here's my full code:
<?php
class Gallery {
// Name of the gallery, used to build folder path
private $name;
function __construct($name) {
$this->name = $name;
}
/*
* Returns the list of file names in a gallery folder,
* or false if the folder doesn't exist
*/
public function getFileNames() {
$path = "../../gallery/$this->name";
if (is_dir($path)) {
$allFileNamesArray = scandir($path, SCANDIR_SORT_ASCENDING);
$filteredFileNamesArray = array();
// Building regular expression
$datePattern = "\[((19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])\]";
$tag = "[a-z]+(?:-[a-z]+)*";
$regroupmentPattern = "\[($tag)?\]";
$taglistPattern = "\[((?:$tag)?(?:;(?:$tag))*)\]";
$countryPattern = "\[([a-z]{2})\]";
$freePattern = "\[([^\[\]]*)\]";
$extensionPattern = "\.(jpg|png)";
$repetitionPattern = "(?:\(\d+\))?";
$fullPattern = "/^$datePattern$regroupmentPattern$taglistPattern$countryPattern$freePattern$freePattern$extensionPattern$repetitionPattern$/";
foreach ($allFileNamesArray as $fileName) {
$matches = array();
if (preg_match($fullPattern, $fileName, $matches, PREG_UNMATCHED_AS_NULL)) {
$filteredFileNamesArray[] = $fileName;
}
var_dump($matches);
}
return json_encode($fullPattern);
}
else {
return false;
}
}
}
?>
(Here I returned fullPattern instead of filteredFileNamesArray for debugging purposes.)

You swapped $repetitionPattern and $extensionPattern.
Use
$fullPattern = "/^$datePattern$regroupmentPattern$taglistPattern$countryPattern$freePattern$freePattern$repetitionPattern$extensionPattern$/";
It will result in this pattern:
^\[((19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])\]\[([a-z]+(?:-[a-z]+)*)?\]\[((?:[a-z]+(?:-[a-z]+)*)?(?:;(?:[a-z]+(?:-[a-z]+)*))*)\]\[([a-z]{2})\]\[([^\[\]]*)\]\[([^\[\]]*)\](?:\(\d+\))?\.(jpg|png)$
See the regex demo online.
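As for the extra backslashes: they most likely come from json_encode rather than from the concatenation, assuming the output in the question was produced by the return json_encode($fullPattern); line (and possibly encoded a second time on its way to the browser). A minimal sketch with a shortened, hypothetical pattern suggests that the pattern itself stays intact:

```php
<?php
// Shortened stand-in for $fullPattern (hypothetical, for illustration only).
$pattern = '/^\[(\d+)\]$/';

// The pattern as PCRE will see it: no extra backslashes.
echo $pattern, PHP_EOL;

// json_encode escapes every backslash and forward slash for JSON transport.
$encoded = json_encode($pattern);
echo $encoded, PHP_EOL;

// Decoding gives back the original pattern unchanged.
$decoded = json_decode($encoded);
var_dump($decoded === $pattern);

// The pattern matches as written.
var_dump(preg_match($pattern, '[2024]'));
```

To inspect a pattern while debugging, echo or var_dump it directly instead of passing it through json_encode.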

Related

Generating Csv with PHP - outputting unnecessary quotation marks

I am generating and exporting a CSV through PHP, and after some modifications by my team, double quotation marks are now being generated inside a column.
I generate it from my terminal by executing this shell script with the CakePHP Console:
/var/www/mysite.new/trunk/app/Console/cake Csv mysite.uk
I have already tried several techniques to strip them off, such as stripslashes(), str_replace() and trim().
In my latest modification, I tried to apply the str_replace function:
foreach ($persons_csv as $person_csv){
/* The part where I get the data for stripping off the quotation marks */
$mail = $person_csv['Person']['email'];
$name = str_replace('"', '', $person_csv['Person']['name']);
$surname = str_replace('"', '', $person_csv['Person']['surname']);
/* REST OF THE CODE */
}
Nevertheless, the quotation marks are only generated for names and surnames that have more than one word.
Names and surnames consisting of a single word appear to be fine.
So the anomaly seems to be tied to names that contain whitespace, for which the double quotation marks are generated again. I am not quite sure why this is occurring.
I can attach two screenshots so you can get a better understanding of the problem.
If you have any idea what it might be, it would be really appreciated.
This is the rest of my code, in which I generate the CSV:
private function addRow($row) {
$rows_deleted = 0;
if (!empty($row)){
fputcsv($this->buffer, $row, $this->delimiter, $this->enclosure);
} else {
return false;
}
}
private function renderHeaders() {
header("Content-type:application/vnd.ms-excel");
header("Content-disposition:attachment;filename=" . $this->filename);
}
private function setFilename($filename) {
$this->filename = $filename;
if (strtolower(substr($this->filename, -4)) != '.csv') {
$this->filename .= '.csv';
}
}
private function render($filename = true, $to_encoding = null, $from_encoding = "auto") {
if(PAIS) {
if ($filename) {
if (is_string($filename)) {
$this->setFilename($filename);
}
$this->renderHeaders();
}
rewind($this->buffer);
$output = stream_get_contents($this->buffer);
$url = '/var/www/mysite.new/trunk/' .'app'.DS.'webroot'.DS.'csv'.DS.PAIS.DS.$this->filename;
$gestor = fopen($url, "w+") or die("Unable to open file");
if(file_exists($url)){
file_put_contents($url, $output);
chmod($url, 0777);
fclose($gestor);
} else {
return false;
}
} else {
return false;
}
}
public function csv_persons($persons_csv) {
$this->array_final = [self::NAME, self::SURNAME];
date_default_timezone_set('Europe/Madrid');
$d = date("Ymd");
$this->addRow($this->array_final);
foreach ($persons_csv as $person_csv){
$name = str_replace('"', '', $person_csv['Person']['name']);
$surname = str_replace('"', '', $person_csv['Person']['surname']);
$apos = '&apos;';
$pos = strpos($surname, $apos);
if($pos !== false) {
$surname = str_replace('&apos;', '\'', $surname);
}
$arr = array();
$arr[$this->getArrayKeyIndex($this->array_final, self::NAME)] = $name;
$arr[$this->getArrayKeyIndex($this->array_final, self::SURNAME)] = $surname;
$this->addRow($arr);
}
$filename = 'PERSON_PROFILE_' . $d;
$this->render($filename);
}
Thanks

Instead of using fputcsv, try implode.
Ref: https://www.php.net/manual/en/function.implode.php
Update 1: You have to be sure that your values do not contain a , (comma).
Update 2: If you are concerned that the quoted text will be a problem for your CSV data sheet, note that this is by design: fputcsv encloses a value in quotes when it contains a space. So you don't have to worry about it; any CSV parser will read the quoted values properly.
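To see why the quotes appear and why they are harmless, here is a minimal sketch using an in-memory stream and made-up sample names (both are assumptions for illustration, not taken from the question's data). fputcsv encloses a field when it contains a space, and str_getcsv reads it back without the quotes:

```php
<?php
// Hypothetical sample row: a one-word name and a multi-word surname.
$row = ['Ana', 'De La Cruz'];

// Write the row to an in-memory stream instead of a real file.
$buffer = fopen('php://memory', 'w+');
fputcsv($buffer, $row, ',', '"', '\\');
rewind($buffer);
$line = stream_get_contents($buffer);
fclose($buffer);

// Only the field containing spaces is enclosed in quotes.
echo $line;

// A CSV parser strips the enclosure again, so the data is unchanged.
$parsed = str_getcsv(rtrim($line, "\n"), ',', '"', '\\');
var_dump($parsed === $row);

// implode() writes no enclosure at all, which is only safe when no value
// contains the delimiter, a quote, or a line break.
echo implode(',', $row), PHP_EOL;
```

The delimiter, enclosure and escape arguments are passed explicitly here so the sketch does not rely on default values.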

php function to return words instead of characters not working

I'm using a PHP function to return words instead of characters. It works fine when I pass a string literal to the function, but here I have a variable that is assigned from another variable containing the string. I've also tried passing the original variable directly, but that didn't work either.
////////////////////////////////////////////////////////
function words($text)
{
$words_in_text = str_word_count($text,1);
$words_to_return = 2;
$result = array_slice($words_in_text,0,$words_to_return);
return '<em>'.implode(" ",$result).'</em>';
}
$intro = $blockRow03['News_Intro'];
echo words($intro);
/* echo words($blockRow03['News_Intro']); didn't work either */
The result is nothing.

str_word_count won't work correctly with accented (multi-byte) characters. You can use the sanitize_words function below to overcome this problem:
function sanitize_words($string) {
preg_match_all("/\p{L}[\p{L}\p{Mn}\p{Pd}'\x{2019}]*/u",$string,$matches,PREG_PATTERN_ORDER);
return $matches[0];
}
function words($text)
{
$words_in_text = sanitize_words($text);
$words_to_return = 2;
$result = array_slice($words_in_text,0,$words_to_return);
return '<em>'.implode(" ",$result).'</em>';
}
$intro = "aşağı yukarı böyle birşey";
echo words($intro);

A PHP script for MongoDB accent search needs to translate into JavaScript

I found a PHP script that makes it possible to find text regardless of accented characters. My project is Node.js + MongoDB, so I tried to translate it to JavaScript, but I wasn't able to translate all of it. Since I don't know PHP very well, I need some help with the translation.
The PHP script source code is from http://tech.rgou.net/en/php/pesquisas-nao-sensiveis-ao-caso-e-acento-no-mongodb-e-php/
/**
* Description of StringUtil
*
* @author Rafael Goulart
*/
class StringUtil {
const ACCENT_STRINGS = 'ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËẼÌÍÎÏĨÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëẽìíîïĩðñòóôõöøùúûüýÿ';
const NO_ACCENT_STRINGS = 'SOZsozYYuAAAAAAACEEEEEIIIIIDNOOOOOOUUUUYsaaaaaaaceeeeeiiiiionoooooouuuuyy';
/**
* Converts a string with accents into a REGEX expression that finds any combination
* in an accent-insensitive way
*
* @param string $text The text.
* @return string The REGEX text.
*/
static public function accentToRegex($text)
{
$from = str_split(utf8_decode(self::ACCENT_STRINGS));
$to = str_split(strtolower(self::NO_ACCENT_STRINGS));
$text = utf8_decode($text);
$regex = array();
foreach ($to as $key => $value)
{
if (isset($regex[$value]))
{
$regex[$value] .= $from[$key];
} else {
$regex[$value] = $value;
}
}
foreach ($regex as $rg_key => $rg)
{
$text = preg_replace("/[$rg]/", "_{$rg_key}_", $text);
}
foreach ($regex as $rg_key => $rg)
{
$text = preg_replace("/_{$rg_key}_/", "[$rg]", $text);
}
return utf8_encode($text);
}
}
And here is my JavaScript code that still needs to be translated. Any help would be appreciated! Thanks.
function accentToRegex(word){
var ACCENT_STRINGS = 'ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËẼÌÍÎÏĨÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëẽìíîïĩðñòóôõöøùúûüýÿ';
var NO_ACCENT_STRINGS = 'SOZsozYYuAAAAAAACEEEEEIIIIIDNOOOOOOUUUUYsaaaaaaaceeeeeiiiiionoooooouuuuyy';
var from = (ACCENT_STRINGS).split(decode_utf8(ACCENT_STRINGS));
var to = NO_ACCENT_STRINGS.split(NO_ACCENT_STRINGS.toLocaleLowerCase());
var text = decode_utf8(word);
var regex = new Array();
for(value in to)
{
if (!regex[value])
{
regex[value] = from[value];
} else {
regex[value] = value;
}
}
for (rg_key in regex)
{
// "$TESTONE $TESTONE".replace( new RegExp("\\$TESTONE","gm"),"foo")
// text = preg_replace("/[$rg]/", "_{$rg_key}_", $text);
text = text.replace(new RegExp(/[rg]/), new RegExp(_{rg_key}_)) ;
}
foreach (rg in regex)
{
// $text = preg_replace("/_{$rg_key}_/", "[$rg]", $text);
text = text.replace(new RegExp(/[rg]/), new RegExp(_{rg_key}_)) ;
}
return encode_utf8(text); //Edited from $text to text
}
function encode_utf8(s) {
return encodeURIComponent(s);
}
function decode_utf8(s) {
return decodeURIComponent(s);
}
The error is

This
return encode_utf8($text);
should be
return encode_utf8(text); // no dollar-sign
but I haven't looked beyond this.
The JavaScript RegExp constructor expects a string without delimiters. Regex literals are delimited by /, but your expression attempts to use underscores instead; JS only uses /.
new RegExp("hello")
new RegExp("[xy]") // will look for either the character 'x' or 'y'
new RegExp(yourStringVariable)
new RegExp("[" + someVar + "]")
// .. will look for any of the letters in the variable someVar
These are the various ways you can use RegExp.
A for each statement in JS is written as two separate words: for each (variable in object). It is also deprecated and not widely supported (see MDN).
Your second foreach also refers to the variable rg_key, which doesn't have a meaningful value at that point.
I think there are a few other things that still require converting.

Parsing Out Code Between Comment Blocks PHP

Let's say I have the following piece of code in a PHP file:
/**
* #SomethingStart
*/
protected static $var1 = '1';
protected static $var2 = '2';
protected static $var3 = '3';
/**
* #SomethingEnd
*/
I am trying to figure out, first, how I can parse out the content between the comments with #SomethingStart and #SomethingEnd (not including the comments) and, second, how I can replace the content between those two tags.

You can get the contents of the file with the file function:
http://www.php.net/manual/en/function.file.php
It returns an array of lines. You can then loop over them with foreach and match the content of each line with preg_match:
$switch = false;
$lines = file('filepath');
$string = '';
foreach($lines as $k => $v)
{
if(preg_match('/#(.*)End$/', $v))
{
$switch = false;
break;
}
if($switch == true)
{
// do replacements, or anything you want with the following lines
// or add, or remove, even if you might have some problems with it
// for this you might not consider using foreach, instead you might
// try array_walk
}
if(preg_match('/#(.*)Start$/', $v))
{
$switch = true;
}
$string .= $v;
}
echo $string;
For array_walk, read this http://www.php.net/manual/en/function.array-walk.php
Try it.
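As an alternative to the line-by-line loop, the whole file can be treated as one string and handled with a single regular expression. The following is a minimal sketch, assuming the markers look exactly like the docblocks in the question; the sample source, the $pattern, and the replacement text are all made up for illustration:

```php
<?php
// Stand-in for file_get_contents('filepath'); the content is hypothetical.
$source = <<<'SRC'
class Example {
/**
 * #SomethingStart
 */
protected static $var1 = '1';
protected static $var2 = '2';
/**
 * #SomethingEnd
 */
}
SRC;

// Group 1: start docblock, group 2: content in between, group 3: end docblock.
// The s modifier lets . match newlines; .*? stops at the first end marker.
$pattern = '~(/\*\*\s*\*\s*#SomethingStart\s*\*/)(.*?)(/\*\*\s*\*\s*#SomethingEnd\s*\*/)~s';

// Step 1: parse out the content between the two comments.
$between = preg_match($pattern, $source, $m) ? trim($m[2]) : null;
echo $between, PHP_EOL;

// Step 2: replace that content. A callback is used so that "$" in the new
// code cannot be interpreted as a backreference by preg_replace.
$replacement = "protected static \$var9 = '9';";
$updated = preg_replace_callback(
    $pattern,
    function (array $m) use ($replacement) {
        return $m[1] . "\n" . $replacement . "\n" . $m[3];
    },
    $source
);
echo $updated, PHP_EOL;
```

Writing $updated back with file_put_contents would complete the replacement; that step is left out here so the sketch does not touch any files.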

Using str_replace() and return two different versions

I have a textarea where I can type in multiple domain names to check availability.
In my script below I replace spaces with "-". But I also want a version of the domain name without "-".
Say for example that I type in:
a good word
a nother god word
Then I want the script to return both:
a-good-word
a-nother-god-word
And also:
agoodword
anothergodword
$domaininput = (isset($_POST['domainname'])) ? $_POST['domainname'] : '';
$change_space = str_replace(" ","-",$domaininput);
$change_new_line = str_replace("\n",",",$change_space);
$manydomains = explode("," , $change_new_line);
foreach ($manydomains as $domain){
//some code
}
How can that be done in PHP?

$manydomains = ... // as above...
// Get alternatives by removing dashes:
$alternatives = explode(",", str_replace("-", "", $change_new_line));
// If a domain name did not contain a dash, there will be duplicates in
// $manydomains and $alternatives. array_unique() takes care of those.
$manydomains = array_unique(array_merge($manydomains, $alternatives));
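Put together as a self-contained sketch (the function name domainVariants and the sample input are made up for illustration), both versions can be built line by line. This variant also assumes the textarea may submit \r\n line endings, which the question does not state:

```php
<?php
// Returns, for every non-empty input line, the dashed and the undashed
// version of the name, without duplicates.
function domainVariants(string $input): array
{
    $variants = [];
    // \R matches any newline style ("\n", "\r\n" or "\r").
    foreach (preg_split('/\R/', $input, -1, PREG_SPLIT_NO_EMPTY) as $line) {
        // Split the line into words, ignoring repeated or surrounding spaces.
        $words = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
        if ($words === []) {
            continue; // the line contained only whitespace
        }
        $variants[] = implode('-', $words); // a-good-word
        $variants[] = implode('', $words);  // agoodword
    }
    // Single-word names produce the same string twice; keep one copy.
    return array_values(array_unique($variants));
}

$manydomains = domainVariants("a good word\r\na nother god word");
print_r($manydomains);
```

Building each variant from the list of words, rather than stripping dashes afterwards, keeps any dash the user typed on purpose in the dashed version.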

Try:
$domaininput = (isset($_POST['domainname'])) ? $_POST['domainname'] : '';
$change_space = str_replace(" ","-",$domaininput);
// Append the version without spaces on a new line so the two lists don't run together
$change_space .= "\n" . str_replace(" ", "", $domaininput);
$change_new_line = str_replace("\n",",",$change_space);
$manydomains = explode("," , $change_new_line);
foreach ($manydomains as $domain){
//some code
}

If you haven't got it to work yet, here is a crude function to do the job; it also returns only unique domains.
function getDomains($domaininput){
// One domain per line becomes one comma-separated entry
$domaininput = str_replace("\n",",",$domaininput);
// Version with dashes instead of spaces
$dashed = str_replace(" ","-",$domaininput);
// Version with neither spaces nor dashes
$plain = str_replace("-","",$dashed);
$return = trim($dashed.",".$plain,'-,');
$return = explode(",",$return);
// Names without spaces appear in both versions; keep one copy
$return = array_unique($return);
return $return;
}
$manydomains = getDomains($domaininput);
