JavaScript regular expressions. Special characters in the replacement string

Regular expressions are a language that describes string patterns based on metacharacters. A metacharacter is a character in a regular expression that describes some class of characters in a string, indicates the position of a substring, indicates the number of repetitions, or groups characters into a substring. For example, the metacharacter \d describes digits, and $ denotes the end of a line. A regular expression can also contain ordinary characters that describe themselves. The set and meaning of metacharacters in regular expressions is described by the PCRE standard, most of whose features are supported in JS.

Where regular expressions are used

Regular expressions are typically used for the following tasks:

  • Matching. The goal is to find out whether a certain text matches a given regular expression.
  • Search. Regular expressions make it convenient to find matching substrings and extract them from the text.
  • Replacement. Regular expressions often help not only to find a substring that matches the regular expression, but also to replace it.

Ultimately, using regular expressions you can, for example:

  • Check that the user data in the form is filled out correctly.
  • Find a link to an image in the text entered by the user so that it can be automatically attached to the message.
  • Remove HTML tags from the text.
  • Check code before compilation for simple syntax errors.

Features of regular expressions in JS. Regular expression literals

The main feature of regular expressions in JS is that they have a separate kind of literal. Just as string literals are surrounded by quotation marks, regular expression literals are surrounded by slashes (/). Thus, JS code can contain expressions like:

console.log(typeof /tcoder/); // object

In fact, a regular expression literal creates a RegExp object, the same kind of object that is created in the line

var pattern = new RegExp("tcoder");

This creation method is usually used when you need to use variables in a regular expression, or create a regular expression dynamically. In all other cases, regular expression literals are used due to the shorter syntax and the absence of the need to additionally escape some characters.
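The difference between the two creation methods can be sketched as follows. The variable names (word, dynamic and so on) are made up for illustration and do not come from the article.

```javascript
// The literal and the constructor produce equivalent RegExp objects.
const literal = /tcoder/;
const constructed = new RegExp("tcoder");
console.log(literal.source === constructed.source); // true

// Building a pattern from a variable requires the constructor.
// `word` is a hypothetical value that might come from elsewhere in the program.
const word = "coder";
const dynamic = new RegExp("t" + word);
console.log(dynamic.test("a tcoder blog")); // true

// Inside a string the backslash must be doubled; the literal needs no extra escaping.
const digitsLiteral = /\d+/;
const digitsConstructed = new RegExp("\\d+");
console.log(digitsLiteral.source === digitsConstructed.source); // true
```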

Characters in regular expressions

Alphanumeric characters in regular expressions are not metacharacters and describe themselves. This means that the regular expression /tcoder/ will match the substring tcoder. In regular expressions, you can also specify non-printing characters, such as newline (\n), tab (\t) and so on. These symbols also correspond to themselves. Preceding an alphabetic character with a backslash (\) turns it into a metacharacter, if one is defined for that letter. For example, the alphabetic character "d" becomes a metacharacter describing digits when it is preceded by a backslash (\d).

Character classes

Single characters in regular expressions can be grouped into classes using square brackets. A class created in this way matches any one of the characters included in it. For example, the regular expression /[tcoder]/ matches any of the letters “t”, “c”, “o”, “d”, “e”, “r”.

In classes you can also specify a range of characters using a hyphen. For example, the class [a-d] is equivalent to the class [abcd]. Note that some metacharacters in regular expressions already describe character classes. For example, the \d metacharacter is equivalent to the class [0-9]. Metacharacters describing character classes can also be included in classes. For example, the class [\da-f] corresponds to the digits and the letters “a”, “b”, “c”, “d”, “e”, “f”, that is, any hexadecimal character.

It is also possible to describe a character class by specifying characters that should not be included in it. This is done using the metacharacter ^. For example, the class [^\d] will match any character other than a number.
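A small sketch of character classes in action. The first pattern is anchored with ^ and $ (introduced later in the article) so that the whole one-character string is tested; the sample strings are arbitrary.

```javascript
// A class combining the \d metacharacter with a range: any hexadecimal digit.
const hexDigit = /^[\da-f]$/;
console.log(hexDigit.test("c")); // true
console.log(hexDigit.test("7")); // true
console.log(hexDigit.test("g")); // false

// A negated class: any character that is not a digit.
const notDigit = /[^\d]/;
console.log(notDigit.test("2024"));  // false, the string contains only digits
console.log(notDigit.test("20x24")); // true, "x" is not a digit
```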

Repetitions

Now we can describe, say, a decimal number of any given length simply by writing as many \d metacharacters in a row as there are digits in the number. This approach is not very convenient. In addition, we cannot describe a range of repetitions this way. For example, we cannot describe a number with one or two digits. Fortunately, regular expressions allow you to describe repetition ranges using metacharacters. To do this, indicate the range of repetitions in curly braces after the symbol. For example, the regular expression /tco{1,3}der/ matches the strings "tcoder", "tcooder" and "tcoooder". If you omit the maximum number of repetitions, leaving the minimum and the comma, the pattern matches that number of repetitions or more. For example, the regular expression /bo{2,}bs/ will match the strings “boobs”, “booobs”, “boooobs” and so on with any number of “o” letters, at least two.

If you omit the comma in the curly braces and give just one number, it indicates the exact number of repetitions. For example, the regular expression /\d{5}/ matches five-digit numbers.

Some repetition ranges are used quite often and have their own metacharacters: ? is the same as {0,1}, * is the same as {0,}, and + is the same as {1,}.
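The repetition ranges can be tried out as follows. This is only a sketch: the patterns are anchored with ^ and $ so that the whole string has to match, which the article's own examples do not do.

```javascript
// {1,3}: between one and three repetitions of "o".
const oneToThree = /^tco{1,3}der$/;
const samples = ["tcder", "tcoder", "tcoooder", "tcooooder"];
const results = samples.map((s) => oneToThree.test(s));
console.log(results.join(",")); // false,true,true,false

// {5}: exactly five digits.
const fiveDigits = /^\d{5}$/;
console.log(fiveDigits.test("12345")); // true
console.log(fiveDigits.test("1234"));  // false

// {2,}: two digits or more.
const twoOrMore = /^\d{2,}$/;
console.log(twoOrMore.test("7"));    // false
console.log(twoOrMore.test("2024")); // true
```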

Greedy repetitions

The syntax above selects the maximum number of repetitions: of all the repetition counts that lie in the specified range, the largest possible one is chosen. Such repetitions are called greedy. This means that the regular expression /\d+/ in the string “yeah!!111” will match the substring “111”, not “11” or “1”, although the metacharacter “+” describes one or more repetitions.

If you want non-greedy repetition, that is, the minimum possible number of repetitions from the specified range, put a “?” after the repetition range. For example, the regular expression /\d+?/ in the string “yeah!!111” will match the substring “1”, and the regular expression /\d{2,}?/ in the same string will match the substring “11”.

An important feature of non-greedy repetition deserves attention. Consider the regular expression /bo{2,}?bs/. In the string “i like big boooobs” it will match the substring boooobs, just as with greedy repetition, and not boobs, as one might think. The reason is that a regular expression cannot match several substrings located in different places of the string. That is, our regular expression cannot match the substrings “boo” and “bs” glued together into one.
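Greedy and non-greedy repetition can be compared side by side. In this sketch the "tcoooder" and quoted-text strings are illustrative substitutes chosen for this example, not strings from the article.

```javascript
const text = "yeah!!111";
const greedy = text.match(/\d+/)[0];       // takes as many digits as possible
const lazy = text.match(/\d+?/)[0];        // takes as few digits as possible
const lazyTwo = text.match(/\d{2,}?/)[0];  // the minimum allowed here is two
console.log(greedy, lazy, lazyTwo); // 111 1 11

// A lazy quantifier still has to let the rest of the pattern match,
// so here it is forced to take all three "o" letters.
const forced = "tcoooder".match(/tco{2,}?der/)[0];
console.log(forced); // tcoooder

// Where laziness does change the result: stopping at the first closing quote.
const quoted = 'say "a" and "b"';
console.log(quoted.match(/".+"/)[0]);  // "a" and "b"
console.log(quoted.match(/".+?"/)[0]); // "a"
```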

Alternatives

In regular expressions, you can also describe a set of strings that match either one or another part of the regular expression. Such parts are called alternatives and are separated by a vertical bar. For example, the regular expression /two|twice|2/ can match either the substring “two”, or the substring “twice”, or the substring “2”. The chain of alternatives is processed from left to right until the first match, and a match is always described by exactly one alternative. For example, the regular expression /java|script/ in the string “I like javascript” will match only the substring “java”.
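A short sketch of how alternatives behave. The last example, /tw|twice/, is an extra pattern added here to show that an earlier alternative wins even when a later one would match more text.

```javascript
const sentence = "I like javascript";

// Without the g flag only the first match is reported.
const first = sentence.match(/java|script/)[0];
console.log(first); // java

// With the g flag (described later) the search continues after each match.
const all = sentence.match(/java|script/g);
console.log(all.join(",")); // java,script

// Alternatives are tried left to right at each position.
const leftmost = "twice".match(/tw|twice/)[0];
console.log(leftmost); // tw
```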

Groups

To treat multiple characters as a single unit when applying repetition ranges, alternatives and so on, put them in parentheses. For example, the regular expression /true(coder)?/ will match the strings "truecoder" and "true".

Backreferences

Besides combining characters in a regular expression into a single whole, parentheses let you refer back to the substring they matched: write a backslash followed by the number of the left parenthesis of the pair. Parentheses are numbered from left to right, starting with one. For example, in the regular expression /(one(two)(three))(four)/, \1 refers to "onetwothree", \2 to "two", \3 to "three", and \4 to "four". As an example of using such references, the regular expression /(\d)\1/ matches two-digit numbers with identical digits. An important limitation of backreferences is that they cannot be used inside classes: for example, you cannot describe a two-digit number with different digits using the regular expression /(\d)[^\1]/.
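The numbering of groups and the use of backreferences can be checked like this. The last pattern is not from the article: it is one possible way, assuming a negative lookahead (covered further below), to express "two different digits" given that a backreference cannot be placed inside a class.

```javascript
// Groups are numbered by the position of their opening parenthesis.
const groups = "onetwothreefour".match(/(one(two)(three))(four)/);
console.log(groups.slice(1).join(",")); // onetwothree,two,three,four

// \1 repeats whatever the first group matched.
const sameDigits = /^(\d)\1$/;
console.log(sameDigits.test("77")); // true
console.log(sameDigits.test("78")); // false

// A possible workaround for "two different digits" using a negative lookahead.
const differentDigits = /^(\d)(?!\1)\d$/;
console.log(differentDigits.test("78")); // true
console.log(differentDigits.test("77")); // false
```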

Non-capturing parentheses

Often you just want to group symbols without creating a backreference. In this case, you can write ?: immediately after the opening parenthesis of the group. For example, in the regular expression /(one)(?:two)(three)/, \2 refers to "three".
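A minimal sketch showing that a (?:...) group is skipped when groups are numbered. The input strings are made up for the example.

```javascript
const match = "onetwothree".match(/(one)(?:two)(three)/);

// The result holds the full match plus two captured groups, not three.
console.log(match.length); // 3
console.log(match[1]);     // one
console.log(match[2]);     // three

// Accordingly, \2 in the pattern refers to "three".
const repeated = /(one)(?:two)(three)\2/.test("onetwothreethree");
console.log(repeated); // true
```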

Such parentheses are sometimes called non-remembering. They have another important feature, which we will talk about in the next lesson.

Specifying a position

In regular expressions, there are also metacharacters that indicate a certain position in the string. The most commonly used are ^ and $, which indicate the beginning and the end of a line. For example, the regular expression /\..+$/ will match the extension in a file name, and the regular expression /^\d/ will match the first character of a line if it is a digit.
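The two position patterns can be tried on sample strings. Note that /\..+$/ starts at the first dot, so for a name with several dots it returns everything after that dot; the /\.[^.]+$/ variant is an addition made here, not part of the article, for the case where only the last extension is wanted.

```javascript
const fileName = "archive.tar.gz"; // hypothetical file name

// From the first dot to the end of the string.
const fromFirstDot = fileName.match(/\..+$/)[0];
console.log(fromFirstDot); // .tar.gz

// Only the last extension: a dot followed by non-dot characters up to the end.
const lastExtension = fileName.match(/\.[^.]+$/)[0];
console.log(lastExtension); // .gz

// ^\d matches only when the string starts with a digit.
console.log("7 days".match(/^\d/)[0]); // 7
console.log("day 7".match(/^\d/));     // null
```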

Positive and negative lookahead

Using regular expressions, you can also describe a substring that is followed or not followed by a substring described by another pattern. For example, suppose we need to find the word “java” only if it is followed by “script”. This problem can be solved using the regular expression /java(?=script)/. If we need to describe the substring “java” that is not followed by “script”, we can use the regular expression /java(?!script)/.
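A brief sketch of both lookahead forms. The point to notice is that the text inside the lookahead is checked but is not part of the match; "java beans" is an arbitrary sample string.

```javascript
// Positive lookahead: "java" only when "script" follows.
const positive = "javascript".match(/java(?=script)/);
console.log(positive[0]); // java  ("script" is checked but not included)
console.log(/java(?=script)/.test("java beans")); // false

// Negative lookahead: "java" only when "script" does not follow.
console.log(/java(?!script)/.test("javascript")); // false
console.log(/java(?!script)/.test("java beans")); // true
```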

Let's collect everything we talked about above into one table.

Symbol     Meaning
a|b        Matches either a or b.
(…)        Grouping parentheses. You can also refer back to the substring matched by the pattern in parentheses.
(?:…)      Grouping only, without the ability to refer back.
\n         Backreference to the substring matched by the nth group.
^          The beginning of the input data or the beginning of the line.
$          The end of the input data or the end of the line.
a(?=b)     Matches the substring described by pattern a only if it is followed by the substring described by pattern b.
a(?!b)     Matches the substring described by pattern a only if it is not followed by the substring described by pattern b.

Flags

And finally, the last element of regular expression syntax. Flags specify matching rules that apply to the entire regular expression. Unlike all other elements of the syntax, they are written immediately after the regular expression literal, or passed as a string in the second argument of the RegExp constructor.

Three regular expression flags are used most often in JavaScript:

i – when this flag is specified, case is not taken into account, that is, for example, the regular expression /javascript/i will match the strings "javascript", "JavaScript", "JAVASCRIPT", "jAvAScript", etc.

m – this flag enables multi-line search. If the text contains line feed characters and this flag is set, the symbols ^ and $ match not only the beginning and end of the entire text, but also the beginning and end of each line in it. For example, the regular expression /line$/m matches the substring “line” both in the string “first line” and in the string “one\nsecond line\ntwo”.

g – enables global search: with this flag a regular expression matches all substrings that fit it, and not just the first one, as happens when the flag is absent.

Flags can be combined in any order, that is, /tcoder/mig, /tcoder/gim, /tcoder/gmi and so on all mean the same thing. The order also does not matter when flags are passed as a string in the second argument of the RegExp constructor: new RegExp("tcoder", "im") and new RegExp("tcoder", "mi") are the same thing.
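The effect of each flag can be observed with a few calls. The string "a1b2c3" is made up for the example; the other strings come from the text above.

```javascript
// i: case is ignored.
const caseInsensitive = /javascript/i.test("JavaScript");

// m: ^ and $ also match at the start and end of each line.
const text = "one\nsecond line\ntwo";
const withoutM = /line$/.test(text);
const withM = /line$/m.test(text);

// g: all matches are returned, not just the first.
const firstOnly = "a1b2c3".match(/\d/);
const allDigits = "a1b2c3".match(/\d/g);

// Flag order does not matter: both objects report the same flags.
const sameFlags =
  new RegExp("tcoder", "im").flags === new RegExp("tcoder", "mi").flags;

console.log(caseInsensitive, withoutM, withM); // true false true
console.log(firstOnly[0], allDigits.join(",")); // 1 1,2,3
console.log(sameFlags); // true
```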

P.S.

Regular expressions are a very powerful and convenient tool for working with strings, allowing you to reduce hundreds of lines of code to a single expression. Unfortunately, their syntax is sometimes too complex and difficult to read, and even the most experienced developers can forget what a fairly complex regular expression they wrote a couple of days ago meant if they did not comment it. For these reasons, it is sometimes still worth abandoning regular expressions in favor of conventional string methods.

The RegExp class in JavaScript is a regular expression - an object that describes a character pattern. RegExp objects are typically created using the special literal syntax presented below, but can also be created using the RegExp() constructor.

Syntax

// using special literal syntax
var regex = /pattern/flags;
// using the constructor
var regex = new RegExp("pattern", "flags");
var regex = new RegExp(/pattern/, "flags");

Parameter values: pattern is the text of the regular expression, and flags is an optional string made up of the flags below.

Regular expression flags

Flag   Description
g      Finds all matches rather than stopping after the first match (global match flag).
i      Allows case-insensitive matching (ignore case flag).
m      Matching is done across multiple lines. The start and end characters (^ and $) are processed per line, meaning that a match can occur at the beginning or end of each line (delimited by \n or \r), and not just at the beginning or end of the entire string (multiline flag).
u      The pattern is interpreted as a sequence of Unicode code points (unicode flag).
y      Matching occurs only at the index pointed to by the lastIndex property of this regular expression, and is not attempted at any later or earlier index (sticky flag).
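The two less familiar flags, y and u, can be illustrated with a short sketch. The strings are arbitrary, and the emoji is written as an escape so that the example does not depend on file encoding.

```javascript
// y (sticky): the match must begin exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
const atThree = sticky.test("barfoo"); // "foo" starts at index 3
const afterMatch = sticky.lastIndex;   // advanced past the match
sticky.lastIndex = 0;
const atZero = sticky.test("barfoo");  // "bar" is at index 0, so no match
console.log(atThree, afterMatch, atZero); // true 6 false

// u (unicode): the pattern works on code points instead of UTF-16 code units.
const smiley = "\u{1F600}"; // one code point stored as two UTF-16 code units
const withoutU = /^.$/.test(smiley);
const withU = /^.$/u.test(smiley);
console.log(withoutU, withU); // false true
```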
Character sets and metacharacters

Symbol     Description
.          Matches a single character other than a newline or line terminator character (\n, \r, \u2028 or \u2029).
\d         Matches a digit character of the basic Latin alphabet. Equivalent to the character set [0-9].
\D         Matches any character that is not a digit of the basic Latin alphabet. Equivalent to the character set [^0-9].
\s         Matches a single whitespace character. Whitespace refers to space, tab, form feed, line feed, and other Unicode whitespace characters. Equivalent to the character set [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].
\S         Matches a single character that is not whitespace. Equivalent to the character set [^ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].
[\b]       Matches the backspace character (special character \b, U+0008).
\0         Matches the NUL character (U+0000).
\n         Matches the newline character.
\f         Matches the form feed character.
\r         Matches the carriage return character.
\t         Matches the horizontal tab character.
\v         Matches the vertical tab character.
\w         Matches any alphanumeric character of the basic Latin alphabet, including the underscore. Equivalent to the character set [A-Za-z0-9_].
\W         Matches any character that is not an alphanumeric character of the basic Latin alphabet or an underscore. Equivalent to the character set [^A-Za-z0-9_].
\cX        Matches a control character in a string, where X is a letter from A to Z. For example, /\cM/ represents the Ctrl-M character.
\xhh       Matches a character by its hexadecimal value (hh is a two-digit hexadecimal value).
\uhhhh     Matches a UTF-16 code unit (hhhh is a four-digit hexadecimal value).
\u{hhhh} or \u{hhhhh}
           Matches a character with the Unicode value U+hhhh or U+hhhhh (hexadecimal value). Only when the u flag is given.
\          Indicates that the following character is special and should not be interpreted literally. For characters that are usually interpreted in a special way, indicates that the following character is not special and should be interpreted literally.
Quantifiers

Symbol     Description
n*         Matches any string containing zero or more occurrences of n.
n+         Matches any string containing at least one occurrence of n.
n?         Matches any string containing zero or one occurrence of the preceding element n.
n{x}       Matches any string containing exactly x occurrences of n. x must be a positive integer.
n{x,}      Matches any string containing at least x occurrences of the preceding element n. x must be a positive integer.
n{x,y}     Matches any string containing at least x, but no more than y, occurrences of the preceding element n. x and y must be positive integers.
n*?, n+?, n??, n{x}?, n{x,}?, n{x,y}?
           Matching works like the quantifiers *, +, ? and {...}, but looks for the shortest possible match. The default mode is "greedy"; a ? at the end of the quantifier switches to a “non-greedy” mode in which the element is repeated the minimum possible number of times.
x(?=y)     Matches x only if x is followed by y.
x(?!y)     Matches x only if x is not followed by y.
x|y        Matches any of the specified alternatives.
Grouping and backreferences

Symbol     Description
(x)        Matches x and remembers the match ("capturing parentheses"). The matched substring can be read from the elements [1], ..., [n] of the resulting array, or from the legacy properties $1, ..., $9 of the predefined RegExp object.
(?:x)      Matches x, but does not remember the match ("non-capturing parentheses"). The matched substring cannot be read from the elements [1], ..., [n] of the resulting array, or from the properties $1, ..., $9 of the predefined RegExp object.
\n         A backreference to the last substring matched by the nth parenthesized group in the regular expression (parentheses are numbered from left to right). n must be a positive integer.

The syntax of regular expressions is quite complex and requires serious effort to learn. The best guide to regular expressions today is J. Friedl's book "Mastering Regular Expressions", which, according to the author, lets you "learn to think in regular expressions."

Basic Concepts

A regular expression is a means of processing strings: a sequence of characters that defines a text pattern.

A modifier gives the regular expression additional “instructions” on how to search.

Metacharacters are special characters that serve as commands in the regular expression language.

A regular expression is assigned like a regular variable, only slashes are used instead of quotes, for example: var reg=/reg_expression/

By the simplest templates we mean templates that do not require any special characters.

Let's say our task is to replace all letters "r" (small and capital) with the capital letter "R" in the phrase Regular expressions.

Create a template var reg=/r/ and, using the replace method, carry out our plan

var str="Regular expressions"
var reg=/r/
var result=str.replace(reg, "R")
document.write(result)

As a result, we get the line RegulaR expressions: the replacement occurred only at the first occurrence of the letter “r”, taking the case into account.

But this result does not meet the conditions of our task. Here we need the modifiers “g” and “i”, which can be used separately or together. These modifiers are placed at the end of the regular expression pattern, after the closing slash, and have the following meanings:

modifier "g" - makes the search in the string "global", i.e. in our case the replacement is applied to every occurrence of the lowercase letter “r”. Now the template looks like this: var reg=/r/g. Substituting it into our code

var str="Regular expressions"
var reg=/r/g
var result=str.replace(reg, "R")
document.write(result)

we get the string RegulaR expRessions.

modifier "i" - makes the search case-insensitive. With the template var reg=/r/gi the capital "R" at the start of the phrase is matched as well (and replaced with itself), so every letter "r", small or capital, ends up as "R": RegulaR expRessions.
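To make the effect of each modifier easier to see, the same search can be run with a replacement character that differs from every matched letter. This is a sketch; the "*" replacement is chosen here purely for visibility and is not part of the article's task.

```javascript
const str = "Regular expressions";

// No modifiers: only the first lowercase "r" is replaced.
const first = str.replace(/r/, "*");

// g: every lowercase "r" is replaced.
const global = str.replace(/r/g, "*");

// g and i together: the capital "R" is replaced as well.
const globalIgnoreCase = str.replace(/r/gi, "*");

console.log(first);            // Regula* expressions
console.log(global);           // Regula* exp*essions
console.log(globalIgnoreCase); // *egula* exp*essions
```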

Special characters (metacharacters)

Metacharacters specify the type of characters of the searched string, the way the searched string is surrounded in the text, as well as the number of characters of a particular type in the viewed text. Therefore, metacharacters can be divided into three groups:

  • Metacharacters for searching for matches.
  • Quantitative metacharacters.
  • Positioning metacharacters.

Metacharacters for matching

Symbol  Meaning and description

\b      word boundary; specifies that the pattern must match at the beginning or end of a word
        /\ber/ matches error, does not match hero or player
        /er\b/ matches player, does not match hero or error
        /\ber\b/ does not match hero, player or error; it can only match the separate word er

\B      not a word boundary; specifies that the pattern must not match at the beginning or end of a word
        /\Ber/ matches hero or player, does not match error
        /er\B/ matches error or hero, does not match player
        /\Ber\B/ matches hero, does not match player or error

\d      digit from 0 to 9
        /\d\d\d\d/ matches any four-digit number

\D      not a digit
        /\D\D\D\D/ will not match 2005 or 05.g or №126 etc.

\s      single whitespace character; matches the space character
        /over\sbyte/ matches only over byte

\S      single non-whitespace character; any single character except space
        /over\Sbyte/ matches over-byte or over_byte, does not match over byte or over--byte

\w      letter, digit or underscore
        /A\w/ matches A1 or AB, does not match A+

\W      not a letter, digit or underscore
        /A\W/ does not match A1 or AB, matches A+

.       any character; any signs, letters, digits, etc.
        /.../ matches any three characters: ABC or !@4 or 1 q

[]      character set; the pattern matches any one of the characters enclosed in square brackets
        /[QA]WERTY/ matches QWERTY and AWERTY

[^]     set of non-included characters; the pattern matches any character except those enclosed in square brackets
        /[^QA]WERTY/ does not match QWERTY or AWERTY

The characters shown in the table "Metacharacters for matching" should not be confused with the escape sequences used in strings, such as \t for a tab or \n for a new line.
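The word-boundary rows of the table and the warning about string escapes can be verified with a few calls. This is a sketch; the final two lines show what happens when \b is written inside an ordinary string without doubling the backslash.

```javascript
// \b: the match must touch the start or end of a word.
console.log(/\ber/.test("error"), /\ber/.test("hero"));   // true false
console.log(/er\b/.test("player"), /er\b/.test("error")); // true false

// \B: the match must not touch a word boundary on that side.
console.log(/\Ber/.test("hero"), /\Ber/.test("error"));   // true false
console.log(/er\B/.test("hero"), /er\B/.test("player"));  // true false

// In a string literal "\b" is the backspace character, not a word boundary,
// so the backslash has to be doubled when the pattern is built from a string.
const wrong = new RegExp("\ber");   // pattern: backspace + "er"
const right = new RegExp("\\ber");  // pattern: \ber
console.log(wrong.test("error"), right.test("error")); // false true
```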

Quantitative metacharacters

Symbol  Number of matches

*       Zero or more times
        /Ja*vaScript/ matches JvaScript, JavaScript or JaaavaScript, does not match JovaScript

?       Zero or one time
        /Ja?vaScript/ matches only JvaScript or JavaScript

+       One or more times
        /Ja+vaScript/ matches JavaScript, JaavaScript or JaaavaScript, does not match JvaScript

{n}     exactly n times
        /Ja{2}vaScript/ matches only JaavaScript

{n,}    n or more times
        /Ja{2,}vaScript/ matches JaavaScript or JaaavaScript, does not match JvaScript or JavaScript

{n,m}   at least n times, but not more than m times
        /Ja{2,3}vaScript/ matches only JaavaScript or JaaavaScript

Each character listed in the Quantitative Metacharacters table applies to one preceding character or metacharacter in the regular expression.

Positioning metacharacters

The last set of metacharacters indicates where to look for the substring when its position matters: ^ requires it at the beginning of the line, and $ at the end.

Some methods for working with templates

replace - we already used this method at the very beginning of the article. It searches for a pattern and replaces the found substring with a new substring.

exec - this method matches a string against the pattern specified by the template. If pattern matching fails, null is returned. Otherwise, the result is an array of substrings matching the given pattern. The first element of the array is the part of the source string that satisfies the whole pattern; the following elements hold the substrings captured by the groups.

For example:


var reg=/(\d+)\.(\d+)\.(\d+)/
var arr=reg.exec("I was born on 15.09.1980")
document.write("Date of birth: ", arr[0], "<br>")
document.write("Birthday: ", arr[1], "<br>")
document.write("Birth month: ", arr[2], "<br>")
document.write("Year of birth: ", arr[3], "<br>")

As a result, we get four lines:
Date of birth: 15.09.1980
Birthday: 15
Birth month: 09
Year of birth: 1980

Conclusion

The article does not show all the capabilities and delights of regular expressions; for a deeper study of this topic, I advise you to study the RegExp object. I also want to draw your attention to the fact that the basic syntax of regular expressions is largely the same in JavaScript and PHP. For example, a simplified check that an e-mail address is entered correctly can use the same regular expression in both JavaScript and PHP, such as /[a-z0-9_.-]+@[a-z0-9-]+\.[a-z]{2,3}/i .

RegExp in JavaScript is an object type used to match sequences of characters in strings.

Creating the first regular expression

There are two ways to create a regular expression: using a regular expression literal or using the regular expression constructor. Both of the examples below represent the same pattern: the character "c" followed by "a" and then the character "t".

// regular expression literal is enclosed in slashes (/)
var option1 = /cat/;
// Regular expression constructor
var option2 = new RegExp("cat");

As a general rule, if the regular expression is going to be constant, meaning it won't change, it's better to use a regular expression literal. If it will change or depend on other variables, it is better to use a method with a constructor.

RegExp.prototype.test() method

Remember when I said that regular expressions are objects? This means that they have a number of methods. The simplest one is test(), which returns a boolean value:

true: the string contains a match for the regular expression pattern.

false: no match was found.

console.log(/cat/.test("the cat says meow"));
// true
console.log(/cat/.test("the dog says bark"));
// false

Regular Expression Basics Cheat Sheet

The secret of regular expressions is remembering the common characters and groups. I highly recommend spending a few hours on the cheat sheet below and then coming back and studying further.

Symbols
  • . – (dot) matches any single character except line breaks;
  • * – matches the previous expression repeated 0 or more times;
  • + – matches the previous expression repeated 1 or more times;
  • ? – the previous expression is optional (matches 0 or 1 times);
  • ^ – matches the beginning of the line;
  • $ – matches the end of the line.

Character groups
  • \d – matches any single digit character.
  • \w – matches any single word character (digit, letter, or underscore).
  • [XYZ] – a set of characters. Matches any single character from the set specified in the brackets. You can also specify character ranges, for example, [A-Z].
  • [XYZ]+ – matches a character from the set repeated one or more times.
  • [^A-Z] – within a character set, “^” is used as a negation sign. In this example the pattern matches anything that is not an uppercase letter.

Flags:

JavaScript regular expressions have several optional flags. They can be used separately or together, and are placed after the closing slash. For example: /[A-Z]/g . Here I will give only two flags.

g – global search.

i – case-insensitive search.

Additional constructs

(x) – capturing parentheses. This expression matches x and remembers that match so you can use it later.

(?:x) – non-capturing parentheses. The expression matches x but does not remember the match.

x(?=y) – matches x only if it is followed by y.

Let's test the material we've studied

First, let's test all of the above. Let's say we want to check whether a string contains any digits. To do this, you can use the “\d” construct.

console.log(/\d/.test("12-34"));
// true

The code above returns true if there is at least one digit in the string. What if you need to check that a string follows a format? You can use multiple "\d" characters to define the format:

console.log(/\d\d-\d\d/.test("12-34"));
// true
console.log(/\d\d-\d\d/.test("1234"));
// false

If you don't care how many digits come before and after the "-" sign, you can use the "+" symbol to show that the "\d" pattern occurs one or more times:

console.log(/\d+-\d+/.test("12-34"));
// true
console.log(/\d+-\d+/.test("1-234"));
// true
console.log(/\d+-\d+/.test("-34"));
// false

You can use parentheses to group expressions. Let's say we have a cat meowing, and we want to check for a match to the pattern "meow":

console.log(/me+(ow)+w/.test("meeeeowowoww"));
// true

Now let's figure it out.

m => matches one letter "m";

e+ => matches the letter "e" one or more times;

(ow)+ => matches the letters "ow" one or more times;

w => matches the letter "w";

"m" + "eeee" + "owowow" + "w".

When operators like "+" are used immediately after parentheses, they affect the entire contents of the parentheses.

The "?" operator indicates that the previous character is optional. As you'll see below, both test cases return true because the "s" characters are marked as optional.

console.log(/cats? says?/i.test("the Cat says meow"));
// true
console.log(/cats? says?/i.test("the Cats say meow"));
// true

If you want to find a slash character, you need to escape it with a backslash. The same is true for other characters that have special meaning, such as the question mark. Here's an example of how to look for them:

var slashSearch = /\//;
var questionSearch = /\?/;
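When the text to search for is only known at run time, escaping by hand is not possible. One common approach, sketched here under the assumption that a small helper is acceptable, is to escape every special character before passing the text to the RegExp constructor. The name escapeRegExp is hypothetical, not a built-in function.

```javascript
// Hypothetical helper: puts a backslash before each character that has
// a special meaning in a pattern. "$&" in the replacement string stands
// for the matched character itself.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

console.log(escapeRegExp("1+1=2?")); // 1\+1=2\?

// Without escaping, "." would match any character.
const unsafe = new RegExp("a.b");
const safe = new RegExp(escapeRegExp("a.b"));
console.log(unsafe.test("axb"), safe.test("axb")); // true false
console.log(safe.test("a.b")); // true
```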

  • \d is the same as [0-9]: each construct matches a digit character.
  • \w is the same as [A-Za-z0-9_]: both expressions match any single alphanumeric character or underscore.

Example: adding spaces to camel-case strings

In this example, we're tired of the camel-case style of writing and need a way to add spaces between words. Here's an example:

removeCc("camelCase") // => should return "camel Case"

There is a simple solution using a regular expression. First, we need to find all capital letters. This can be done using a character set and the global modifier:

/[A-Z]/g

This matches the character "C" in "camelCase".

Now, how do we add a space before "C"?

We need to use capturing parentheses! They allow you to find a match and remember it to use later. Use capturing parentheses to remember the capital letter you find:

/([A-Z])/

You can access the captured value later like this:

$1

Above we use $1 to access the captured value. By the way, if we had two sets of capturing parentheses, we would use $1 and $2 to refer to the captured values, and similarly for more capturing parentheses.

If you need to use parentheses but don't need to capture that value, you can use non-capturing parentheses: (?:x). In this case, a match for x is found, but it is not remembered.

Let's return to the current task. How do we make use of the capturing parentheses? With the replace method! We pass "$1" as the second argument. It is important to use quotation marks here.
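The special sequences that replace() understands in the replacement string can be sketched as follows. The sample strings are invented for the example; the sequences themselves ($1, $2, $&, $$, $` and $') are standard.

```javascript
// $1 and $2 insert the text captured by the first and second groups.
const swapped = "John Smith".replace(/(\w+)\s(\w+)/, "$2, $1");
console.log(swapped); // Smith, John

// $& inserts the whole match.
const bracketed = "cat".replace(/a/, "[$&]");
console.log(bracketed); // c[a]t

// $$ inserts a literal dollar sign.
const priced = "price: 5".replace(/\d+/, "$$$&");
console.log(priced); // price: $5

// $` inserts the text before the match, $' the text after it.
const before = "abc".replace(/b/, "$`");
const after = "abc".replace(/b/, "$'");
console.log(before, after); // aac acc
```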

function removeCc(str) {
return str.replace(/([A-Z])/g, "$1");
}


Let's look at the code again. We capture the capital letter and then replace it with the same letter. Now, inside the quotes, insert a space before the variable $1. As a result, we get a space before each capital letter.

function removeCc(str) {
return str.replace(/([A-Z])/g, " $1");
}
removeCc("camelCase") // "camel Case"
removeCc("helloWorldItIsMe") // "hello World It Is Me"

Example: removing capital letters

Now we have a string with a bunch of unnecessary capital letters. Have you figured out how to remove them? First, we need to select all capital letters. To do that, we search for a character set using the global modifier:

/[A-Z]/g

We'll use the replace method again, but how do we make the character lowercase this time?

function lowerCase(str) {
return str.replace(/[A-Z]/g, ???);
}


Hint: In the replace() method, you can specify a function as the second parameter.

We will use an arrow function, which receives the matched text as its argument, so there is no need to capture the value of the match. When a function is passed to the replace method, it is called after a match is found, and its return value is used as the replacement string. Even better, if the search is global and multiple matches are found, the function is called for each match.

function lowerCase(str) {
return str.replace(/[A-Z]/g, (u) => u.toLowerCase());
}
lowerCase("camel Case") // "camel case"
lowerCase("hello World It Is Me") // "hello world it is me"
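The replacement function receives more than just the matched text. The sketch below, with made-up inputs, shows the arguments it is called with: the whole match first, then one argument per capturing group, then the position of the match.

```javascript
// First argument: the matched text.
const doubled = "a1b22".replace(/\d+/g, (match) => String(Number(match) * 2));
console.log(doubled); // a2b44

// Arguments after the match: the text captured by each group.
const inRem = "10px 20px".replace(
  /(\d+)px/g,
  (match, digits) => `${Number(digits) / 10}rem`
);
console.log(inRem); // 1rem 2rem

// With no capturing groups, the second argument is the offset of the match.
const offsets = "aXbX".replace(/X/g, (match, offset) => String(offset));
console.log(offsets); // a1b3
```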

Example: convert the first letter to capital

capitalize("camel case") // => should return "Camel case"

Let's use the function in the replace() method again. However, this time we only need to look for the first character in the string. Recall that the symbol “^” is used for this.

Let's dwell on the "^" symbol for a second. Remember the example given earlier:

console.log(/cat/.test("the cat says meow"));
// true

If we add the "^" character at the start of the pattern (/^cat/), test() no longer returns true, because the word "cat" is not at the beginning of the string.
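Putting the pieces together, one possible way to finish this example is sketched below. It assumes the first character is a lowercase Latin letter; the body of capitalize is our own guess at a solution, not code from the original article.

```javascript
// Without the anchor, "cat" may appear anywhere in the string.
var anywhere = /cat/.test("the cat says meow"); // true
// With ^, the match must begin at the first character.
var atStart = /^cat/.test("the cat says meow"); // false

// Assumed solution: match only the first character and uppercase it.
function capitalize(str) {
  return str.replace(/^[a-z]/, (c) => c.toUpperCase());
}
var capitalized = capitalize("camel case"); // "Camel case"
```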

Regular Expressions

A regular expression is an object that describes a character pattern. The RegExp class in JavaScript represents regular expressions, and the String and RegExp class objects define methods that use regular expressions to perform pattern matching and text search and replacement operations. Regular expression grammar in JavaScript contains a fairly complete subset of the regular expression syntax used in Perl 5, so if you have experience with the Perl language, you can easily describe patterns in JavaScript programs.

Features of Perl regular expressions that are not supported in ECMAScript 5 include the s (single-line mode) and x (extended syntax) flags; the escape sequences \a, \e, \l, \u, \L, \U, \E, \Q, \A, \Z, \z and \G; and other extended constructs starting with (?. (Later editions of the language added some of these, such as the s flag.)

Defining Regular Expressions

In JavaScript, regular expressions are represented by RegExp objects. RegExp objects can be created using the RegExp() constructor, but more often they are created using a special literal syntax. Just as string literals are specified as characters surrounded by quotation marks, regular expression literals are specified as characters surrounded by a pair of slash characters (/). So your JavaScript code might contain lines like this:

var pattern = /s$/;

This line creates a new RegExp object and assigns it to the pattern variable. This RegExp object matches any string ending with an "s". The same regular expression can be defined using the RegExp() constructor:

var pattern = new RegExp("s$");

A regular expression pattern specification consists of a sequence of characters. Most characters, including all alphanumeric ones, literally describe the characters that must be present. That is, the regular expression /java/ matches all strings containing the substring “java”.

Other characters in regular expressions are not intended to be used to find their exact equivalents, but rather have special meanings. For example, the regular expression /s$/ contains two characters. The first character s denotes a search for a literal character. Second, $ is a special metacharacter that marks the end of a line. So this regular expression matches any string ending with the character s.
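The two ways of defining the same pattern can be compared with a short sketch; the variable names and test strings are chosen for illustration only.

```javascript
var literal = /s$/;                 // regular expression literal
var constructed = new RegExp("s$"); // the same pattern via the constructor

var endsWithS = literal.test("regular expressions");      // true
var sameResult = constructed.test("regular expressions"); // true
var noMatch = literal.test("expression");                 // false

// Ordinary characters match themselves anywhere in the string.
var hasJava = /java/.test("I like javascript");           // true
```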

The following sections describe the various characters and metacharacters used in regular expressions in JavaScript.

Literal characters

As noted earlier, all alphabetic characters and numbers in regular expressions match themselves. Regular expression syntax in JavaScript also supports the ability to specify certain non-alphabetic characters using escape sequences starting with a backslash (\) character. For example, the sequence \n matches the newline character. These symbols are listed in the table below:

Some punctuation marks have special meanings in regular expressions:

^ $ . * + ? = ! : | \ / ( ) [ ] { } -

The meaning of these symbols is explained in the following sections. Some of them have special meaning only in certain regular expression contexts, while in other contexts they are interpreted literally. However, in general, to literally include any of these characters in a regular expression, you must precede it with a backslash character. Other characters, such as quotes and @, have no special meaning and simply match themselves in regular expressions.

If you can't remember exactly which punctuation characters should be preceded by a \, you can safely put a backslash in front of any of them. However, keep in mind that many letters and digits take on a special meaning when preceded by a backslash, so letters and digits that you want to match literally should not be preceded by a \ character. To include the backslash character itself in a regular expression, you must precede it with another backslash. For example, the following regular expression matches any string that contains a backslash character: /\\/.
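A short sketch of the difference escaping makes; the sample strings are arbitrary.

```javascript
var anyChar = /./;     // unescaped dot: any character except a line terminator
var literalDot = /\./; // escaped dot: only a literal "."

var a = anyChar.test("abc");    // true
var b = literalDot.test("abc"); // false
var c = literalDot.test("a.c"); // true

// In a string literal "\\" is one backslash, so "C:\\temp" contains one.
var backslash = /\\/;
var d = backslash.test("C:\\temp"); // true
var e = backslash.test("C:/temp");  // false
```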

Character classes

Individual literal characters can be combined into character classes by enclosing them in square brackets. A character class matches any one character contained in that class. Therefore, the regular expression /[abc]/ matches any one of the characters a, b, or c.

Negated character classes can also be defined to match any character except those specified in the brackets. A negated character class is specified by the ^ character as the first character following the left bracket. The regular expression /[^abc]/ matches any character other than a, b, or c. In character classes, a range of characters can be specified using a hyphen. All lowercase Latin letters are matched by the expression /[a-z]/, and any letter or digit from the Latin character set is matched by the expression /[a-zA-Z0-9]/.

Certain character classes are particularly common, so regular expression syntax in JavaScript includes special characters and escape sequences to represent them. Thus, \s matches a space, a tab, and any other Unicode whitespace character, and \S matches any character that is not Unicode whitespace.

The table below provides a list of these special characters and the syntax of the character classes. (Note that some of the character class escape sequences match only ASCII characters and are not extended to work with Unicode characters. You can explicitly define your own Unicode character classes; for example, /[\u0400-\u04FF]/ matches any character of the Cyrillic alphabet.)

JavaScript regular expression character classes

Symbol    Matches
[...]     Any one of the characters between the brackets
[^...]    Any one character not between the brackets
.         Any character other than a newline or other Unicode line terminator
\w        Any ASCII word character. Equivalent to [a-zA-Z0-9_]
\W        Any character that is not an ASCII word character. Equivalent to [^a-zA-Z0-9_]
\s        Any Unicode whitespace character
\S        Any character that is not Unicode whitespace. Note that \w and \S are not the same thing
\d        Any ASCII digit. Equivalent to [0-9]
\D        Any character other than an ASCII digit. Equivalent to [^0-9]
[\b]      A literal backspace character

Note that the special character-class escape sequences can be used inside square brackets. \s matches any whitespace character and \d matches any digit, hence /[\s\d]/ matches any whitespace character or digit.
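The character classes described above can be tried out with a few calls to test(). The sample strings here are made up for the illustration.

```javascript
var r1 = /[abc]/.test("xbz");          // true: contains b
var r2 = /[^abc]/.test("abc");         // false: every character is a, b or c
var r3 = /[^abc]/.test("abcd");        // true: d is outside the class
var r4 = /[a-zA-Z0-9]/.test("--7--");  // true: 7 is in the range 0-9
var r5 = /[\s\d]/.test("x y");         // true: contains a space
var r6 = /[\s\d]/.test("xy");          // false: no whitespace, no digit
var r7 = /[\u0400-\u04FF]/.test("Привет"); // true: Cyrillic letters
var r8 = /\w/.test("Привет");          // false: \w covers ASCII only
```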

Repetition

Given the knowledge of regular expression syntax gained so far, we can describe a two-digit number as /\d\d/ or a four-digit number as /\d\d\d\d/, but we cannot, for example, describe a number consisting of any number of digits, or a string of three letters followed by an optional digit. These more complex patterns use regular expression syntax that specifies how many times an element of the regular expression may be repeated.

Repeat symbols always follow the pattern to which they are applied. Some types of repetitions are used quite often, and special symbols are available to indicate these cases. For example, + matches one or more instances of the previous pattern. The following table provides a summary of the repetition syntax:

The following lines show several examples:

var pattern = /\d{2,4}/;  // Matches a number containing two to four digits
pattern = /\w{3}\d?/;     // Matches exactly three word characters and an optional digit
pattern = /\s+java\s+/;   // Matches the word "java" with one or more whitespace characters
                          // before and after it
pattern = /[^(]*/;        // Matches zero or more characters other than the opening parenthesis

Be careful when using the repetition characters * and ?. They can match zero instances of the pattern specified before them and therefore can match nothing at all. For example, the regular expression /a*/ matches the string "bbbb" because the string contains zero occurrences of the character a.
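The following sketch shows what such matches look like in practice; the input strings are arbitrary examples.

```javascript
// /a*/ succeeds on "bbbb" by matching the empty string at position 0.
var empty = /a*/.exec("bbbb");
// empty[0] is "" and empty.index is 0

// {2,4} takes at most four digits, even if more are available.
var digits = /\d{2,4}/.exec("12345")[0];   // "1234"

// Three word characters followed by an optional digit.
var withDigit = /\w{3}\d?/.exec("abc7")[0];    // "abc7"
var withoutDigit = /\w{3}\d?/.exec("abcd")[0]; // "abc"
```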

The repetition characters described here match as many times as possible while still allowing subsequent parts of the regular expression to match. We say this repetition is greedy. It is also possible to specify that repetition be performed in a non-greedy manner: simply follow the repetition character (or characters) with a question mark: ??, +?, *? or even {1,5}?.

For example, the regular expression /a+/ matches one or more instances of the letter a. Applied to the string "aaa", it matches all three letters. On the other hand, the expression /a+?/ matches one or more instances of the letter a and selects the least possible number of characters. Applied to the same string, this pattern matches only the first letter a.

Non-greedy repetition does not always give the expected result. Consider the pattern /a+b/, which matches one or more a's followed by the letter b. When applied to the string "aaab", it matches the entire string.

Now let's check the non-greedy version, /a+?b/. One might think that it would match a b preceded by only one a. Applied to the same string, "aaab", it would be expected to match a single character a and the last character b. However, this pattern actually matches the entire string, just like the greedy version. The fact is that a regular expression pattern search is performed by finding the first position in the string from which a match is possible. Since a match is possible starting from the first character of the string, shorter matches starting from subsequent characters are not even considered.
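The behaviour of greedy and non-greedy repetition described above can be checked directly with match():

```javascript
var greedy = "aaa".match(/a+/)[0];      // "aaa"
var lazy = "aaa".match(/a+?/)[0];       // "a"

// Both versions match the whole string, because the search
// starts at the first position where a match is possible.
var greedyB = "aaab".match(/a+b/)[0];   // "aaab"
var lazyB = "aaab".match(/a+?b/)[0];    // "aaab"
```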

Alternatives, Grouping and References

Regular expression grammar includes special characters for defining alternatives, grouping subexpressions, and referring to previous subexpressions. The pipe symbol | separates alternatives. For example, /ab|cd|ef/ matches either the string "ab", or the string "cd", or the string "ef", and the pattern /\d{3}|[a-z]{4}/ matches either three digits or four lowercase letters.

Note that alternatives are processed from left to right until a match is found. If a match is found with the left alternative, the right one is ignored, even if a “better” match can be achieved. Therefore, when the pattern /a|ab/ is applied to the string "ab", it will only match the first character.
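A quick sketch of this left-to-right behaviour; swapping the order of the alternatives changes the result.

```javascript
var leftFirst = "ab".match(/a|ab/)[0];  // "a": the left alternative wins
var reordered = "ab".match(/ab|a/)[0];  // "ab": the longer alternative is tried first
var anyOfThree = /ab|cd|ef/.test("xxcdxx"); // true
```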

Parentheses have multiple meanings in regular expressions. One of them is to group individual elements into one subexpression, so that the elements are treated as a single unit by the special characters |, *, +, ? and others. For example, the pattern /java(script)?/ matches the word "java" followed by the optional word "script", and /(ab|cd)+|ef/ matches either the string "ef" or one or more repetitions of either of the strings "ab" or "cd".

Another use of parentheses in regular expressions is to define subpatterns within a pattern. When a regular expression match is found in the target string, the portion of the target string that matches any specific subpattern enclosed in parentheses can be extracted.

Suppose you want to find one or more lowercase letters followed by one or more digits. To do this, you can use the pattern /[a-z]+\d+/. But let's also assume that we only want the digits at the end of each match. If we put this part of the pattern in parentheses (/[a-z]+(\d+)/), we can extract the digits from any matches we find. How this is done will be described below.
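As a preview, here is a minimal sketch using the match() method of strings, which is covered later; the input string is made up.

```javascript
var found = "order abc123 shipped".match(/[a-z]+(\d+)/);
var whole = found[0];  // "abc123": the entire match
var digits = found[1]; // "123": the text captured by the parentheses
```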

A related use of parenthesized subexpressions is to refer back to a subexpression from an earlier part of the same regular expression. This is achieved by specifying one or more digits after the \ character. The digits refer to the position of the parenthesized subexpression within the regular expression. For example, \1 refers to the first subexpression, and \3 refers to the third. Note that subexpressions can be nested within each other, so the position of the left parenthesis is used when counting. For example, in the following regular expression, a reference to the nested subexpression ([Ss]cript) would look like \2:

/([Jj]ava([Ss]cript)?)\sis\s(fun\w*)/

A reference to a previous subexpression does not point to the pattern of that subexpression, but to the text found that matches that pattern. Therefore, references can be used to impose a constraint that selects parts of a string that contain exactly the same characters. For example, the following regular expression matches zero or more characters within single or double quotes. However, it does not require that the opening and closing quotes match each other (that is, that both quotes be single or double):

/['"][^'"]*['"]/

We can require the quotation marks to match by using a reference like this:

/(['"])[^'"]*\1/

Here \1 matches whatever the first subexpression matched. In this example, the reference imposes a constraint that requires the closing quotation mark to match the opening quotation mark. This regular expression does not allow single quotes inside double quotes, and vice versa.
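The difference between the two quote patterns can be seen in a short sketch; the variable names loose and strict are ours.

```javascript
var loose = /['"][^'"]*['"]/;   // any quote, then any quote
var strict = /(['"])[^'"]*\1/;  // closing quote must equal the opening one

var l1 = loose.test("'mismatched\"");  // true
var s1 = strict.test("'mismatched\""); // false
var s2 = strict.test("'single'");      // true
var s3 = strict.test('"double"');      // true
```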

It is also possible to group elements in a regular expression without creating a numbered reference to those elements. Instead of simply grouping elements between ( and ), start the group with the characters (?: and end it with ). Consider, for example, the following pattern:

/([Jj]ava(?:[Ss]cript)?)\sis\s(fun\w*)/

Here the subexpression (?:[Ss]cript) is only needed for grouping, so that the repetition character ? can be applied to the group. These modified parentheses do not create a reference, so in this regular expression, \2 refers to the text that matches the pattern (fun\w*).
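The effect on group numbering can be observed with exec(), which is described later. This is a sketch with an arbitrary sample sentence.

```javascript
var capturing = /([Jj]ava([Ss]cript)?)\sis\s(fun\w*)/;
var nonCapturing = /([Jj]ava(?:[Ss]cript)?)\sis\s(fun\w*)/;

var a = capturing.exec("JavaScript is funny");
// a[1] is "JavaScript", a[2] is "Script", a[3] is "funny"

var b = nonCapturing.exec("JavaScript is funny");
// b[1] is "JavaScript", b[2] is "funny"; there is no third group
```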

The following table lists the selection, grouping, and reference operators in regular expressions:

Regular expression symbols for alternation, grouping, and references

Symbol     Meaning
|          Alternation. Matches either the subexpression on the left or the subexpression on the right.
(...)      Grouping. Groups elements into a single unit that can be used with the characters *, +, ?, | etc. Also remembers the characters matching this group for use in subsequent references.
(?:...)    Grouping only. Groups elements into a single unit, but does not remember the characters that match this group.
\number    Matches the same characters that were found when matching group number number. Groups are subexpressions inside (possibly nested) parentheses. Group numbers are assigned by counting left parentheses from left to right. Groups formed using (?: are not numbered.

Specifying a Match Position

As described earlier, many elements of a regular expression match a single character in a string. For example, \s matches a single whitespace character. Other regular expression elements match the positions between characters rather than the characters themselves. For example, \b matches a word boundary: the boundary between \w (an ASCII word character) and \W (a non-word character), or the boundary between an ASCII word character and the beginning or end of a string.

Elements such as \b do not specify any characters that must be present in the matched string, but they do specify valid positions for matching. These elements are sometimes called regular expression anchor elements because they anchor the pattern to a specific position in the string. The most commonly used anchor elements are ^ and $, which link patterns to the beginning and end of a line, respectively.

For example, the word "JavaScript" on its own line can be found using the regular expression /^JavaScript$/. To find the single word "Java" (rather than a prefix like "JavaScript"), you can try using the pattern /\sJava\s/, which requires a space before and after the word.

But such a solution raises two problems. First, it will only find the word "Java" if it is surrounded by spaces on both sides, and will not be able to find it at the beginning or end of the line. Secondly, when this pattern does match, the string it returns will contain leading and trailing spaces, which is not exactly what we want. So instead of using a pattern that matches whitespace characters \s, we'll use a pattern (or anchor) that matches word boundaries \b. The result will be the following expression: /\bJava\b/.

The anchor element \B matches a position that is not a word boundary. That is, the pattern /\B[Ss]cript/ will match the words “JavaScript” and “postscript” and will not match the words “script” or “Scripting”.
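A few test() calls illustrate the difference between matching whitespace and matching word boundaries; the sentences are invented for this sketch.

```javascript
var spaced = /\sJava\s/.test("Java is fun");         // false: no space before "Java"
var bounded = /\bJava\b/.test("Java is fun");        // true
var prefix = /\bJava\b/.test("JavaScript is fun");   // false: "Java" is only a prefix
var exact = "I like Java a lot".match(/\bJava\b/)[0]; // "Java", without spaces

var inside1 = /\B[Ss]cript/.test("JavaScript"); // true
var inside2 = /\B[Ss]cript/.test("postscript"); // true
var start1 = /\B[Ss]cript/.test("script");      // false
var start2 = /\B[Ss]cript/.test("Scripting");   // false
```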

Arbitrary regular expressions can also serve as anchor conditions. If you place an expression between the characters (?= and ), it becomes a lookahead assertion on the subsequent characters, requiring that those characters match the specified pattern but not be included in the matched string.

For example, to match the name of a common programming language followed by a colon, you can use the expression /[Jj]ava([Ss]cript)?(?=\:)/. This pattern matches the word "JavaScript" in the string "JavaScript: The Definitive Guide", but it will not match the word "Java" in the string "Java in a Nutshell" because it is not followed by a colon.

If you instead introduce the condition with (?!, it becomes a negative lookahead assertion, requiring that the following characters do not match the specified pattern. For example, the pattern /Java(?!Script)([A-Z]\w*)/ matches the substring “Java” followed by a capital letter and any number of ASCII word characters, provided that the substring "Java" is not followed by the substring "Script". It will match the string "JavaBeans" but not the string "Javanese", and it will match the string "JavaScrip" but not the strings "JavaScript" or "JavaScripter".
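Both kinds of lookahead can be tried on the strings mentioned above. This is only a sketch; the variable names are ours.

```javascript
var followedByColon = /[Jj]ava([Ss]cript)?(?=\:)/;
var m1 = followedByColon.exec("JavaScript: The Definitive Guide");
// m1[0] is "JavaScript": the colon is checked but not included
var m2 = followedByColon.exec("Java in a Nutshell"); // null

var notScript = /Java(?!Script)([A-Z]\w*)/;
var n1 = notScript.test("JavaBeans");    // true
var n2 = notScript.test("Javanese");     // false: no capital letter after "Java"
var n3 = notScript.test("JavaScrip");    // true
var n4 = notScript.test("JavaScript");   // false
var n5 = notScript.test("JavaScripter"); // false
```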

The table below provides a list of regular expression anchor characters:

Regular expression anchor characters

Symbol    Meaning
^         Matches the beginning of the string or, in a multiline search, the beginning of a line.
$         Matches the end of the string or, in a multiline search, the end of a line.
\b        Matches a word boundary, i.e. the position between a \w character and a \W character, or between a \w character and the beginning or end of the string. (Note, however, that [\b] matches the backspace character.)
\B        Matches a position that is not a word boundary.
(?=p)     Positive lookahead assertion. Requires the subsequent characters to match the pattern p, but does not include those characters in the matched string.
(?!p)     Negative lookahead assertion. Requires that the following characters do not match the pattern p.

Flags

And one last element of regular expression grammar. Regular expression flags specify high-level pattern matching rules. Unlike the rest of regular expression grammar, flags are specified not between the slash characters, but after the second one. Three flags are described here: i, g and m.

The i flag specifies that pattern matching should be case insensitive, and the g flag specifies that the search should be global, i.e. all matches in the string must be found. The m flag performs the pattern search in multi-line mode. If the string being searched contains newlines, then in this mode the anchor characters ^ and $, in addition to matching the beginning and end of the entire string, also match the beginning and end of each line. For example, the pattern /java$/im matches both “java” and “Java\nis fun”.

These flags can be combined in any combination. For example, to search for the first occurrence of the word "java" (or "Java", "JAVA", etc.) in a case-insensitive manner, you can use the case-insensitive regular expression /\bjava\b/i. And to find all occurrences of this word in a string, you can add the g flag: /\bjava\b/gi.
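The effect of each flag can be seen in a small sketch. It uses the match() method of strings, which is described in the next section; the sample strings are arbitrary.

```javascript
// With m, $ also matches at the end of the first line.
var multi = /java$/im.test("Java\nis fun");  // true
var single = /java$/i.test("Java\nis fun");  // false

// i alone finds the first occurrence, gi finds all of them.
var first = "Java, java and JAVA".match(/\bjava\b/i)[0]; // "Java"
var all = "Java, java and JAVA".match(/\bjava\b/gi);     // ["Java", "java", "JAVA"]
```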

String methods for pattern matching

Up to this point, we've discussed the grammar used to write regular expressions, but we haven't looked at how those regular expressions can actually be used in JavaScript scripts. In this section we will discuss the methods of the String object in which regular expressions are used for pattern matching and for search with replacement. And then we'll continue our conversation about pattern matching with regular expressions by looking at the RegExp object and its methods and properties.

Strings support four methods that use regular expressions. The simplest of these is the search() method. It takes a regular expression as an argument and returns either the position of the first character of the matched substring, or -1 if no match is found. For example, the following call will return 4:

var result = "JavaScript".search(/script/i); // 4

If the argument to the search() method is not a regular expression, it is first converted by passing it to the RegExp constructor. The search() method does not support global search and ignores the g flag in its argument.
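These rules for search() can be verified with a few calls; the strings below are arbitrary.

```javascript
var found = "JavaScript".search(/script/i);  // 4
var missing = "JavaScript".search(/python/); // -1

// A string argument is converted to a regular expression first.
var fromString = "JavaScript".search("Script"); // 4

// The g flag is ignored: the position of the first match is returned.
var withG = "a-b-c".search(/-/g); // 1
```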

The replace() method performs a search and replace operation. It takes a regular expression as its first argument and a replacement string as its second. The method searches the string on which it is called for a match to the specified pattern.

If the regular expression contains the g flag, the replace() method replaces all matches found with the replacement string. Otherwise, it replaces only the first match found. If the replace() method's first argument is a string rather than a regular expression, then the method performs a literal search for the string rather than converting it to a regular expression using the RegExp() constructor as the search() method does.
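The three cases described above look like this in a minimal sketch:

```javascript
// A string pattern is taken literally, so "." is a plain dot here,
// and only the first occurrence is replaced.
var literal = "a.b.c".replace(".", "-");       // "a-b.c"

// A regular expression without g also replaces only the first match.
var firstOnly = "a.b.c".replace(/\./, "-");    // "a-b.c"

// With g, every match is replaced.
var everywhere = "a.b.c".replace(/\./g, "-");  // "a-b-c"
```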

As an example, we can use the replace() method to capitalize the word "JavaScript" consistently across an entire string of text:

// Regardless of the case of the characters, replace the word with the correctly capitalized form
var result = "javascript".replace(/JavaScript/ig, "JavaScript");

The replace() method is more powerful than this example suggests. Recall that the parenthesized subexpressions within a regular expression are numbered from left to right, and that the regular expression remembers the text matching each subexpression. If the replacement string contains a $ sign followed by a digit, the replace() method replaces those two characters with the text that matches the specified subexpression. This is a very useful feature. We can use it, for example, to replace straight quotes in a string with typographic quotes, which are simulated by ASCII characters:

// A quotation is a quotation mark, followed by any number of characters
// other than quotation marks (which we remember), followed by another quotation mark
var quote = /"([^"]*)"/g;
// Replace the straight quotes with typographic ones, leaving the
// contents of the quotation, stored in $1, unchanged
var text = '"JavaScript" is an interpreted programming language.';
var result = text.replace(quote, "``$1''");
// ``JavaScript'' is an interpreted programming language.

An important thing to note is that the second argument to replace() can be a function that dynamically calculates the replacement string.
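Here is a sketch of such a replacement function. It is called with the matched text followed by the text of each parenthesized subexpression; the name doublePrices and the sample data are hypothetical.

```javascript
// Hypothetical helper: doubles every dollar amount in a string.
function doublePrices(text) {
  return text.replace(/\$(\d+)/g, function (match, digits) {
    // match is the whole match (e.g. "$3"), digits is group 1 (e.g. "3")
    return "$" + Number(digits) * 2;
  });
}
var doubled = doublePrices("tea $3, cake $12"); // "tea $6, cake $24"
```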

The match() method is the most general of the String class methods that use regular expressions. It takes a regular expression as its only argument (or converts its argument to a regular expression by passing it to the RegExp() constructor) and returns an array containing the search results. If the g flag is set in the regular expression, the method returns an array of all matches present in the string. For example:

// Returns ["1", "2", "3"]
var result = "1 plus 2 equals 3".match(/\d+/g);

If the regular expression does not contain the g flag, the match() method does not perform a global search; it just looks for the first match. However, match() returns an array even when the method does not perform a global search. In this case, the first element of the array is the substring found, and all remaining elements are the parenthesized subexpressions of the regular expression. Therefore, if match() returns an array arr, then arr[0] will contain the entire matched string, arr[1] the substring corresponding to the first subexpression, and so on. Drawing a parallel with the replace() method, we can say that the contents of $n are placed in arr[n].
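The two modes of match() can be compared side by side in a short sketch; the pattern with two groups is our own example.

```javascript
// Without g: one match plus its subexpressions.
var arr = "1 plus 2 equals 3".match(/(\d+) plus (\d+)/);
// arr[0] is "1 plus 2", arr[1] is "1", arr[2] is "2", arr.index is 0

// With g: every match, but no subexpressions.
var all = "1 plus 2 equals 3".match(/\d+/g); // ["1", "2", "3"]

// No match at all yields null in both modes.
var none = "no digits".match(/\d+/g); // null
```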

For example, take a look at the following code, which parses a URL:

var url = /(\w+):\/\/([\w.]+)\/(\S*)/;
var text = "Visit our website http://www..php";
var result = text.match(url);
if (result != null) {
  var fullurl = result[0];   // Contains "http://www..php"
  var protocol = result[1];  // Contains "http"
  var host = result[2];      // Contains "www..php"
}

It should be noted that for a regular expression without the g (global search) flag, the match() method returns the same value as the regular expression's exec() method: the returned array has index and input properties, as described in the discussion of the exec() method below.

The last method of the String object that uses regular expressions is split(). This method splits the string on which it is called into an array of substrings, using the argument as a delimiter. For example:

"123,456,789".split(","); // Returns ["123","456","789"]

The split() method can also take a regular expression as an argument. This makes the method more powerful. For example, you can specify a separator that allows an arbitrary amount of whitespace on both sides:

"1, 2, 3 , 4 , 5".split(/\s*,\s*/); // Returns ["1","2","3","4","5"]

RegExp object

As mentioned, regular expressions are represented as RegExp objects. In addition to the RegExp() constructor, RegExp objects support three methods and several properties.

The RegExp() constructor takes one or two string arguments and creates a new RegExp object. The first argument to the constructor is a string containing the body of the regular expression, i.e. text that must appear between slash characters in a regular expression literal. Note that string literals and regular expressions use the \ character to represent escape sequences, so when passing the regular expression as a string literal to the RegExp() constructor, you must replace each \ character with a pair of \\ characters.

The second argument to RegExp() may be missing. If specified, it defines the regular expression flags. It must be one of the characters g, i, m or a combination of these characters. For example:

// Finds all five-digit numbers in a string.
// Note the use of \\ in this example
var zipcode = new RegExp("\\d{5}", "g");

The RegExp() constructor is useful when the regular expression is generated dynamically and therefore cannot be represented using regular expression literal syntax. For example, to find a string entered by the user, you need to create a regular expression at runtime using RegExp().
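A sketch of this use case follows. Because user input may contain characters that are special in regular expressions, the sketch escapes them first; the helper names escapeRegExp and countOccurrences are our own and are not part of the language.

```javascript
// Hypothetical helper: put a backslash before every special character.
// In the replacement string, $& stands for the matched text.
function escapeRegExp(input) {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: count literal occurrences of term in text.
function countOccurrences(text, term) {
  var pattern = new RegExp(escapeRegExp(term), "g");
  var matches = text.match(pattern);
  return matches ? matches.length : 0;
}

var count = countOccurrences("1+1=2, and 1+1 again", "1+1"); // 2
```

Without the escaping step, the + in "1+1" would be treated as a repetition character and the search would look for "11" instead.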

RegExp Properties

Each RegExp object has five properties. The source property is a read-only string containing the text of the regular expression. The global property is a read-only boolean value that indicates whether the g flag is present in the regular expression. The ignoreCase property is a read-only boolean value that indicates whether the i flag is present. The multiline property is a read-only boolean value that indicates whether the m flag is present. And the last property, lastIndex, is an integer that can be read and written. For patterns with the g flag, this property contains the position in the string at which the next search should begin. As described below, it is used by the exec() and test() methods.
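These properties can be inspected directly, as in the following sketch:

```javascript
var re = /java$/gi;

var source = re.source;         // "java$"
var isGlobal = re.global;       // true
var isIgnoreCase = re.ignoreCase; // true
var isMultiline = re.multiline; // false
var start = re.lastIndex;       // 0 before any search

// lastIndex is the only one of the five that can be assigned.
re.lastIndex = 3;
var moved = re.lastIndex;       // 3
```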

RegExp Methods

RegExp objects define two methods that perform pattern matching; they behave similarly to the String class methods described above. The main method of the RegExp class used for pattern matching is exec(). It is similar to the String class match() method mentioned above, except that it is a RegExp class method that takes a string as an argument, rather than a String class method that takes a RegExp argument.

The exec() method executes the regular expression on the specified string, i.e. it looks for a match in the string. If no match is found, the method returns null. However, if a match is found, it returns the same kind of array as the match() method returns for a search without the g flag. Element zero of the array contains the string that matches the regular expression, and all subsequent elements contain the substrings that match the parenthesized subexpressions. In addition, the index property contains the position of the character at which the match begins, and the input property refers to the string that was searched.

Unlike match(), the exec() method returns an array whose structure does not depend on the presence of the g flag in the regular expression. Recall that when given a global regular expression, the match() method returns an array of all matches found. By contrast, exec() always returns one match, but provides complete information about it. When exec() is called on a regular expression that contains the g flag, the method sets the lastIndex property of the regular expression object to the position of the character immediately following the matched substring.

When exec() is called a second time on the same regular expression, it begins the search at the character whose position is specified in the lastIndex property. If exec() does not find a match, the lastIndex property is set to 0. (You can also set lastIndex to zero at any time, which should be done whenever you stop searching before the last match in one string has been found and begin searching another string with the same RegExp object.) This special behavior allows exec() to be called repeatedly to iterate over all regular expression matches in a string. For example:

var pattern = /Java/g;
var text = "JavaScript is more fun than Java!";
var result;
while ((result = pattern.exec(text)) != null) {
  console.log("Found '" + result[0] + "'" +
              " at position " + result.index +
              "; next search will start at " + pattern.lastIndex);
}

Another method of the RegExp object is test(), which is much simpler than exec(). It takes a string and returns true if the string matches the regular expression:

var pattern = /java/i;
pattern.test("JavaScript"); // Returns true

Calling test() is equivalent to calling exec() and returning true if exec() returns something other than null. For this reason, the test() method behaves in the same way as the exec() method when called on a global regular expression: it begins searching the specified string at the position specified by the lastIndex property, and if it finds a match, sets the lastIndex property to the position of the character immediately following the match. Therefore, using the test() method, you can create a loop that traverses a string in the same way as with the exec() method.
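The following sketch shows how lastIndex moves when test() is called on a global regular expression; the string "banana" is just a convenient example.

```javascript
var pattern = /a/g;
var text = "banana"; // the letter a is at positions 1, 3 and 5

var firstCall = pattern.test(text); // true
var afterFirst = pattern.lastIndex; // 2: just past the first match

// Start over and count the matches by calling test() in a loop.
pattern.lastIndex = 0;
var count = 0;
while (pattern.test(text)) {
  count++;
}
// count is 3, and the failed final call reset lastIndex to 0
var afterLoop = pattern.lastIndex;
```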