Formal language
In mathematics, logic and computer science, a formal language is a set of finite-length words (i.e. character strings) drawn from some finite alphabet. Note that we can talk about formal language in many contexts (scientific, legal and so on), meaning a mode of expression more careful and accurate than everyday speech. Use of a particular formal language in the sense intended here is an 'ultimate' version of that usage: formal enough to be used in written form for automatic computation, is a possible criterion.A typical alphabet would be {a, b}, and a typical string over that alphabet would be
- ababba.
The empty word (that is, length-zero string) is allowed and is often denoted by e, ε or λ. While the alphabet is a finite set and every string has finite length, a language may very well have infinitely many member strings (because the length of words in it may be unbounded).
Some examples of formal languages:
- the set of all words over {a, b};
- the set { an : n is a prime number };
- the set of syntactically correct programs in a given programming language; or
- the set of inputs upon which a certain Turing machine halts.
- Strings produced by some formal grammar (see Chomsky hierarchy);
- Strings produced by a regular expression;
- Strings accepted by some automaton, such as a Turing machine or finite state automaton;
- From a set of related YES/NO questions those ones for which the answer is YES — see decision problem.
- The concatenation L1L2 consists of all strings of the form vw where v is a string from L1 and w is a string from L2.
- The intersection of L1 and L2 consists of all strings which are contained in L1 and also in L2.
- The union of L1 and L2 consists of all strings which are contained in L1 or in L2.
- The complement of the language L1 consists of all strings over the alphabet which are not contained in L1.
- The right quotient L1/L2 of L1 by L2 consists of all strings v for which there exists a string w in L2 such that vw is in L1.
- The Kleene star L1* consists of all strings which can be written in the form w1w2...wn with strings wi in L1 and n ≥ 0. Note that this includes the empty string ε because n = 0 is allowed.
- The reverse L1R contains the reversed versions of all the strings in L1.
- The shuffle of L1 and L2 consists of all strings which can be written in the form v1w1v2w2...vnwn where n ≥ 1 and v1,...,vn are strings such that the concatenation v1...vn is in L1 and w1,...,wn are strings such that w1...wn is in L2.