Files: use71.m
Back to main

QuickCheck Tutorial 7

Generators - General Frequency

General Frequency follows the ideas in Specific Frequency. You must understand the format of Specific Frequency in order to write General Frequency. However the amount of work in GF is usually less then in SP. In GF, the user should specify one level down in branches, where a practicl SP may be 3+ level down.

Back in Tutorial 3, an invariant function was written to test the law:

         reverse (reverse xs) = xs
:- func testing4(list(float)) = property.
testing4(Xs) = 
         list_length(Xs) `>>>` (nrev(nrev(Xs)) `===` Xs).

Test Description : testing4
Number of test cases that succeeded : 100
Number of trivial tests : 0
Number of tests cases which failed the pre-condition : 0
Distributions of selected argument(s) : 
1     8
1     4
1     6
2     5
8     3
16     2
18     1
53     0
The display of testing4 shows that
        53 cases of length == 0
        18 cases of length == 1
        16 cases of length == 2
        ...etc...       
The length of the list is heavily biased towards the short end. In the limit with even distribution along the branches, the list will have :
50%     length == 0
25%     length == 1
12.5%   length == 2
6.25%   length == 3
3.125%  length == 4
...etc... halving the probability in each step.
        :- type list(T) 
                --->    [] 
                ;       [T | list(T)].
At any given length accumulated so far, there is a 50-50 chance of either stopping there (if the first branch is chosen) or going on to a longer list (if the second branch is chosen).

The 5th argument of qcheck/7 takes General Frequency, it's of type list({type_desc, list(frequency)}). Each element of the list contains the General Frequency for a type. The list length can be increased if the 2nd branch if favoured over the 1st branch.

From Tute6: frequency defines the relative chance of a branch being selected, plus information about that branch's sub-branches. list(frequency) contains distribution information about 1 discriminated union, i.e. the list should contain frequencies for all possible branches. list(list(frequency)) contains distribution information about a list of discriminated unions.

A list is a discriminated union. To describe it, the correct format will be list(frequency), which matches the 2nd term of list({type_desc, list(frequency)})

The list(frequency) for list(T) should have two elements, since the definition of list/1 has two branches.

        Output = [ frequency_a, frequency_b ]

        frequency_a = { 10, ...something_a... }
        frequency_b = { 90, ...something_b... }
I.e. 10% take the 1st branch, 90% take the 2nd branch.

The constrctor []/0 takes no argument, thus something_a is [], ie: frequency_a = { 10, [] }

The 2nd branch constructor .(T, list(T)) takes 2 arguments. In specific frequency that would mean a list of two elements (or choose defalut []) :

        something_b = list(list(frequency)) = [ A, B ]
In General Frequency the user can specify down more than 1 level, however it is not required in this case. Define something_b = { 90, [] }.

Put it all together:

                list(frequency)
        =       [ frequency_a, frequency_b ]
        =       [ { 10, [] } , { 90, [] }  ]

Then
                { type_desc, list(frequency) }
        =       { type_of([0.0]), [ { 10, [] } , { 90, [] }  ] }

Now for type list(float), there is 10% chance of selecting 1st branch, and 90% chance of selecting 2nd branch at EVERY level.

The complete code (use71.m) :
:- module use71.

:- interface.

:- use_module io.

:- pred main(io__state, io__state).
:- mode main(di, uo) is det.

%---------------------------------------------------------------------------%

:- implementation.

:- import_module int, float, list, std_util.
:- import_module qcheck, nrev.

%---------------------------------------------------------------------------%

main -->
        { freq_list(F) },       
        qcheck(qcheck__f(testing4), "testing4", 1000, [], [F]).

:- pred freq_list({type_desc, list(frequency)}).
:- mode freq_list(out) is det.
freq_list(F) :-
        F = { type_of([0.0]), [ { 10, [] } , { 90, [] }  ] }.

:- func testing4(list(float)) = property.
testing4(Xs) = 
        list_length(Xs) `>>>` (nrev(nrev(Xs)) `===` Xs).
A sample output shows the lists are much longer :

Test Description : testing4
Number of test cases that succeeded : 1000
Number of trivial tests : 0
Number of tests cases which failed the pre-condition : 0
Distributions of selected argument(s) : 
1     39
1     56
2     40
2     36
2     38
2     46
2     61
2     28
2     50
2     33
2     53
2     41
3     43
3     32
4     34
4     37
4     31
5     26
5     30
7     21
7     35
8     25
8     29
9     22
9     23
11     19
14     24
14     17
15     27
16     18
16     20
18     14
19     16
23     13
25     15
30     11
32     12
34     9
42     10
43     8
49     7
49     6
62     4
63     3
69     5
72     2
91     1
95     0

Summary on default frequency, specific frequency, general frequency :

  1. Before Quickcheck generates a term, it considers the term's type. If it's not a discriminated union, all the frequency input is ignored and lost forever.

  2. If it is a discriminated union, then Quickcheck generates the term according to the specific frequency. In cases where the specific frequency is [], then Quickcheck will search the general frequency list to find the matching type.

  3. If no matching type is found, then generate the term (at this level) evenly

            eg:     :- type coin 
                            --->    small(face)
                            ;       large(color).
                    :- type face
                            --->    head
                            ;       tail.
    
    The chance between small/large is even, but that doesn't mean chance between face:head/tail is even.
  4. If matching type is found, then Quickcheck copies that frequency information, and treats that as the specific frequency.

In the list(float) example:

        qcheck(qcheck__f(testing4), "testing4", 1000, [], [F])
        F = { type_of([0.0]), [ { 10, [] } , { 90, [] }  ] }
Quickcheck will first find [] as specific frequency (since [] is passed to qcheck), so it will look in the general frequency list for the information on how to generate list(float). That information will be extracted. The function which generates discriminated union will behave as if it was called with specific frequency equal to that of the information extracted. The information in GF is only 1 level deep, it can be used only once, after that the specific frequency will be [] again. So if the 2nd branch is chosen and another list(float) is needed, then quickcheck will find [] as specific frequency, then it will find general frequency list contains information on how to generate list(float). That information will be copied over...etc... That is:
        0       enter generator
        1       SF = [], 
        2       search GF, found, [ { 10, [] } , { 90, [] }  ]
        3       restart the generator with SF =  [ { 10, [] } , { 90, [] }  ]
        4       SF = [ { 10, [] } , { 90, [] }  ], do not search GF
        5       suppose 2nd branch is chosen, ie { 90, [] }
                the sub-branch has SF = [] 
                ( If the 1st branch is chosen, then stop the looping. )
        6       generate the sub-branch with SF = []

        7       enter generator (for the sub-branch), SF = [], go back to step 1

The actual code does not restart the generator.