Australian Credit Approval

1. TITLE:
      Australian Credit Approval

2. USE IN STATLOG

      2.1- Testing Mode
            10-Fold Cross Validation

      2.2- Special Preprocessing
            Yes (See REMARKS)

      2.3- Test Results
                        Error Rate TIME
            Algorithm   Train Test Train Test
            ---------   ----------------------------
            Cal5        0.132 0.131 24    2
            Itrule            0.162 0.137 174   1
            LogDisc           0.125 0.141 21    18
            Discrim           0.139 0.141 32    7
            Dipol92           0.139 0.141 56    2
            Radial            0.107 0.145 12    2
            Cart        0.145 0.145 68    2
            Castle            0.144 0.148 47    5
            Bayes       0.136 0.151 4     0
            IndCart           0.081 0.152 34    33
            BackProp    0.087 0.154 1370 0
            C4.5        0.099 0.155 6     1
            Smart       0.090 0.158 246   0
            BayTree           0     0.171 7     0
            KNN         0     0.181 3     7
            Ac2         0     0.181 400   14
            NewId       0     0.181 15    0
            LVQ         0.065 0.197 261   7
            Alloc80           0.194 0.201 877   ?
            Cn2         0.001 0.204 42    3
            QuaDisc           0.185 0.207 31    7
            Default           0.440 0.440 ?     ?
            Cascade           1.000
            Kohonen           1.000

3. SOURCES and PAST USAGE

      3.1 ORIGINAL SOURCE
            (confidential)
            Submitted by quinlan@cs.su.oz.au

      3.2 PAST USAGE
         See Quinlan,
         * "Simplifying decision trees", Int J Man-Machine Studies 27,
               Dec 1987, pp. 221-234.
           * "C4.5: Programs for Machine Learning", Morgan Kaufmann, Oct 1992

      3.2. RELEVANT INFORMATION

    This file concerns credit card applications. All attribute names
    and values have been changed to meaningless symbols to protect
    confidentiality of the data.

    This dataset is interesting because there is a good mix of
    attributes -- continuous, nominal with small numbers of
    values, and nominal with larger numbers of values. There
    were originally a few missing values, but these have all
    been replaced by the overall median.

4. DATASET DESCRIPTION

      NUMBER OF EXAMPLES
            Total no. = 690
      NUMBER OF CLASSES: 2
          0,1 (-,+)

          Class Distribution:
            +: 307 (44.5%)    CLASS 1
            -: 383 (55.5%)    CLASS 0

      NUMBER OF ATTRIBUTES
            14 (6 Continuous 8 Categorical)

    A1:   0,1    CATEGORICAL
            a,b
    A2:   continuous.
    A3:   continuous.
    A4:   1,2,3         CATEGORICAL
            p,g,gg
    A5:   1, 2,3,4,5, 6,7,8,9,10,11,12,13,14    CATEGORICAL
           ff,d,i,k,j,aa,m,c,w, e, q, r,cc, x

    A6:   1, 2,3, 4,5,6,7,8,9    CATEGORICAL
            ff,dd,j,bb,v,n,o,h,z

    A7:   continuous.
    A8:   1, 0       CATEGORICAL
            t, f.
    A9: 1, 0     CATEGORICAL
            t, f.
    A10: continuous.
    A11:       1, 0     CATEGORICAL
          t, f.
    A12:    1, 2, 3    CATEGORICAL
                  s, g, p
    A13: continuous.
    A14: continuous.

5-REMARKS:

      Missing Attribute Values:
            37 cases (5%) HAD one or more missing values. The missing
            values from particular attributes WERE:

            A1: 12
            A2: 12
            A4:   6
            A5:   6
            A6:   9
            A7:   9
            A14: 13

            THESE WERE REPLACED BY THE MODE OF THE ATTRIBUTE (CATEGORICAL)
                               MEAN OF THE ATTRIBUTE (CONTINUOUS)

      There is no cost matrix.

_____________________________________________________________________________

Three remarks relating to the StatLog version:

        THE LABELS HAVE BEEN CHANGED FOR THE CONVENIENCE OF THE STATISTICAL
      ALGORITHMS.   FOR EXAMPLE, ATTRIBUTE 4 ORIGINALLY HAD 3 LABELS p,g,gg
      AND THESE HAVE BEEN CHANGED TO LABELS 1,2,3.

1. Attributes 4 and 5 of the original WERE APPARENTLY IDENTICAL,
    so ATTRIBUTE 4 OF THE ORIGINAL WAS REMOVED
    (for the convenience of the statistical algorithms).

2. Where attributes were categorical, the categories were given numerical
    labels in the order of the relative risk of being class "+". Treat
    as categorical if thought desirable.   All StatLog trials treated
    these variables as numerical.

3. A stepwise regression procedure strongly suggests that only
    attributes A5, A8, A9, A13 and A14 are relevant.   Improved results
    are often obtained if only these five attributes are used.

CONTACTS
      statlog-adm@ncc.up.pt
      bob@stams.strathclyde.ac.uk

      See README file for general information

================================================================================

Tips Motivasi

Australian Credit Approval

Post a Comment

Artikel Terkait Tips Motivasi

Popular Posts

Terbaru

Australian Credit Approval

Lagi Hangat dari Tips Motivasi

Post a Comment

Artikel Terkait Tips Motivasi

Popular Posts

Terbaru