Overview of the mapclassify API
There are several ways to access the functionality in mapclassify: the original class-based API and, from version 2.4.0, the classify function.
We first load the example dataset that we have seen earlier.
[1]:
from libpysal import examples
import geopandas as gpd
from mapclassify import classify
[2]:
pth = examples.get_path('columbus.shp')
gdf = gpd.read_file(pth)
y = gdf.HOVAL
gdf.head()
[2]:
| AREA | PERIMETER | COLUMBUS_ | COLUMBUS_I | POLYID | NEIG | HOVAL | INC | CRIME | OPEN | ... | DISCBD | X | Y | NSA | NSB | EW | CP | THOUS | NEIGNO | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.309441 | 2.440629 | 2 | 5 | 1 | 5 | 80.467003 | 19.531 | 15.725980 | 2.850747 | ... | 5.03 | 38.799999 | 44.070000 | 1.0 | 1.0 | 1.0 | 0.0 | 1000.0 | 1005.0 | POLYGON ((8.62413 14.23698, 8.55970 14.74245, ... |
1 | 0.259329 | 2.236939 | 3 | 1 | 2 | 1 | 44.567001 | 21.232 | 18.801754 | 5.296720 | ... | 4.27 | 35.619999 | 42.380001 | 1.0 | 1.0 | 0.0 | 0.0 | 1000.0 | 1001.0 | POLYGON ((8.25279 14.23694, 8.28276 14.22994, ... |
2 | 0.192468 | 2.187547 | 4 | 6 | 3 | 6 | 26.350000 | 15.956 | 30.626781 | 4.534649 | ... | 3.89 | 39.820000 | 41.180000 | 1.0 | 1.0 | 1.0 | 0.0 | 1000.0 | 1006.0 | POLYGON ((8.65331 14.00809, 8.81814 14.00205, ... |
3 | 0.083841 | 1.427635 | 5 | 2 | 4 | 2 | 33.200001 | 4.477 | 32.387760 | 0.394427 | ... | 3.70 | 36.500000 | 40.520000 | 1.0 | 1.0 | 0.0 | 0.0 | 1000.0 | 1002.0 | POLYGON ((8.45950 13.82035, 8.47341 13.83227, ... |
4 | 0.488888 | 2.997133 | 6 | 7 | 5 | 7 | 23.225000 | 11.252 | 50.731510 | 0.405664 | ... | 2.83 | 40.009998 | 38.000000 | 1.0 | 1.0 | 1.0 | 0.0 | 1000.0 | 1007.0 | POLYGON ((8.68527 13.63952, 8.67758 13.72221, ... |
5 rows × 21 columns
Original API (< 2.4.0)
[3]:
import mapclassify
bp = mapclassify.BoxPlot(y)
bp
[3]:
BoxPlot
Interval Count
----------------------
( -inf, -0.70] | 0
(-0.70, 25.70] | 13
(25.70, 33.50] | 12
(33.50, 43.30] | 12
(43.30, 69.70] | 7
(69.70, 96.40] | 5
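The interval bounds in this table follow from the quartiles and the interquartile range (IQR). With the default hinge of 1.5, the IQR is 43.30 - 25.70 = 17.60, so the fences sit at 25.70 - 26.40 = -0.70 and 43.30 + 26.40 = 69.70. The block below is a minimal standard-library sketch of that arithmetic, not mapclassify's implementation; the helper name `box_plot_bins`, the interpolation rule, and the toy data are assumptions made for illustration.

```python
from statistics import quantiles


def box_plot_bins(values, hinge=1.5):
    """Return upper bounds for box-plot classes (hypothetical helper)."""
    data = sorted(values)
    # Quartiles with linear interpolation between order statistics (assumed rule).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - hinge * iqr
    upper_fence = q3 + hinge * iqr
    # Sorting keeps the bounds increasing whether or not the maximum
    # lies beyond the upper fence.
    return sorted([lower_fence, q1, median, q3, upper_fence, data[-1]])


toy = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # toy data, not the Columbus values
print(box_plot_bins(toy))  # [-3.5, 3.25, 5.5, 7.75, 14.5, 30]
```

A larger `hinge` pushes both fences outward, so fewer observations land in the outermost classes.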
Extended API (>= 2.4.0)
Note that the original API remains available, so the extension is backwards compatible.
[4]:
bp = classify(y, 'box_plot')
bp
[4]:
BoxPlot
Interval Count
----------------------
( -inf, -0.70] | 0
(-0.70, 25.70] | 13
(25.70, 33.50] | 12
(33.50, 43.30] | 12
(43.30, 69.70] | 7
(69.70, 96.40] | 5
[5]:
type(bp)
[5]:
mapclassify.classifiers.BoxPlot
[6]:
q5 = classify(y, 'quantiles', k=5)
q5
[6]:
Quantiles
Interval Count
----------------------
[17.90, 23.08] | 10
(23.08, 30.48] | 10
(30.48, 39.10] | 9
(39.10, 45.83] | 10
(45.83, 96.40] | 10
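With `k=5`, each class holds roughly a fifth of the 49 observations, which is why the counts above are 10, 10, 9, 10 and 10. A rough standard-library sketch of how such breaks can be derived follows. The helper `quantile_bins` is hypothetical and the interpolation rule is an assumption, so results may differ slightly from mapclassify on real data.

```python
from statistics import quantiles


def quantile_bins(values, k=5):
    """Return k upper bounds splitting values into k roughly equal-sized classes."""
    data = sorted(values)
    # k - 1 interior cut points; the maximum closes the last class.
    cuts = quantiles(data, n=k, method="inclusive")
    return cuts + [data[-1]]


toy = list(range(1, 21))  # toy data: 1, 2, ..., 20
bins = quantile_bins(toy, k=4)
print(bins)  # [5.75, 10.5, 15.25, 20]
```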
Robustness of the scheme argument
[7]:
classify(y, 'boxPlot')
[7]:
BoxPlot
Interval Count
----------------------
( -inf, -0.70] | 0
(-0.70, 25.70] | 13
(25.70, 33.50] | 12
(33.50, 43.30] | 12
(43.30, 69.70] | 7
(69.70, 96.40] | 5
[8]:
classify(y, 'Boxplot')
[8]:
BoxPlot
Interval Count
----------------------
( -inf, -0.70] | 0
(-0.70, 25.70] | 13
(25.70, 33.50] | 12
(33.50, 43.30] | 12
(43.30, 69.70] | 7
(69.70, 96.40] | 5
[9]:
classify(y, 'Box_plot')
[9]:
BoxPlot
Interval Count
----------------------
( -inf, -0.70] | 0
(-0.70, 25.70] | 13
(25.70, 33.50] | 12
(33.50, 43.30] | 12
(43.30, 69.70] | 7
(69.70, 96.40] | 5
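The three calls above return the same classifier, which suggests that the scheme string is normalized before it is looked up. One way such matching could work is sketched below. The `normalize_scheme` and `resolve_scheme` helpers and the `KNOWN_SCHEMES` table are hypothetical; they only illustrate the idea of ignoring case and underscores and are not taken from mapclassify's source.

```python
# Hypothetical lookup table: normalized key -> classifier class name.
KNOWN_SCHEMES = {
    "boxplot": "BoxPlot",
    "quantiles": "Quantiles",
    "equalinterval": "EqualInterval",
}


def normalize_scheme(scheme):
    """Lowercase the name and drop underscores so spelling variants collapse."""
    return scheme.lower().replace("_", "")


def resolve_scheme(scheme):
    """Map a user-supplied scheme name to a class name, or raise ValueError."""
    key = normalize_scheme(scheme)
    if key not in KNOWN_SCHEMES:
        raise ValueError(f"Unknown scheme: {scheme!r}")
    return KNOWN_SCHEMES[key]


for name in ["box_plot", "boxPlot", "Boxplot", "Box_plot"]:
    print(name, "->", resolve_scheme(name))  # every variant resolves to BoxPlot
```

Normalizing once at the entry point keeps the lookup table small while accepting the spelling variants shown in the cells above.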
[10]:
classify?
Signature:
classify(
y,
scheme,
k=5,
pct=[1, 10, 50, 90, 99, 100],
pct_sampled=0.1,
truncate=True,
hinge=1.5,
multiples=[-2, -1, 1, 2],
mindiff=0,
initial=100,
bins=None,
)
Docstring:
Classify your data with `mapclassify.classify`
Note: Input parameters are dependent on classifier used.
Parameters
----------
y : array
(n,1), values to classify
scheme : str
pysal.mapclassify classification scheme
k : int, optional
The number of classes. Default=5.
pct : array, optional
Percentiles used for classification with `percentiles`.
Default=[1,10,50,90,99,100]
pct_sampled : float, optional
The percentage of n that should form the sample
(JenksCaspallSampled, FisherJenksSampled)
If pct is specified such that n*pct > 1000, then pct = 1000./n
truncate : boolean, optional
truncate pct_sampled in cases where pct * n > 1000., (Default True)
hinge : float, optional
Multiplier for IQR when `BoxPlot` classifier used.
Default=1.5.
multiples : array, optional
The multiples of the standard deviation to add/subtract from
the sample mean to define the bins using `std_mean`.
Default=[-2,-1,1,2].
mindiff : float, optional
The minimum difference between class breaks
if using `maximum_breaks` classifier. Default =0.
initial : int
Number of initial solutions to generate or number of runs
when using `natural_breaks` or `max_p_classifier`.
Default =100.
Note: setting initial to 0 will result in the quickest
calculation of bins.
bins : array, optional
(k,1), upper bounds of classes (have to be monotonically
increasing) if using `user_defined` classifier.
Default =None, Example =[20, max(y)].
Returns
-------
classifier : pysal.mapclassify.classifier instance
Object containing bin ids for each observation (.yb),
upper bounds of each class (.bins), number of classes (.k)
and number of observations falling in each class (.counts)
Note: Supported classifiers include: quantiles, box_plot, equal_interval,
fisher_jenks, headtail_breaks, jenks_caspall, jenks_caspall_forced,
max_p_classifier, maximum_breaks, natural_breaks, percentiles, std_mean,
user_defined
Examples
--------
Imports
>>> from libpysal import examples
>>> import geopandas as gpd
>>> from mapclassify import classify
Load Example Data
>>> link_to_data = examples.get_path('columbus.shp')
>>> gdf = gpd.read_file(link_to_data)
>>> x = gdf['HOVAL'].values
Classify values by quantiles
>>> quantiles = classify(x, 'quantiles')
Classify values by box_plot and set hinge to 2
>>> box_plot = classify(x, 'box_plot', hinge=2)
File: ~/Dropbox/p/pysal/src/subpackages/mapclassify/mapclassify/_classify_API.py
Type: function
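The Returns entry in the docstring describes three related attributes: the upper bounds of each class (`.bins`), a bin id for each observation (`.yb`), and the number of observations per class (`.counts`). The sketch below shows how these relate for right-closed intervals such as `(23.08, 30.48]`, using only the standard library. `assign_bins` is a hypothetical helper, and the values and bounds are an arbitrary example, not output from mapclassify.

```python
from bisect import bisect_left
from collections import Counter


def assign_bins(values, upper_bounds):
    """Return the class index of each value for right-closed intervals."""
    # bisect_left finds the first upper bound that is >= the value,
    # so a value equal to a bound falls in the class that bound closes.
    return [bisect_left(upper_bounds, v) for v in values]


values = [18.0, 20.0, 25.5, 40.0, 96.4]
upper_bounds = [20.0, 40.0, 96.4]  # increasing; the last bound is the maximum

yb = assign_bins(values, upper_bounds)
tally = Counter(yb)
counts = [tally[i] for i in range(len(upper_bounds))]
print(yb)      # [0, 0, 1, 1, 2]
print(counts)  # [2, 2, 1]
```

The bounds must be increasing for the binary search to be valid, which mirrors the requirement stated for the `bins` parameter.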