{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Other Regression Methods\n", "\n", "
\n", " The original ipynb file for this page\n", "
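\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Neural Network Regression\n", "\n", "Neural-network regression is rarely used for nonlinear fitting with one explanatory variable and one target variable. Still, as an example of applying a machine-learning method to regression, let's try it on the data we have been using.\n", "\n", "(Setting kernel methods aside for now,) we express the target variable as a linear combination of basis functions:\n", "$$\n", " y = \\sum_{j=1}^D w_j \\phi_j (\\boldsymbol{x}) \\tag{1}\n", "$$\n", "\n", "So far, we have determined the coefficients $\\{w_j\\}$ from training data using several machine-learning methods.\n", "\n", "Carrying over this picture of fitting, neural-network regression can be described as follows.\n", "\n", "- Consider several outputs of the form of Eq. (1):\n", "  $$\n", "  a_i = \\sum_{j=1}^D w_{ij}^{(1)} \\phi_j (\\boldsymbol{x}) \\tag{2}\n", "  $$\n", "  This is the input to hidden unit $i$ (the counterpart of a nerve cell).\n", "\n", "- A hidden unit is regarded as firing when its input exceeds a certain value. This idea is carried over from the picture of a neuron.\n", "\n", "- The firing (output) $h(a_i)$ of the $i$-th hidden node is multiplied by the weight $w_{ki}^{(2)}$ and passed to the $k$-th node of the next layer.\n", "\n", "For simplicity, take a single hidden layer with $M$ units and a single output. Then\n", "\n", "$$\n", " y = \\sigma \\left(\n", " \\sum_{i=1}^M w_{i}^{(2)} h\\left( \\sum_{j=1}^D w_{ij}^{(1)} \\phi_j (\\boldsymbol{x}) \\right)\n", " \\right)\n", "$$\n", " \n", "The activation functions $\\sigma$ and $h$ are chosen in advance; step functions, linear functions, logistic functions, and others are used.\n", "\n", "The following figure is quoted from PRML (Fig. 5.1). It shows $K$ outputs, whereas the equation above has only one output (the target variable).\n", "\n", "\n", "Learning (training) means using the data to determine the connection weights $w_{ij}^{(k)}$ between nodes so that the outputs for the inputs (explanatory variables) agree as closely as possible with the observed values of the target variable.\n", "\n", "This imitates the human brain, in which the strengths of synaptic connections are thought to be shaped by experience. A neuron is thought to become excited (fire) when its input exceeds a threshold, and the corresponding $h(a)$ is a step function; for regression, however, smoother functions are usually used. scikit-learn's MLPRegressor provides `identity` (linear), `logistic`, `tanh`, and `relu` (the default), where the rectified linear unit $h(a_i) = \\max(0, a_i)$ is zero for negative inputs.\n", "\n", "How to determine $w_{ij}^{(k)}$ is the central algorithmic question, but we omit it here.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Neural Network Regression in scikit-learn (Multi-layer Perceptron Regression)\n", "\n", "scikit-learn provides MLPRegressor for neural-network regression (MLP stands for Multi-layer Perceptron).\n", "\n", "https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html\n", "\n", "Let's try MLPRegressor on the data we have been using, a $\\sin$ function with added random noise, together with polynomial (power) basis functions.\n", "\n", "As in polynomial fitting, we use the powers $\\{x^0, x, x^2, \\cdots , x^D\\}$ of a variable $x$ as the inputs and train the network so that its single output matches the target variable $y$; this determines the connection weights $w_{ij}^{(k)}$.\n", "\n", "(Note)\n", "- Unlike the SVM we saw last time, here we have to choose the basis functions ourselves. (With an SVM, a kernel is chosen instead of basis functions; usually the default Gaussian kernel (rbf: radial basis function) works fine.)" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [], "source": [ "# Function for generating the data\n", "import random\n", "import numpy as np\n", "\n", "def make_data_by_sin_gaussian(x, randomness=0.2):\n", "    # Compute y = sin(x) and add Gaussian noise with standard deviation randomness\n", "    y = np.sin(x)\n", "    e = [random.gauss(0, randomness) for i in range(len(y))]\n", "    # e = np.random.randn(len(x))*0.2\n", "    y += e\n", "    return y\n" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "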
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.neural_network import MLPRegressor\n", "\n", "x_max = 6.5 # 予測の範囲の上限 (データは[0,2π]の範囲のみ)\n", "\n", "# トレーニングデータ作成\n", "n_tr = 50\n", "x = np.linspace(0., np.pi*2., n_tr) # リスト 0から2πまでをn_tr等分した値を一次元配列\n", "y = make_data_by_sin_gaussian(x, 0.3)\n", "\n", "# 基底関数をM次の多項式とする\n", "deg = 5\n", "X = np.vander(x, deg+1) # 計画行列の作成\n", "# X = x[:, np.newaxis]\n", " \n", "'''\n", "学習を行う (ここでの多項式近似では、1層で十分だが、\n", "結合係数を決めるアルゴリズム(誤差逆伝播法)上、\n", "ニューロン数(layer_size)を少なくとも2以上程度にする必要があるようだ。\n", " 3以下ではかなり不安定\n", "第1引数はhidden_layer_sizedをtupleで与える\n", "既定値は(100,)\n", "solverはlbfgs(quasi-Newton methods)にする。既定はadamだが、データ数が少ない場合はlbfgsがよい。\n", "'''\n", "mlp = MLPRegressor((5,3), activation=\"identity\", solver=\"lbfgs\", max_iter=2000)\n", "mlp.fit(X,y)\n", "\n", "# 予測\n", "x_test = np.linspace(0, x_max, 100)\n", "y_test = mlp.predict(np.vander(x_test, deg+1))\n", " \n", "#プロット\n", "plt.scatter(x, y)\n", "plt.plot(x_test, y_test)\n", "\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ランダムフォレストによる回帰\n", "\n", "\n", "訓練データをランダムサンプリングによりいくつかのサブデータを作り、説明変数と目的変数の関係を木構造に分類し、得られた木の平均をとる手法を、ランダムフォレスト(random forest)回帰という。https://ja.wikipedia.org/wiki/%E3%83%A9%E3%83%B3%E3%83%80%E3%83%A0%E3%83%95%E3%82%A9%E3%83%AC%E3%82%B9%E3%83%88\n", "\n", "\n", "ランダムフォレストは、多数の説明変数がある場合の分類(例えば画像認識)に用いることが多いが、ここでは、上のデータ、つまり多項式のそれぞれの値を独立な説明変数としてランダムフォレスト回帰のモジュールを用いて評価してみよう。\n", "\n", "変数の組$\\{x^0, x, x^2, \\cdots , x^D\\}$を、目的変数を参照して、カテゴライズしようというのである。\n", "\n", "パラメータはたくさんあるが、ここでは既定値での評価だけを示す。\n", "\n", "Web検索を行うと、ボストンデータを用いた予測例などがたくさんあるので参考にしてほしい。\n", "\n", "以下では、scikit-learnのRandomForestRegressorを用いてみる。\n", "\n", "https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html\n" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAi8AAAGdCAYAAADaPpOnAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA/QUlEQVR4nO3de3yU5Z338e/M5DAEkglJSDJACOEgGINyEg1Y8VARtNRal+p2tbo97EK1Hlif7do+LdKtsrvPbrdrW2nR1urDtrp9rAcqgrQqHkDDwSgBRMBAAiQEEjJJCEnIzP38kcxAzDmZe+7cM5/36zWvVzO5555fYkq+ua7fdV0OwzAMAQAA2ITT6gIAAAD6g/ACAABshfACAABshfACAABshfACAABshfACAABshfACAABshfACAABsJc7qAsItEAjo2LFjSk5OlsPhsLocAADQB4ZhqL6+XqNHj5bT2fPYStSFl2PHjiknJ8fqMgAAwACUl5dr7NixPV4TdeElOTlZUtsXn5KSYnE1AACgL+rq6pSTkxP6Pd6TqAsvwamilJQUwgsAADbTl5YPGnYBAICtEF4AAICtEF4AAICtEF4AAICtEF4AAICtEF4AAICtEF4AAICtEF4AAICtRN0mdbAPf8BQUWmNquqblJns1py8NLmcnEcFAOgZ4QWW2FBSoZXr9qjC1xR6zutxa8XifC0s8FpYGQBgqGPaCBG3oaRCy9bu7BBcJKnS16Rla3dqQ0mFRZUBAOyA8IKI8gcMrVy3R0YXnws+t3LdHvkDXV0BAADhBRFWVFrTacTlfIakCl+TikprIlcUAMBWCC+IqKr67oPLQK4DAMQewgsiKjPZHdbrAACxh/CCiJqTlyavx63uFkQ71LbqaE5eWiTLAgDYCOEFEeVyOrRicb4kdQowwY9XLM5nvxcAQLcIL4i4hQVerb59prI9HaeGsj1urb59Jvu8AAB6xCZ1sMTCAq+uy89mh10AQL8RXmAZl9OhwonpVpcBALAZpo0AAICtEF4AAICtEF4AAICtEF4AAICtEF4AAICtEF4AAICtsFQa6AN/wGBPGgAYIggvQC82lFRo5bo9qvCdO+na63FrxeJ8dgMGAAswbQT0YENJhZat3dkhuEhSpa9Jy9bu1IaSCosqA4DYRXgBuuEPGFq5bo+MLj4XfG7luj3yB7q6AgBgFsIL0I2i0ppOIy7nMyRV+JpUVFoTuaIAAIQXoDtV9d0Hl4FcBwAID8IL0I3MZHdYrwMAhAfhBejGnLw0eT1udbcg2qG2VUdz8tIiWRYAxDyWSsNUZ1r82vxJlc6c9Q/6Xi6nU1dOzlBqUkIYKuvL+zm0YnG+lq3dKYfUoXE3GGhWLM5nvxcAiDDCC0z1+JsH9LPXD4Ttfp+/MEtP3jk7bPfrzcICr1bfPrPTPi/Z7PMCAJYhvMBUh6sbJUkTRg3XmNRhA76PYUjvHDip1z8+rgrfGXk9A79Xfy0s8Oq6/Gx22AWAIYLwAlPVNZ2VJC2dP1FfmZ0zqHt95ZdbVXSoRn/ceVR3Xz0pHOX1mcvpUOHE9Ii+JwCgazTswlT1Ta2SpBT34HPyVy5tCz//s71cATaGA4CYRXiBqerbR15S3PGDvtcN07I1IjFOh6sbVXSIjeEAIFYRXmCqujNtIy/JYQgvSQlxWnxJW4Ps/2wrH/T9AAD2RHiBqYIjL8lhmDaSpCXtfTPrSypC/TQAgNhCeIFpWv0BnW5p298lZdjgR14kaUZOqiZnjlDT2YDWfXgsLPcEANgL4QWmaWhuDf3vcI28OByO0Kql/9l+JCz3BADYC+EFpgmuNHLHOxXvCt+P2s0zxyjO6dCH5bXaV1kftvsCAOyB8ALT1IVxpdH5MkYk6toLMyW1LZsGAMQWNqmDac6tNAr/j9lXZudo4+7jen7nEQ2Ld4X9/naVGOfUbXPGaVRyotWlAIBpCC8wzbmVRuEdeZGk+ReMUlZKoo7XNevnb4Tv7KRocLT
2jP7lloutLgMATEN4gWmCPS9mjLzEuZx6/G9m6k8fVchgs11JUvXpFq378Jje2FclwzDkcHD2EoDoRHiBaUI9L2FaJv1Zs3LTNCs3zZR721HTWb827anU8bpmfXK8QVOyk60uCQBMQcMuTBPOc43QO3e8S5dPaDs8cvMnVRZXAwDmIbzANGb2vKBr8y8YJUna/MkJiysBAPMQXmCa4GojRl4i58r28LKt9JQaW1p7uRoA7InwAtPUNzPyEmkTMoZr7MhhavEH9N6n1VaXAwCmILzANGauNkLXHA7HuamjfUwdAYhOhBeYpu6MOTvs9sYfMLT1YLVeKj6qrQer5Q8MzbXUZtUZnDp6a//JsNwPAIYa/iSGaawYedlQUqGV6/aowtcUes7rcWvF4nwtLPBGrI7emFnn3InpinM6VHrytA5Xn1Zu+vDBlgsAQwojLzBNXSi8RGbkZUNJhZat3dkhEEhSpa9Jy9bu1IaSiojU0Ruz60x2x2tW7khJ0lusOgIQhQgvMM25TerMH3nxBwytXLdHXU28BJ9buW6P5VNIkarzytCSaaaOAEQfwgtM0dzqV0trQFJkRl6KSms6jWScz5BU4WtSUWmN6bX0JFJ1Bpt2txw8GfrvAADRgvACUwT7XSRpRKL5Iy9V9d0HgoFcZ5ZI1ZnvTVHGiEQ1tvi1/bC1gQ0Awo3wAlMEVxolJ8bJ5TT/gMDMZHdYrzNLpOp0Oh26cnKGJOktpo4ARBnCC0wR6ZVGc/LS5PW41V1McqhtNc+cPGsPcoxknfOncFQAgOhEeIEp6iO80sjldGjF4nxJ6hQMgh+vWJwfkVGgnkSyzismZcjhkPZW1Fk+XQYA4RSR8PL4448rLy9Pbrdbs2bN0ttvv93ttW+++aYcDkenx8cffxyJUhEmdaFDGSO3x8vCAq9W3z5T2Z6OUy7ZHrdW3z5zyOzzEqk600ck6oLMZEnSB2W1YbknAAwFpv9mee6553T//ffr8ccf17x58/SrX/1KixYt0p49ezRu3LhuX7dv3z6lpKSEPh41apTZpSKM6kPLpCO7u+7CAq+uy89WUWmNquqblJncNgVj9YjLZ5lVpz9gdLjnJTmp2ne8XsXltbr+ouwwVQ8A1jI9vPzkJz/RN77xDX3zm9+UJP30pz/Vxo0btXr1aq1atarb12VmZio1NdXs8mASK881cjkdKpyYHvH37a9w19nVrr2e9j12ihl5ARBFTJ02amlp0Y4dO7RgwYIOzy9YsEBbtmzp8bUzZsyQ1+vVtddeqzfeeKPb65qbm1VXV9fhAevVcShjRHW3a6/vTNt/h51lpyzfoA8AwsXU8HLy5En5/X5lZWV1eD4rK0uVlZVdvsbr9WrNmjV6/vnn9cc//lFTpkzRtddeq7feeqvL61etWiWPxxN65OTkhP3rQP9ZdShjLOpp196g5taA9lXWR6wmADBTRP4sdjg6zuMbhtHpuaApU6ZoypQpoY8LCwtVXl6uf//3f9eVV17Z6fqHHnpIy5cvD31cV1dHgBkCIr3aKJb1tmtv0IsfHFX+6JRerwOAoc7UkZeMjAy5XK5OoyxVVVWdRmN6cvnll2v//v1dfi4xMVEpKSkdHrBevQWrjWJVX5dB7zrqM7kSAIgMU8NLQkKCZs2apU2bNnV4ftOmTZo7d26f7/PBBx/I6x0ay1zRN3UWrTaKRX3djfdI7RmTKwGAyDD9z+Lly5frjjvu0OzZs1VYWKg1a9aorKxMS5culdQ27XP06FE988wzktpWI40fP14XXXSRWlpatHbtWj3//PN6/vnnzS4VYWTlaqNYE9y1t9LX1GPfy9FTjTrd3KrhEThrCgDMZPq/Yrfeequqq6v1ox/9SBUVFSooKND69euVm5srSaqoqFBZWVno+paWFj344IM6evSohg0bposuukivvPKKbrjhBrNLRRgFw0sK4cV0wV17l63dKYfUIcAEO8tSk+J1qvGsPjris8UycgDoicMwjKhaP1l
XVyePxyOfz0f/i4Wm/+g11Tae1aYHrtTkrGSry4kJXe3z4vW4tWJxvl7+8JjW76rUdxdO1bKrJlpYJQB0rT+/v/mzGGFnGAarjSzQ0669ZTWNWr+rUsXlp6wuEwAGjfCCsDtz1h/aEI2el8jqbtfe6TkjJUnF5bURrggAwo/fLFHos+fbRPpsn7r2XV1dToeSElwRe190b9oYj1xOh47XNavCd0ZezzCrSwKAASO8RJme+h4idary+Xu8dLcZISJrWIJLU7KStaeiTsVltfJOI7wAsC9T93lBZHV3vk2lr0nL1u7UhpKKiNTBuUZD04xxqZKYOgJgf4SXKNHT+TbB51au2xORw/lCG9TRrDukTM9JlSR9QHgBYHOElyjR2/k2hqQKX5OKSmtMr4UN6oam4MjLriM+tfoD1hYDAINAeIkSfT3fpq/XDca5nhdGXoaSCRkjlOyO05mzfu07zgnTAOyL8BIl+nq+TV+vG4zgaiNGXoYWp9OhS8amSqLvBYC9EV6iRPB8m+7W9jjUtupoTl6a6bXU0/MyZAX7XorLai2tAwAGg/ASJYLn20jqFGCCH69YnB+R/V4412joCva90LQLwM4IL1FkYYFXq2+fqWxPx6mhbI9bq2+fGbF9XuroeRmyZoxr22n3QFWDahtbLK4GAAaGP42jTE/n20RKaORlGD9eQ03a8ARNyBiuT0+e1gfltbp6SqbVJQFAv/HbJQp1d75NpLDaaGibMW6kPj15WjsPnyK8ALAlpo0Qdqw2Gtpm5bZNHe0s44RpAPZEeEHYsdpoaJuZmyqpbcVRJHZcBoBwI7wg7Nhhd2ibnJmsEYlxOt3i175KNqsDYD+EF4SVP2CovjkYXhh5GYpcTkdovxemjgDYEeEFYdXQHlwkRl6GspnBvpfDhBcA9kN4QVgF+10S4pxyx7ssrgbdmdm+WR0jLwDsiPCCsAquNGJ33aFtRk7byMuh6kZVNzRbXA0A9A/hBWHFSiN78CTFa3LmCEnSTs45AmAzhBeEFSuN7GPmOPZ7AWBPhBeEFeca2UdwvxeadgHYDeEFYcXIi30Ed9r98EitzvoDFlcDAH1HeEFY0fNiHxMyRijFHaemswF9XMFmdQDsg/CCsKpj5MU2nE6HZrT3vew4XGNxNQDQd4QXhBUnStvLuUMaa60tBAD6gT+PbWzboRqtWr9Xza2D71eIczp099WTtOCi7EHdJzjykjKMHy07YMURADviN4yN/cdr+8L6F/PPXj8w+PByhpEXO7kkxyOHQzpy6oz+avUWORzdXzsyKUH/csvFShueELkCAaALhBebqvCd0fulbX0Kv/jqTI0YRI+J78xZ3fv7D7S3ok5NZ/2D2taf1Ub2kuyO18xxI7Xj8Clt78OS6YvHHtY910yOQGUA0D1+w9jUnz6skGFIc8an6caLvYO6l2EY+tG63TrZ0KLdx3yalZs24Hux2sh+nvjabBWV1kgyur1m+6FTevKdUq3fVUl4AWA5wotNvVh8VJL0xemjB30vh8Oh6Tkj9ee9x/VBWe2gwgurjewnbXiCFhb0PF04Jy9dT205pD0VdSqrbtS49KQIVQcAnbHayIYOVNVr97E6xTkdumHa4EZdgma0nzJcXF47qPsw8hKd0oYn6LK8tlD7akmFxdUAiHWEFxt6qfiYJGn+BaPC1jw5PSdV0uDCS0trQE1n21Y+sdoo+ixqD8rrSyotrgRArCO82IxhGKHwEo4po6CLx55bdXKyoXlA9wiOukjSiETCS7S5/qIsORzSh+W1Olp7xupyAMQwwovNFJfXqqymUUkJLl2XnxW2+ya74zVp1Ii29xjg8uvq0y2SpKQEl+Jc/GhFm8xkt2a3b2q3gdEXABbiz+M+8gcM/WF7+YBe63I6NH/KKGUmuwddR3DUZUF+lpISwvufb3pOqvZXNai4vFafH0AweuPjKknSJWNTw1oXho5FBV5tO3RKG0oq9I0r8qwuB0CMIrz0kT9g6J/+uGv
Arx+ZFK+f3jZD8y8YNeB7tPoD+tNHbeHlpuljBnyf7kwfl6o/7DiiD8oHttvqK7vaGjkHu3QbQ9fCgmz96E97tP3wKVXVNSkzZfCBHAD6i/DSRw6H9PkLBzZNU3qyQQdPnNZdTxXpO9dM1n3XTpbL2cNWpt3YcrBaJxtalDY8QVdMzhhQLT0JNu1+VO5TIGDI2Y8ay2sa9dERn5wO9brsFvY1OnWYpuekqri8Vht3V+qOwvFWlwQgBhFe+ije5dSTd84e0Gubzvr1z3/ao/9+v0yP/WW/dh4+pf+6bbrSRyT26z7BvV1unOZVvAk9JVOykjUs3qX65lYdPNGgyVnJfX5tcNTl8gnpyujn1wV7WVSQreLyWq3fRXgBYA3CSwS441165OZpmpU7Ut9/oUTvHDipK/71jX5v6V/T3hB7UxhXGZ0vzuXUtDEeFR2q0Qfltf0KL+vbw0u49p3B0LWowKtVr36s90urVd3Q3O8QDgCDxZKQCPryzLF68e55mjBquM6c9etEfXO/Hv6AoSlZyaGTgM0wkM3qyqqZMool49KTdNHoFAUMadOe41aXAyAGMfISYVOyk7Xx/it18ESDAoH+v37CqOH96kXpr9Bmdf1YLr2+hCmjWHPDNK92H6vT/9m4T78vKgs9nzIsXj/+UoFy04f3eg9/wFBRaY2q6puUmezWnLy0AfWCAYg9hBcLxLucmpqdYnUZXZrePvKy73i9zrT4NSyh9xOm17PKKOZ84WKv/nPTJ6o+3RLa3yfojzuP6oHrLujx9RtKKrRy3R5V+JpCz3k9bq1YnK+FBfwcAegZ4QUdeD3DlJWSqON1zdp11Kc5eT0f0nj+lNH1FzFlFCty04fr1fs+p/JTjaHnXt1VqT/sOKIjp3refXdDSYWWrd3Z6QzrSl+Tlq3dqdW3zyTAAOgRPS995A8Y2nqwWi8VH9XWg9XyBz77T2/0OHfOUe/7vQSnjAonMmUUayZnJeuaqVmhx7xJbcv3j5wXaD7LHzC0ct2eTsFFUui5lev2RPX/vwAMHiMvfRBrQ9zTc0Zq4+7jfWrafeUjVhmhzdiRwySpx3OPikprOvz/6LMMSRW+JhWV1qhwYnq4SwQQJRh56UVwiPuz/+AGh7g3tI88RJPgyMsHvTTtllU3atfR9lVGTBnFvDHt4aXC16RWf9fd6FX13QeXgVwHIDYx8tKD3oa4HWob4r4uPzuqVklcPNYjp6Ptl9B3fv+BXN18aeXtvQ2FE9PZ6wPKTHYr3uXQWb+h4/XNGpM6rMtr+novAOgO4aUHsTrEPTwxTtPGePThEZ/WfXis1+u/eIk5m+bBXlxOh7yeYSqradSRmsYuw8ucvDR5PW5V+pq6/KPAISnb4+61URxAbCO89CCWh7gf++sZ+sveKgWMnhsnU5MSdPOM8B8SCXsaO7ItvHTX9+JyOrRicb6Wrd0ph9QhwAQH+FYszo+qkUwA4Ud46UEsD3Hnpg/X16/Is7oM2Eywaben5dILC7xaffvMTk3w2VHcBA8gvAgvPWCIG+ifMalJkqSjvez1srDAq+vys9lhF8CAEF56wBA30D+hkZfa7vd6CXI5HVHVKwYgclgq3YvgEHe2p+PUULbHzU6gwGcEl0v3NvICAIPByEsfMMQN9M35G9UFAoaph4gCiF2Elz5iiBvoXXaKWy5n214vVfXNnUYsASAcmDYCEDZxLqeyU9oCy9E+9L0AwEAQXgCE1Zg+LJcGgMEgvAAIq77s9QIAgxGR8PL4448rLy9Pbrdbs2bN0ttvv93j9Zs3b9asWbPkdrs1YcIE/fKXv4xEmRHnDxjaerBaLxUf1daD1fIHet7NFrCDsamEFwDmMr1h97nnntP999+vxx9/XPPmzdOvfvUrLVq0SHv27NG4ceM6XV9aWqobbrhB3/rWt7R27Vq9++67+va3v61Ro0bplltuMbvciNlQUtFph1EvO4wiCowd2b5RXTdHBADAYDkMo5fDawbpsssu08y
ZM7V69erQcxdeeKG+9KUvadWqVZ2u/+53v6uXX35Ze/fuDT23dOlSffjhh9q6dWuv71dXVyePxyOfz6eUlJTwfBFhtqGkQsvW7uy0a29wUSn7x8DO3j1wUn/z5PuaMGq4Xv+Hq6wuB4BN9Of3t6nTRi0tLdqxY4cWLFjQ4fkFCxZoy5YtXb5m69atna6//vrrtX37dp09e7bT9c3Nzaqrq+vwGMr8AUMr1+3p8riB4HMr1+0ZclNITHGhr8aet1GdyX8bAYhRpk4bnTx5Un6/X1lZWR2ez8rKUmVlZZevqays7PL61tZWnTx5Ul5vxxGJVatWaeXKleEt3ERFpTUdpoo+y5BU4WtSUWnNkNlXhiku9IfXM0wOh9TcGtDJhhaNSk60uiQAUSYiDbsOR8ddNg3D6PRcb9d39bwkPfTQQ/L5fKFHeXl5GCo2T1V998FlINeZLTjF9dnAVelr0rK1O7WhpMKiyjBUJcQ5lZUc3OuFvhcA4WdqeMnIyJDL5eo0ylJVVdVpdCUoOzu7y+vj4uKUnt55JCIxMVEpKSkdHkNZZnLfdhzt63VmsusUF6x3brk0G9UBCD9Tw0tCQoJmzZqlTZs2dXh+06ZNmjt3bpevKSws7HT9a6+9ptmzZys+Pt60WiNlTl6avB63uht3cqhtSmZOXloky+pSf6a4gPNxQCMAM5k+bbR8+XI9+eST+s1vfqO9e/fqgQceUFlZmZYuXSqpbdrna1/7Wuj6pUuX6vDhw1q+fLn27t2r3/zmN/r1r3+tBx980OxSI8LldGjF4nxJ6hRggh+vWJw/JA59tNsUF8zT34ZtNqoDesYiiMExfZ+XW2+9VdXV1frRj36kiooKFRQUaP369crNzZUkVVRUqKysLHR9Xl6e1q9frwceeEC/+MUvNHr0aD322GNRtcfLwgKvVt8+s1MTbPYQa4K10xQXzDOQhu0xqW17vTBtBHTGIojBM32fl0izwz4vQf6AoaLSGlXVNykzuW2qaCiMuAT5A4au+NfXVelr6rLvxaG2wPXOd68ZUnUjfAa6J9Fbn5zQ135TpAuyRui1B+abXidgF+zz1b0hs88LeuZyOlQ4MV03TR+jwonpQy4A2GmKC+E3mIbt8w9njLK/j4ABYxFE+BBe0KPgFFe2p+PUULbHHdN/IcSCwTRsj2k/36ixxa/axs6bSwKxiEUQ4WN6zwvsb2GBV9flZw/pKS6E32Aatt3xLmWMSNTJhmYdOXVGI4cnhLs8wHZYBBE+hBf0SXCKC7FjsA3bY0cO08mGZh2tbdS0sZ5wlgbYEosgwodpIwBdGuyeRCyXBjqy0z5fQx3hBUCXBtuwPYbwAnTAIojwIbwA6NZgGrbHjgzu9UJ4AYJYBBEe9LwA6NFAG7bHtq844nBGoCMWQQwe4QVArwbSsB3sedl/vF7X/+dboecT45363zfmM6+PmMYiiMFh2giAKXLSkjQyKV6tAUP7jteHHh8d8en/bPzY6vIA2BgjLwBM4Y53aeP9V+pAVUPoucYWv/5+7Q5tO3RKB6rqNSkz2cIKAdgV4QWAaTJT3MpM6diYeM3UTG3ac1zPFpXrf38h36LKANgZ00YAIuqv5+RIkp7feUTNrX6LqwFgR4QXABE1/4JMeT1unWo8q427j1tdDgAbIrwAiCiX06Els9tGX54tKrO4GgB2RHgBEHFfmT1WDoe05WC1DleftrocADZDeAEQcWNHJunKyaMkSc9uK7e4GgB2Q3gBYIlg4+4fth/RWX/A4moA2AnhBYAlrr0wSxkjEnWyoVl/2VtldTkAbIR9XgBYIt7l1F/NGqtfbj6o37xTqnhX/851cTocmjV+pFLc8SZVCGCoIrwAsMxtl+bol5sPquhQjYoO1fT79VdNGaXf/u0cEyoDMJQRXgBYZnzGcC2/7gL9ZW//9nvxG4ZKjtZpy8FqNbf6lRjnMqlCAEMR4QWApe69drLuvXZyv15jGIYufeTPOtnQol1HfJo9nhO
qgVhCeAFgOw6HQ5eOT9OrJZXadugU4QVDTn3TWe0sq1XAMKwuxRROh0PzLxhl2fsTXgDY0uxQeKnRMk20uhygg6Vrd+jdA9VWl2GahDinPvnxIsven/ACwJYuHT9SkrT9UI0CAUNOZ/9WKwFm2XXEp3cPVMvldCjfm2J1Oabo7+rAcCO8ALClfG+KkhJcqmtq1f6qBk3JTra6JECS9Ot3PpUkffGS0frPW6dbW0yUYpM6ALYU53Jq5ri20ZeBLLMGzFDpa9KfPqqQJH3jijyLq4lehBcAtjX7vKkjYCh4ZushtQYMzclLU8EYj9XlRC3CCwDburR9ldH2Q6csrgSQGlta9d/vl0li1MVshBcAtjVjXKpcToeO1p7R0dozVpeDGPf8zqPynTmr3PQkff7CLKvLiWqEFwC2lZQQp4LRbas5mDqClQIBQ0+9UypJ+tu54+Vi9ZupCC8AbC24Qd02wgss9OYnVfr05Gklu+O0ZHaO1eVEPcILAFs7t98LfS+wzpNvt426/PWccRqeyC4kZuM7DMDWgiMv+47Xy9d4Vp6keIsrQrR78A8f6oUPjnZ4zh8w5HI6dOfc8dYUFWMYeQFgaxkjEjUhY7gMQ9pRxtQRzPVhea3+344j8geMDg9J+srsHI1JHWZxhbGBkRcAtjd7/Eh9evK0th06pWumssoD5nn8zQOSpJumj9b3b7gw9LzT6VD68ASryoo5jLwAsL3gfi/bShl5gXn2H6/Xxt3H5XBI37lmkjJT3KFHxohEORysMIoUwgsA2wuGl4+O+NR01m9xNYhWqzcflCQtyM/SpEzO0rIS00YAbC83PUkZIxJ1sqFZF698TedvsXHLzLF65OZp1hWHiPEHDBWV1qiqvkmZyW7NyUsL234r5TWNeqn4mCTp21dNCss9MXCEFwC253A49IWLvfrtlkNqaQ10+Nzvi8r0jwunyjOMVUjR7Icvlej5nUd0uvncyNvwRJc+NylDE0aNCD13SU6qFuRn9XuK54m3P5U/YOiKSRm6JCc1XGVjgAgvAKLCisX5Wjp/oloD58LLHb8uUunJ03rv02pdf1G2hdXBTGs2H9QzWw93ev50s18bdh+XdLzD8wsvytYjNxcofURin+5/or5Zz20rlyR9+6qJg64Xg0fPC4Co4HA4lO1xa+zIpNBj3qR0SdKWAyctrg5m8QcM/aJ9BVB3hie49LfzxuvW2TmKczq0YXelrv/p2/rL3uM9vi7oqXdL1dwa0PScVBVOTA9H2RgkRl4ARK0rJmVo7XtleofwErWKSmvkO9Pa4zWnW/xakJ+twonpuqMwV8v/p1ifHG/QN57eriWzxoYavrtiyND/bR/V+fZVE1lRNEQQXgBErcIJGXI4pIMnTqvS16Rsj9vqkhBmVfVN/bquYIxHL99zhf7jtX168p1S/WHHEf1hx5FeX39B1ghOih5CCC8AopYnKV7Txnj00RGf3j1wUrfMGmt1SQizzOS+BdLzr3PHu/T9G/N17YVZ+r9bD/e6vD7O5dDfz58oJydFDxmEFwBRbd6kjLbwcpDwEo2mjfH0+HmHpGxP27Lpz7p8Qroun0APix3RsAsgqs2bmCFJevfASRmGYXE1CLdD1adD//uz4yLBj1cszg/bfi8YGggvAKLa7PEjlRDn1PG6Zh08cbr3F8BWDlQ1SJImZY7o1NOU7XFr9e0ztbDAa0VpMBHTRgCimjvepdm5I7XlYLXePXBSkzJH9P4i2EYwvMzJS9M/31Rg2g67GFoYeQEQ9eZNOjd1hOiyv6pekjRp1Ai5nA4VTkzXTdPHqHBiOsElihFeAES9K9rDy9ZPq9XqD/RyNSLBHzC09WC1Xio+qq0Hq+UPDKwf6fxpI8QOpo0ARL2CMR6luONU19SqkmN1ms7ZNJbaUFKhlev2qMJ3bo8Wr8etFYvz+9Wf0tIa0OHqRknS5CzCSyxh5AVA1AtOJ0hMHVltQ0mFlq3d2SG4SFKlr0nL1u7UhpKKPt/
rcPVptQYMjUiMU3YKGxDGEsILgJhA34v1/AFDK9ftUVcTRMHnVq7b0+cppOCU0cTMEWzbH2MILwCiRk99FMHwsv3wqV53VIU5ikprOo24nM+QVOFrUlFpTZ/utz/Y7zKKKaNYQ88LgKjQWx/FhIzhyk5xq7KuSV/7TZGSE8/983flBaN059zxFlQdW/p7DlFvaNaNXYy8ALC9vvRROBwOXXNhpqS2EYC/fFwVeqxct5vRmAgYyDlEPQmGl8mEl5jDyAsAW+utj8Khtj6K6/Kz9U+LpurS8SPV0npuufQPXtqtltaATtQ3KyctKVJlx6Q5eWnyetyq9DV1+d+rp3OIPssfMHTwBCMvsYqRFwC21p8+ihR3vG6eMVa3Xjou9AiuUjle17epCgycy+nQisX5kgZ/DtHRU2fU3BpQQpyT0BmDCC8AbG2wfRRZKYmSpON1zWGrCd1bWODV6ttnDvocouDOuhMyhrOTbgwyddro1KlTuvfee/Xyyy9Lkr74xS/qZz/7mVJTU7t9zV133aWnn366w3OXXXaZ3nvvPTNLBWBTg+2jyGwfeelrCMLgLSzw6rr87EGdQxTqd8lKNqtMDGGmhpevfvWrOnLkiDZs2CBJ+ru/+zvdcccdWrduXY+vW7hwoZ566qnQxwkJCWaWCcDGBttHkZnMyIsVzt84cCAOsEw6ppkWXvbu3asNGzbovffe02WXXSZJeuKJJ1RYWKh9+/ZpypQp3b42MTFR2dnZZpUGIIoE+yiWrd0ph9QhwPSljyIrOPJCz4ut7I/QMml/wOCk6iHItPCydetWeTyeUHCRpMsvv1wej0dbtmzpMby8+eabyszMVGpqqubPn69HHnlEmZmZXV7b3Nys5uZzfzHV1dWF74sAYAvBPorP7vOS3YfzckI9L0wb2YZhGDoYmjYyL7yE6wwmhJ9p4aWysrLLwJGZmanKyspuX7do0SItWbJEubm5Ki0t1Q9+8ANdc8012rFjhxITEztdv2rVKq1cuTKstQOwn4H2UQR7YaqYNrKN43XNqm9ulcvp0Pj04aa8R3DvoM9ORQb3DupPczHCr9+rjR5++GE5HI4eH9u3b5ekLs+aMAyjxzMobr31Vt14440qKCjQ4sWL9eqrr+qTTz7RK6+80uX1Dz30kHw+X+hRXl7e3y8JQJQI9lHcNH2MCiem92l4/9xqo+5HXno6dgCRF+x3yU1PUkJc+BfNhvsMJoRfv0de7rnnHt122209XjN+/Hh99NFHOn78eKfPnThxQllZWX1+P6/Xq9zcXO3fv7/LzycmJnY5IgMAfRFcbVTX1KozLX4NS3B1+DxTB0NPcJm0Wc26/dk7aDBNxxi4foeXjIwMZWRk9HpdYWGhfD6fioqKNGfOHEnS+++/L5/Pp7lz5/b5/aqrq1VeXi6vl38kAIRfcmKchsW7dOasX1X1Tco9bxqCqYPIO17XpM2fnFCXwx7tXv+4SpJ5zbrhPoMJ4Wdaz8uFF16ohQsX6lvf+pZ+9atfSWpbKv2FL3yhQ7Pu1KlTtWrVKt18881qaGjQww8/rFtuuUVer1eHDh3S9773PWVkZOjmm282q1QAMczhcCgzJVGHqxtVVd8cCi/9OXaA1SfhcejkaX3p8XdV23i2T9dfYNIeL+E+gwnhZ+o+L//93/+te++9VwsWLJDUtkndz3/+8w7X7Nu3Tz6fT5Lkcrm0a9cuPfPMM6qtrZXX69XVV1+t5557TsnJbEQEwBxZyW4drm7s0PfC1EFk+RrP6utPb1Nt41nlpif1OiU0KjlR119kzpYa4TyDCeYwNbykpaVp7dq1PV5jGOd+NIYNG6aNGzeaWRIAdJLZxREBTB1Ezll/QN/+3Q59euK0Rnvc+sPSQktHNQa7dxDMx9lGAGJeVxvVMXUQGYZh6Icv7da7B6o1PMGlJ++8dEh8T8N1BhPMYerICwDYwbkjAs6FF6YOIuM37x7S74vK5HBIj/31DOWPTrG6pJBwnMEEcxB
eAMS80MhL/blpI6YOBueNfVX6zu8+UGNLa4/XBbdK+f4NF+raC/u+jUakDPYMJpiDaSMAMS+zm43qmDoYuD99WKGG5lYFDPX4cDikr8/L0zeuyLO6ZNgIIy8AYt65npfORwQwdTAwB9o3kvu3Wy7WVVNHdXtdgsup1KSESJWFKEF4ARDzgj0v9c2tamxpVVJCx38amTroH8MwQlv4z8xNHRINuIguTBsBiHkjEuOU1H4sAAc0Dl6Fr0mnW/yKczo67FgMhAvhBUDMczgcoamjng5oRN/sbx91GZ8xXPEufs0g/PipAgCdt1y6npGXwdp/vK3fZbJJZw8BhBcA0LnTpasYeRm0gyfaRl4ILzAL4QUAJGW1j7xUMfIyaPuPt4WXiYQXmITwAgASPS9hYhhGqOdlciYH6sIchBcAUPcb1aF/Tja0yHfmrJwOacIoVhrBHIQXANC5AxZZKj04+9s3p8tJS5I73mVxNYhWhBcAkJSVQs9LOByoolkX5iO8AIDOrTZqaG5VQ3PPhwmie8HwMol+F5iI8AIAattld3hol136XgYquNKIkReYifACAO3OrThi6mig9odGXggvMA/hBQDaZYb6Xhh5GYjaxhadbGgLfuzxAjMRXgCgXVYKK44GI9jvMiZ1mEYkxvVyNTBwhBcAaBc634ielwEJThkx6gKzEV4AoF2o54Xl0gNCsy4ihfACAO04nHFwDnAgIyKE8AIA7TiccXAOHG/bXZeVRjAb4QUA2mWedzijYRgWV2MvDc2tOuZrG7EivMBshBcAaBds2G1s8bPLbj8dbG/WHZWcqNSkBIurQbQjvABAu+GJcUpuX+LLRnX9E9qcbhSjLjAf4QUAzsNGdQMTPE16chbhBeYjvADAeTKT2ahuIA5ymjQiiC0QAeA8We0jL6+WVOhUY0vo+SlZyZo7KcOqsoacltaAAuc1NX9ynA3qEDmEFwA4z+jUYZKkjbuPa+Pu46HnnQ7p9X+4SuMzhltV2pDx5Nuf6pH1e9XVgqzJmcmRLwgxh/ACAOe5/fJcnWpsUUOzP/Tch+W1Kqtp1MsfHtO91062sDrrNTS36rG/7O8yuFyWl6aMEaw0gvkILwBwntGpw7Tqyxd3eO75HUf0D3/4UC8WH9V3rpkkh8NhUXXW+/37ZapratWEUcP14t3z5DzvezE8wRXT3xtEDg27ANCLBRdlKTHOqU9PnNbuY3VWl2OZltaAfv1OqSTp76+coBR3vEYkxoUeBBdECuEFAHqR7I7X5y/MkiS9/OExi6uxzovFR1VZ16SslER9acYYq8tBDCO8AEAffHH6aEnSug+PKRCIvaMDAgFDv9p8UJL09Xl5SoxzWVwRYhnhBQD64Kopo5TsjlOFr0nbDtVYXU6X/AFDWw9W66Xio9p6sFr+MIasP+89roMnTivZHaevXjYubPcFBoKGXQDog8Q4lxYVZOt/th/RSx8e02UT0q0uqYMNJRVauW6PKnzndgb2etxasThfCwu8g7q3YRha3T7qcsfluUp2xw/qfsBgMfICAH100/S2Po/1uyrU0hqwuJpzNpRUaNnanR2CiyRV+pq0bO1ObSipGNT9i0pr9EFZrRLinPrbeXmDuhcQDoy8AEAfXT4hXaOSE3Wivllv7z+ha9ubeCMtEDBUcsynxha/AgFD33uhRF1NEAWf+94LJUpxx8vpHNhqoJ+9vl+S9FezxmpU+8nbgJUILwDQRy6nQ1+42Kun3j2kl4qPWRZefrvlkH70pz19vr7mdIu++uT7g3pPp0P6u89NGNQ9gHAhvABAP9w0fYyeeveQNu05rsaWViUlRP6f0Q27KyW19bQEDEPH+3CIZFZK4qB6VW66ZDRHI2DIILwAQD9cMtaj3PQkHa5u1I9f2asJ5/1Cz0lL0vUXZZv6/qebW/VB2SlJ0rN/d7mO1Tbpr594r9fX/fTWGSqcOLSajIGBIrwAQD84HA7ddMloPfb6Af3u/bJOn//D0kJdOj7NtPc
vOlSjs35DY0cO07i0JI0dmSSvx61KX1OXfS8OSdket+bkDbwmf8BQUWmNquqblJncdi/XAPtngHAgvABAP33jigk61XhWdU1nQ8/tOVan/VUNeuWjClPDy7v7T0qSrpiUIYfDIZdDWrE4X8vW7pRD6hBggvFixeL8AYcNM5dgAwPFUmkA6CdPUrz++UsF+q/bZoQe/7hwqiTptd2VMro6cjlM3jnQFl7mTcoIPbewwKvVt89Utsfd4dpsj1urb5854JBh9hJsYKAYeQGAMPjc5AwlJbh0zNekXUd9unhsatjf40R9sz6urJckzf1M/8rCAq+uy88O2/SOP2Bo5bo93S7BdkhauW6PrsvPZgoJEcfICwCEgTvepaumjJIkbWxfDRRuWw62jbpcNDpF6SM677ficjpUODFdN00fo8KJ6YMKFUWlNZ1GXM5nSKrwNamodGgelYDoRngBgDAJrjTaUGJOeHnnvH4Xs1XVdx9cBnIdEE6EFwAIk6unZire5dDBE6d1oKo+rPc2DEPvdtHvYpbMZHfvF/XjOiCcCC8AECYp7vhQsNi4+3hY71168rSO+ZqU4HKaupopaE5emrwet7qbeHKobdXRYJZgAwNFeAGAMApOHYW77yU46jIrd6SGJbjCeu+uuJwOrVicL0mdAkw4lmADg0F4AYAwui4/Sw6H9NERn47WngnbfYNLpK+YbP6UUZBZS7CBwWKpNACEUcaIRF2am6aiQzV6bXel/nZe3qDv6Q8Y2nKwWlJkmnXPF+4l2EA4MPICAGF2fUF4Vx3tOupTfVOrUtxxKhjjCcs9+yOcS7CBcCC8AECYLcjPkiRtO1Sj6obeT3zuTbDfZe7EDIIDIKaNACDsctKSVDAmRSVH6/T8ziNaeNHgekPe3FclSZoXwX4XYCgjvACACa7Pz1bJ0To9uv5jPbr+47DcM9L9LsBQRXgBABP81eyxen7nEZ2oH/y0kSTNnzJK49OTwnIvwO4ILwBgAq9nmN78X1dbXQYQlWjYBQAAtmJqeHnkkUc0d+5cJSUlKTU1tU+vMQxDDz/8sEaPHq1hw4bpqquu0u7du80sEwAA2Iip4aWlpUVLlizRsmXL+vyaf/u3f9NPfvIT/fznP9e2bduUnZ2t6667TvX14T3kDAAA2JOp4WXlypV64IEHNG3atD5dbxiGfvrTn+r73/++vvzlL6ugoEBPP/20Ghsb9bvf/c7MUgEAgE0MqZ6X0tJSVVZWasGCBaHnEhMTNX/+fG3ZsqXL1zQ3N6uurq7DAwAARK8hFV4qK9u20s7KyurwfFZWVuhzn7Vq1Sp5PJ7QIycnx/Q6AQCAdfodXh5++GE5HI4eH9u3bx9UUQ5Hx+2vDcPo9FzQQw89JJ/PF3qUl5cP6r0BAMDQ1u99Xu655x7ddtttPV4zfvz4ARWTnd12mFllZaW83nPbaVdVVXUajQlKTExUYmLigN4PAADYT7/DS0ZGhjIyzNmiOi8vT9nZ2dq0aZNmzJghqW3F0ubNm/Wv//qvprwnAACwF1N7XsrKylRcXKyysjL5/X4VFxeruLhYDQ0NoWumTp2qF154QVLbdNH999+vRx99VC+88IJKSkp01113KSkpSV/96lfNLBUAANiEqccD/PCHP9TTTz8d+jg4mvLGG2/oqquukiTt27dPPp8vdM0//uM/6syZM/r2t7+tU6dO6bLLLtNrr72m5ORkM0sFAAA24TAMw7C6iHCqq6uTx+ORz+dTSkqK1eUAAIA+6M/v7yG1VBoAAKA3hBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBc
AAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArhBcAAGArcVYXAACxyB8wVFRao6r6JmUmuzUnL00up8PqsgBbILwAQIRtKKnQynV7VOFrCj3n9bi1YnG+FhZ4LawMsAemjQAggjaUVGjZ2p0dgoskVfqatGztTm0oqbCoMsA+CC8AECH+gKGV6/bI6OJzwedWrtsjf6CrKwAEEV4AIEKKSms6jbicz5BU4WtSUWlN5IoCbIjwAgARUlXffXAZyHVArCK8AECEZCa7w3odEKsILwAQIXPy0uT1uNXdgmiH2lYdzclLi2RZgO2YGl4eeeQRzZ07V0lJSUpNTe3Ta+666y45HI4Oj8svv9zMMgEgIlxOh1YszpekTgEm+PGKxfns9wL0wtTw0tLSoiVLlmjZsmX9et3ChQtVUVEReqxfv96kCgEgshYWeLX69pnK9nScGsr2uLX69pns8wL0gamb1K1cuVKS9Nvf/rZfr0tMTFR2drYJFQGA9RYWeHVdfjY77AIDNCR32H3zzTeVmZmp1NRUzZ8/X4888ogyMzOtLgsAwsbldKhwYrrVZQC2NOTCy6JFi7RkyRLl5uaqtLRUP/jBD3TNNddox44dSkxM7HR9c3OzmpubQx/X1dVFslwAABBh/e55efjhhzs11H72sX379gEXdOutt+rGG29UQUGBFi9erFdffVWffPKJXnnllS6vX7VqlTweT+iRk5Mz4PcGAABDX79HXu655x7ddtttPV4zfvz4gdbTidfrVW5urvbv39/l5x966CEtX7489HFdXR0BBgCAKNbv8JKRkaGMjAwzaulSdXW1ysvL5fV23YGfmJjY5XQSAACITqYulS4rK1NxcbHKysrk9/tVXFys4uJiNTQ0hK6ZOnWqXnjhBUlSQ0ODHnzwQW3dulWHDh3Sm2++qcWLFysjI0M333yzmaUCAACbMLVh94c//KGefvrp0MczZsyQJL3xxhu66qqrJEn79u2Tz+eTJLlcLu3atUvPPPOMamtr5fV6dfXVV+u5555TcnKymaUCAACbcBiGEVVnr9fV1cnj8cjn8yklJcXqcgAAQB/05/c3ZxsBAABbIbwAAABbGXKb1A1WcBaMzeoAALCP4O/tvnSzRF14qa+vlyT2egEAwIbq6+vl8Xh6vCbqGnYDgYCOHTum5ORkORzhPeQsuAFeeXl5TDYDx/rXL/E9iPWvX+J7EOtfv8T3wKyv3zAM1dfXa/To0XI6e+5qibqRF6fTqbFjx5r6HikpKTH5AxsU61+/xPcg1r9+ie9BrH/9Et8DM77+3kZcgmjYBQAAtkJ4AQAAtkJ46YfExEStWLEiZs9SivWvX+J7EOtfv8T3INa/fonvwVD4+qOuYRcAAEQ3Rl4AAICtEF4AAICtEF4AAICtEF4AAICtEF766PHHH1deXp7cbrdmzZqlt99+2+qSIuqtt97S4sWLNXr0aDkcDr344otWlxQxq1at0qWXXqrk5GRlZmbqS1/6kvbt22d1WRG1evVqXXzxxaFNqQoLC/Xqq69aXZZlVq1aJYfDofvvv9/qUiLm4YcflsPh6PDIzs62uqyIOnr0qG6//Xalp6crKSlJ06dP144dO6wuK2LGjx/f6WfA4XDo7rvvjngthJc+eO6553T//ffr+9//vj744AN97nOf06JFi1RWVmZ1aRFz+vRpXXLJJfr5z39udSkRt3nzZt1999167733tGnTJrW2tmrBggU6ffq01aVFzNixY/Uv//Iv2r59u7Zv365rrrlGN910k3bv3m11aRG3bds2rVmzRhdffLHVpUTcRRddpIqKitBj165dVpcUMadOndK8efMUHx+vV199VXv27NF
//Md/KDU11erSImbbtm0d/vtv2rRJkrRkyZLIF2OgV3PmzDGWLl3a4bmpU6ca//RP/2RRRdaSZLzwwgtWl2GZqqoqQ5KxefNmq0ux1MiRI40nn3zS6jIiqr6+3pg8ebKxadMmY/78+cZ9991ndUkRs2LFCuOSSy6xugzLfPe73zWuuOIKq8sYUu677z5j4sSJRiAQiPh7M/LSi5aWFu3YsUMLFizo8PyCBQu0ZcsWi6qClXw+nyQpLS3N4kqs4ff79eyzz+r06dMqLCy0upyIuvvuu3XjjTfq85//vNWlWGL//v0aPXq08vLydNttt+nTTz+1uqSIefnllzV79mwtWbJEmZmZmjFjhp544gmry7JMS0uL1q5dq69//ethPwS5LwgvvTh58qT8fr+ysrI6PJ+VlaXKykqLqoJVDMPQ8uXLdcUVV6igoMDqciJq165dGjFihBITE7V06VK98MILys/Pt7qsiHn22We1c+dOrVq1yupSLHHZZZfpmWee0caNG/XEE0+osrJSc+fOVXV1tdWlRcSnn36q1atXa/Lkydq4caOWLl2qe++9V88884zVpVnixRdfVG1tre666y5L3j/qTpU2y2eTpWEYlqRNWOuee+7RRx99pHfeecfqUiJuypQpKi4uVm1trZ5//nndeeed2rx5c0wEmPLyct1333167bXX5Ha7rS7HEosWLQr972nTpqmwsFATJ07U008/reXLl1tYWWQEAgHNnj1bjz76qCRpxowZ2r17t1avXq2vfe1rFlcXeb/+9a+1aNEijR492pL3Z+SlFxkZGXK5XJ1GWaqqqjqNxiC6fec739HLL7+sN954Q2PHjrW6nIhLSEjQpEmTNHv2bK1atUqXXHKJ/uu//svqsiJix44dqqqq0qxZsxQXF6e4uDht3rxZjz32mOLi4uT3+60uMeKGDx+uadOmaf/+/VaXEhFer7dTUL/wwgtjauFG0OHDh/XnP/9Z3/zmNy2rgfDSi4SEBM2aNSvUVR20adMmzZ0716KqEEmGYeiee+7RH//4R73++uvKy8uzuqQhwTAMNTc3W11GRFx77bXatWuXiouLQ4/Zs2frb/7mb1RcXCyXy2V1iRHX3NysvXv3yuv1Wl1KRMybN6/TFgmffPKJcnNzLarIOk899ZQyMzN14403WlYD00Z9sHz5ct1xxx2aPXu2CgsLtWbNGpWVlWnp0qVWlxYxDQ0NOnDgQOjj0tJSFRcXKy0tTePGjbOwMvPdfffd+t3vfqeXXnpJycnJoVE4j8ejYcOGWVxdZHzve9/TokWLlJOTo/r6ej377LN68803tWHDBqtLi4jk5OROPU7Dhw9Xenp6zPQ+Pfjgg1q8eLHGjRunqqoq/fjHP1ZdXZ3uvPNOq0uLiAceeEBz587Vo48+qq985SsqKirSmjVrtGbNGqtLi6hAIKCnnnpKd955p+LiLIwQEV/fZFO/+MUvjNzcXCMhIcGYOXNmzC2TfeONNwxJnR533nmn1aWZrquvW5Lx1FNPWV1axHz9618P/fyPGjXKuPbaa43XXnvN6rIsFWtLpW+99VbD6/Ua8fHxxujRo40vf/nLxu7du60uK6LWrVtnFBQUGImJicbUqVONNWvWWF1SxG3cuNGQZOzbt8/SOhyGYRjWxCYAAID+o+cFAADYCuEFAADYCuEFAADYCuEFAADYCuEFAADYCuEFAADYCuEFAADYCuEFAADYCuEFAADYCuEFAADYCuEFAADYCuEFAADYyv8H6mnAqWh4bLwAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.ensemble import RandomForestRegressor as RFR\n", "\n", "x_max = 7. # 予測の範囲の上限\n", "\n", "# トレーニングデータ作成\n", "n_tr = 20\n", "x = np.linspace(0., np.pi*2., n_tr) # リスト 0から2πまでをn_tr等分した値を一次元配列\n", "y = make_data_by_sin_gaussian(x, 0.3)\n", "\n", "# 基底関数をM次の多項式とする\n", "deg = 5\n", "X = np.vander(x, deg+1) # 計画行列の作成\n", "\n", "# 以上、データ作成###\n", "\n", "# 学習 (説明変数X, 目的変数yは上で作ってあるものとする)\n", "# 木の深さ max_depth (defaultでは最後の1個になるまで分類)\n", "# ランダムに作成する木の数 n_estimators (default = 100)\n", "rfr = RFR(max_depth=3, n_estimators=5)\n", "#rfr = RFR()\n", "rfr.fit(X,y)\n", "\n", "# 予測\n", "num_test = 100\n", "x_test = np.linspace(0, x_max, num_test)\n", "y_test = rfr.predict(np.vander(x_test, deg+1))\n", " \n", "#プロット\n", "plt.scatter(x, y)\n", "plt.plot(x_test, y_test)\n", "\n", "# sin(x)のプロット\n", "#plt.plot(x_test, np.sin(x_test))\n", "\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "どちらの方法もハイパーパラメータがたくさんあり、利用する際にそれらを根拠をもって指定できる場合は、それでよいが、多くの場合、交差検定を行って、過学習にならないようにパラメータをセットする必要がある。そのことは忘れないでほしい。\n", "\n", "sckit-learnでは、パラメータをしらみつぶしにさがすGridSearchCVなどが用意されている。\n", "\n", "とはいえ、グリッドサーチにしてもその範囲は使う側がセットする必要がある。(それが多すぎるとランダムフォレストの場合などは猛烈に長い時間を要することになる。)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 6\n", "\n", "Exercise 5で試したのと同じノイズ付きsinc関数のデータについて、ニューラルネットとランダムフォレストによる回帰分析を行ってみよう。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 参考:決定木の可視化\n", "\n", "ランダムフォレストは、データから重複を許してランダムに抽出した複数のデータセットを用意して、それらを決定木により分類学習を行い、アンサンブル学習(多数決)により予測器をつくるものである。\n", "\n", "その一つの決定木を図示するモジュールが作られている。\n", "\n", "分類木を図示するツールとしてdtreevizが高機能である。単純な回帰問題で決定木を描くのはあまり意味がないが、紹介しておく。\n", "\n", "Anacondaのパッケージには入ってないので、次の方法をとる。\n", "- condaにリポジトリを指定してインストール\n", " - conda install -c conda-forge dtreeviz\n", " - conda install -c anaconda graphviz\n", "- pythonのインストールツールpipを使う\n", "\n", 
"上で述べたように、説明変数が一つという単純な例ではあまり意味がないが、分類がどのように行われるかを知る参考になるだろう。(findfontの警告がたくさん出るので、出力は割愛してある。)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn import tree\n", "import dtreeviz\n", "import graphviz\n", "estimators = rfr.estimators_\n", "viz = dtreeviz.model(\n", " estimators[0],\n", " X, \n", " y\n", ") \n", "v = viz.view()\n", "v\n", "#v.save(\"images/dtreeviz_sample.svg\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using Random Forest for Boston data\n", "\n", "Random Forestの有用性を示す事例として、線形重回帰の説明に使ったBostonデータの分析を示す。\n", "\n", "Bostonデータの内容については、線形重回帰の説明ページ\n", "\n", "https://toyoki-lab.ee.yamanashi.ac.jp/~toyoki/lectures/PracDataSci/multiple_regression.html\n", "\n", "を参照。\n", "\n", "\n", "8回に説明するが、効果的な機械学習にするための前処理(標準化)、および推定の評価方法の使用例も含む。\n", "\n", "### 学習用(トレーニング)データとテスト用データに分割\n", "\n", "- データを一定の割合でランダムに2つに分割\n", "- トレーニングデータをモデルに与えて学習させる\n", "- 学習したモデルにテスト用データの説明変数を与え推定値を得る。\n", "- 推定値とテストデータの目的変数を比較し、当てはまり度をチェックする。(どのくらい当たっていたかを答え合わせする。)\n", "\n", "分割するメソッドは、**train_test_split**という名前で用意されている。\n", "\n", "### 標準化\n", "\n", "説明変数の中に絶対値が大きく異なる変数が含まれている場合、絶対値やばらつきが大きい変数の効果が強く表れてしまう。それを前もって是正する。\n", "\n", "説明変数の平均値と標準偏差がそろうようにスケールすることを、データの標準化(Standardization)と呼ぶ。\n", "\n", "scikit-learnでそれを行う関数は、**StandardScaler**である。\n", "\n", "\n", "### 実例\n", "\n", "以下のプログラムは、\n", "\n", "https://www.blopig.com/blog/2017/07/using-random-forests-in-python-with-scikit-learn/\n", "\n", "で参照できるOxford Protain Informatics Groupによるものである。Bostonの住宅価格データを用いて、複数の説明変数がどのように住宅価格を説明できるかを線形重回帰とRandom Forestを用いて試している。\n", "\n", "このページでは標準化を行う前と後で、主成分分析により説明変数の重要度の変化を見ているが、ここでは省略する。(主成分分析は、8週目で少し触れる。)\n", "\n", "まず、データをpandasのDataFrameとして作成する。\n", "\n", "注意:最新のScikit-learnバージョンでは、Bostonデータには倫理的問題があるので、将来のバージョンでは削除される予定という警告がでるようになった。そのことに留意して閲覧してほしい。\n", "\n", "同様なプログラム例:https://hinomaruc.hatenablog.com/entry/2019/11/14/200857\n" ] }, { "cell_type": "code", "execution_count": 108, "metadata": { "tags": [] }, 
"outputs": [], "source": [ "# 必要なモジュールのインポート\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 109, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/anaconda3/lib/python3.9/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function load_boston is deprecated; `load_boston` is deprecated in 1.0 and will be removed in 1.2.\n", "\n", " The Boston housing prices dataset has an ethical problem. You can refer to\n", " the documentation of this function for further details.\n", "\n", " The scikit-learn maintainers therefore strongly discourage the use of this\n", " dataset unless the purpose of the code is to study and educate about\n", " ethical issues in data science and machine learning.\n", "\n", " In this special case, you can fetch the dataset from the original\n", " source::\n", "\n", " import pandas as pd\n", " import numpy as np\n", "\n", "\n", " data_url = \"http://lib.stat.cmu.edu/datasets/boston\"\n", " raw_df = pd.read_csv(data_url, sep=\"\\s+\", skiprows=22, header=None)\n", " data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])\n", " target = raw_df.values[1::2, 2]\n", "\n", " Alternative datasets include the California housing dataset (i.e.\n", " :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing\n", " dataset. 
You can load the datasets as follows::\n", "\n", " from sklearn.datasets import fetch_california_housing\n", " housing = fetch_california_housing()\n", "\n", " for the California housing dataset and::\n", "\n", " from sklearn.datasets import fetch_openml\n", " housing = fetch_openml(name=\"house_prices\", as_frame=True)\n", "\n", " for the Ames housing dataset.\n", " \n" ] } ], "source": [ "# Load the Boston data bundled with scikit-learn (load_boston exists only before scikit-learn 1.2)\n", "from sklearn import datasets\n", "boston = datasets.load_boston()\n", "features = pd.DataFrame(boston.data, columns=boston.feature_names)\n", "targets = boston.target" ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 8 9 10\n", "0 0.00632 18.00 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3\n", "1 396.90000 4.98 24.00 NaN NaN NaN NaN NaN NaN NaN NaN\n", "2 0.02731 0.00 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8\n", "3 396.90000 9.14 21.60 NaN NaN NaN NaN NaN NaN NaN NaN\n", "4 0.02729 0.00 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Loading the data as in the cell above gives a message that the data bundled with scikit-learn\n", "# has an ethical problem, and suggests fetching the raw data (without column names) as below.\n", "# We therefore use it here.\n", "# In the raw file each record spans two rows: even rows hold the first 11 features,\n", "# odd rows hold the last 2 features (B, LSTAT) and the target (MEDV).\n", "data_url = \"http://lib.stat.cmu.edu/datasets/boston\"\n", "raw_df = pd.read_csv(data_url, sep=r\"\\s+\", skiprows=22, header=None)\n", "features = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])\n", "targets = raw_df.values[1::2, 2]\n", "raw_df.head()" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "506\n", "506\n" ] } ], "source": [ "# Check the number of samples\n", "print(len(features))\n", "print(len(targets))" ] }, { "cell_type": "code", "execution_count": 112, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Split into training (80%) and test (20%) data\n", "from sklearn.model_selection import train_test_split\n", "X_train, X_test, y_train, y_test = train_test_split(features, targets, train_size=0.8, random_state=0)\n", "\n", "# Standardize the data (the scaler is fitted on the training data only)\n", "from sklearn.preprocessing import StandardScaler\n", "scaler = StandardScaler().fit(X_train)\n", "X_train_scaled = scaler.transform(X_train)\n", "X_test_scaled = scaler.transform(X_test)\n", "\n", "# When using the original Boston data bundled with scikit-learn (as DataFrames)\n", "#X_train_scaled = pd.DataFrame(scaler.transform(X_train), index=X_train.index.values,\n", "# columns=X_train.columns.values)\n", "#X_test_scaled = pd.DataFrame(scaler.transform(X_test), index=X_test.index.values, columns=X_test.columns.values)" ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": 
[], "source": [ "# Model selection\n", "# (1) Linear multiple regression\n", "from sklearn.linear_model import LinearRegression\n", "lm_model = LinearRegression()\n", "\n", "# (2) Random forest regression (oob_score=True also computes the out-of-bag R^2, stored in rfr_model.oob_score_)\n", "from sklearn.ensemble import RandomForestRegressor\n", "rfr_model = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)" ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Test data R-2 score: (Linear) 0.589 (Random Forest) 0.776\n", "Test data Spearman correlation: (Linear) 0.805 (Random Forest) 0.845\n", "Test data Pearson correlation: (Linear) 0.769 (Random Forest) 0.884\n" ] } ], "source": [ "# Training\n", "lm_model.fit(X_train_scaled, y_train)\n", "rfr_model.fit(X_train_scaled, y_train)\n", "\n", "# Predict the test data with the trained models\n", "lm_predicted_test = lm_model.predict(X_test_scaled)\n", "rfr_predicted_test = rfr_model.predict(X_test_scaled)\n", "\n", "\n", "# Compute the evaluation metrics\n", "from sklearn.metrics import r2_score\n", "from scipy.stats import spearmanr, pearsonr\n", "\n", "# (1) Coefficient of determination R^2\n", "lm_test_score = r2_score(y_test, lm_predicted_test)\n", "rfr_test_score = r2_score(y_test, rfr_predicted_test)\n", "\n", "# (2) Spearman rank correlation\n", "lm_spearman = spearmanr(y_test, lm_predicted_test)\n", "rfr_spearman = spearmanr(y_test, rfr_predicted_test)\n", "\n", "# (3) Pearson correlation\n", "lm_pearson = pearsonr(y_test, lm_predicted_test)\n", "rfr_pearson = pearsonr(y_test, rfr_predicted_test)\n", "\n", "# Print the evaluation metrics\n", "print(\"Test data R-2 score: (Linear) %5.3f (Random Forest) %5.3f\"\n", " % (lm_test_score, rfr_test_score))\n", "\n", "print(\"Test data Spearman correlation: (Linear) %.3f (Random Forest) %.3f\"\n", " % (lm_spearman[0], rfr_spearman[0]))\n", "\n", "print(\"Test data Pearson correlation: (Linear) %.3f (Random Forest) %.3f\"\n", " % (lm_pearson[0], rfr_pearson[0]))" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "data": { "image/png": 
"\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Scatter plots of predicted vs. observed values for the test data\n", "fig, axes = plt.subplots(1, 2, figsize=(10,5))\n", "axes[0].scatter(lm_predicted_test, y_test)\n", "axes[0].set_xlabel(\"Predicted value by Linear Model\")\n", "axes[0].set_ylabel(\"Observed value\")\n", "axes[0].set_title(\"Linear Regression\")\n", "axes[1].scatter(rfr_predicted_test, y_test)\n", "axes[1].set_xlabel(\"Predicted value by Random Forest\")\n", "axes[1].set_ylabel(\"Observed value\")\n", "axes[1].set_title(\"Random Forest Regression\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Above, the number of trees (n_estimators) is set to 500, the same as on the cited page, but a considerably smaller number would probably give similar results.\n", "\n", "Below is an example program that draws a tree with dtreeviz, which uses graphviz. (The output has been deleted because the tree is huge and the image takes a long time to download.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Visualize the first tree of the random forest trained above (dtreeviz 2.x API, as in the earlier cell)\n", "import dtreeviz\n", "boston_feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',\n", "                        'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']\n", "estimators = rfr_model.estimators_\n", "viz = dtreeviz.model(\n", "    estimators[0],\n", "    X_train_scaled,  # the trees were fitted on the standardized features\n", "    y_train,\n", "    target_name='price',\n", "    feature_names=boston_feature_names,\n", ")\n", "v = viz.view()\n", "v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Decision-tree-based methods for classification and regression of data with a large number of explanatory variables still appear to be an active research topic.\n", "XGBoost is one of the best known of these. A Python module (`xgboost`) is available, so interested readers are encouraged to try it." ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other Housing Datasets in sklearn\n", "\n", "Let's take a look at other sample datasets of the same kind besides the Boston data.\n", "\n", "
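- `california_housing` (loaded with `sklearn.datasets.fetch_california_housing`): an analysis similar to the one for the Boston data should be possible. It has many more samples (20,640) and fewer features (8).\n", "- `house_prices` (from OpenML, loaded with `sklearn.datasets.fetch_openml`): data for Ames, Iowa, with a very large number of features (79 explanatory variables).\n", "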
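The Boston workflow above can be reused for the datasets listed here. It consists of an 80/20 split, standardization fitted on the training data, and linear regression and Random Forest scored by R2, Spearman and Pearson correlation. This is a minimal sketch that wraps those steps in a function. `compare_models` is a hypothetical helper name, and the synthetic data from `make_friedman1` is only a stand-in so that the block runs without downloading anything. The forest size of 100 is an arbitrary assumption, smaller than the 500 used above.

```python
# Sketch: reuse the Boston workflow (split, standardize, fit, score) for any (X, y).
# compare_models is a hypothetical helper; make_friedman1 is synthetic stand-in data.
from scipy.stats import pearsonr, spearmanr
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, n_estimators=100, random_state=0):
    # 80/20 split, as in the Boston example
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.8, random_state=random_state)
    # Fit the scaler on the training data only, then apply it to both sets
    scaler = StandardScaler().fit(X_train)
    X_train_s = scaler.transform(X_train)
    X_test_s = scaler.transform(X_test)

    models = {
        'Linear': LinearRegression(),
        'Random Forest': RandomForestRegressor(
            n_estimators=n_estimators, random_state=random_state),
    }
    scores = {}
    for name, model in models.items():
        # fit() returns the estimator itself, so predict() can be chained
        pred = model.fit(X_train_s, y_train).predict(X_test_s)
        scores[name] = {
            'R2': r2_score(y_test, pred),
            'Spearman': spearmanr(y_test, pred)[0],
            'Pearson': pearsonr(y_test, pred)[0],
        }
    return scores


# Synthetic nonlinear data: 10 features, of which only the first 5 affect y
X, y = make_friedman1(n_samples=1000, n_features=10, noise=1.0, random_state=0)
for name, s in compare_models(X, y).items():
    print('%-14s R2=%.3f Spearman=%.3f Pearson=%.3f'
          % (name, s['R2'], s['Spearman'], s['Pearson']))
```

To try this on the California data, you could import `fetch_california_housing` from `sklearn.datasets` and call `compare_models(*fetch_california_housing(return_X_y=True))`. The data are downloaded on first use.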
\n", "\n", "The results of applying the same analysis as for the Boston data above to the California data are shown at\n", "https://toyoki-lab.ee.yamanashi.ac.jp/~toyoki/lectures/PracDataSci/HousingDataStudy.html\n", "(The analysis there is not detailed.)" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ".. _california_housing_dataset:\n", "\n", "California Housing dataset\n", "--------------------------\n", "\n", "**Data Set Characteristics:**\n", "\n", " :Number of Instances: 20640\n", "\n", " :Number of Attributes: 8 numeric, predictive attributes and the target\n", "\n", " :Attribute Information:\n", " - MedInc median income in block group\n", " - HouseAge median house age in block group\n", " - AveRooms average number of rooms per household\n", " - AveBedrms average number of bedrooms per household\n", " - Population block group population\n", " - AveOccup average number of household members\n", " - Latitude block group latitude\n", " - Longitude block group longitude\n", "\n", " :Missing Attribute Values: None\n", "\n", "This dataset was obtained from the StatLib repository.\n", "https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html\n", "\n", "The target variable is the median house value for California districts,\n", "expressed in hundreds of thousands of dollars ($100,000).\n", "\n", "This dataset was derived from the 1990 U.S. census, using one row per census\n", "block group. A block group is the smallest geographical unit for which the U.S.\n", "Census Bureau publishes sample data (a block group typically has a population\n", "of 600 to 3,000 people).\n", "\n", "An household is a group of people residing within a home. 
Since the average\n", "number of rooms and bedrooms in this dataset are provided per household, these\n", "columns may take surpinsingly large values for block groups with few households\n", "and many empty houses, such as vacation resorts.\n", "\n", "It can be downloaded/loaded using the\n", ":func:`sklearn.datasets.fetch_california_housing` function.\n", "\n", ".. topic:: References\n", "\n", " - Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions,\n", " Statistics and Probability Letters, 33 (1997) 291-297\n", "\n" ] } ], "source": [ "# California Housing Data\n", "from sklearn.datasets import fetch_california_housing\n", "housing = fetch_california_housing()\n", "print(housing.DESCR)" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. 
But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.\n", "\n", "With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.\n", "\n", "MSSubClass: Identifies the type of dwelling involved in the sale.\t\n", "\n", " 20\t1-STORY 1946 & NEWER ALL STYLES\n", " 30\t1-STORY 1945 & OLDER\n", " 40\t1-STORY W/FINISHED ATTIC ALL AGES\n", " 45\t1-1/2 STORY - UNFINISHED ALL AGES\n", " 50\t1-1/2 STORY FINISHED ALL AGES\n", " 60\t2-STORY 1946 & NEWER\n", " 70\t2-STORY 1945 & OLDER\n", " 75\t2-1/2 STORY ALL AGES\n", " 80\tSPLIT OR MULTI-LEVEL\n", " 85\tSPLIT FOYER\n", " 90\tDUPLEX - ALL STYLES AND AGES\n", " 120\t1-STORY PUD (Planned Unit Development) - 1946 & NEWER\n", " 150\t1-1/2 STORY PUD - ALL AGES\n", " 160\t2-STORY PUD - 1946 & NEWER\n", " 180\tPUD - MULTILEVEL - INCL SPLIT LEV/FOYER\n", " 190\t2 FAMILY CONVERSION - ALL STYLES AND AGES\n", "\n", "MSZoning: Identifies the general zoning classification of the sale.\n", "\t\t\n", " A\tAgriculture\n", " C\tCommercial\n", " FV\tFloating Village Residential\n", " I\tIndustrial\n", " RH\tResidential High Density\n", " RL\tResidential Low Density\n", " RP\tResidential Low Density Park \n", " RM\tResidential Medium Density\n", "\t\n", "LotFrontage: Linear feet of street connected to property\n", "\n", "LotArea: Lot size in square feet\n", "\n", "Street: Type of road access to property\n", "\n", " Grvl\tGravel\t\n", " Pave\tPaved\n", " \t\n", "Alley: Type of alley access to property\n", "\n", " Grvl\tGravel\n", " Pave\tPaved\n", " NA \tNo alley access\n", "\t\t\n", "LotShape: General shape of property\n", "\n", " Reg\tRegular\t\n", " IR1\tSlightly irregular\n", " IR2\tModerately Irregular\n", " IR3\tIrregular\n", " \n", "LandContour: Flatness of the property\n", "\n", " Lvl\tNear Flat/Level\t\n", " Bnk\tBanked - Quick and 
significant rise from street grade to building\n", " HLS\tHillside - Significant slope from side to side\n", " Low\tDepression\n", "\t\t\n", "Utilities: Type of utilities available\n", "\t\t\n", " AllPub\tAll public Utilities (E,G,W,& S)\t\n", " NoSewr\tElectricity, Gas, and Water (Septic Tank)\n", " NoSeWa\tElectricity and Gas Only\n", " ELO\tElectricity only\t\n", "\t\n", "LotConfig: Lot configuration\n", "\n", " Inside\tInside lot\n", " Corner\tCorner lot\n", " CulDSac\tCul-de-sac\n", " FR2\tFrontage on 2 sides of property\n", " FR3\tFrontage on 3 sides of property\n", "\t\n", "LandSlope: Slope of property\n", "\t\t\n", " Gtl\tGentle slope\n", " Mod\tModerate Slope\t\n", " Sev\tSevere Slope\n", "\t\n", "Neighborhood: Physical locations within Ames city limits\n", "\n", " Blmngtn\tBloomington Heights\n", " Blueste\tBluestem\n", " BrDale\tBriardale\n", " BrkSide\tBrookside\n", " ClearCr\tClear Creek\n", " CollgCr\tCollege Creek\n", " Crawfor\tCrawford\n", " Edwards\tEdwards\n", " Gilbert\tGilbert\n", " IDOTRR\tIowa DOT and Rail Road\n", " MeadowV\tMeadow Village\n", " Mitchel\tMitchell\n", " Names\tNorth Ames\n", " NoRidge\tNorthridge\n", " NPkVill\tNorthpark Villa\n", " NridgHt\tNorthridge Heights\n", " NWAmes\tNorthwest Ames\n", " OldTown\tOld Town\n", " SWISU\tSouth & West of Iowa State University\n", " Sawyer\tSawyer\n", " SawyerW\tSawyer West\n", " Somerst\tSomerset\n", " StoneBr\tStone Brook\n", " Timber\tTimberland\n", " Veenker\tVeenker\n", "\t\t\t\n", "Condition1: Proximity to various conditions\n", "\t\n", " Artery\tAdjacent to arterial street\n", " Feedr\tAdjacent to feeder street\t\n", " Norm\tNormal\t\n", " RRNn\tWithin 200' of North-South Railroad\n", " RRAn\tAdjacent to North-South Railroad\n", " PosN\tNear positive off-site feature--park, greenbelt, etc.\n", " PosA\tAdjacent to postive off-site feature\n", " RRNe\tWithin 200' of East-West Railroad\n", " RRAe\tAdjacent to East-West Railroad\n", "\t\n", "Condition2: Proximity to various conditions 
(if more than one is present)\n", "\t\t\n", " Artery\tAdjacent to arterial street\n", " Feedr\tAdjacent to feeder street\t\n", " Norm\tNormal\t\n", " RRNn\tWithin 200' of North-South Railroad\n", " RRAn\tAdjacent to North-South Railroad\n", " PosN\tNear positive off-site feature--park, greenbelt, etc.\n", " PosA\tAdjacent to postive off-site feature\n", " RRNe\tWithin 200' of East-West Railroad\n", " RRAe\tAdjacent to East-West Railroad\n", "\t\n", "BldgType: Type of dwelling\n", "\t\t\n", " 1Fam\tSingle-family Detached\t\n", " 2FmCon\tTwo-family Conversion; originally built as one-family dwelling\n", " Duplx\tDuplex\n", " TwnhsE\tTownhouse End Unit\n", " TwnhsI\tTownhouse Inside Unit\n", "\t\n", "HouseStyle: Style of dwelling\n", "\t\n", " 1Story\tOne story\n", " 1.5Fin\tOne and one-half story: 2nd level finished\n", " 1.5Unf\tOne and one-half story: 2nd level unfinished\n", " 2Story\tTwo story\n", " 2.5Fin\tTwo and one-half story: 2nd level finished\n", " 2.5Unf\tTwo and one-half story: 2nd level unfinished\n", " SFoyer\tSplit Foyer\n", " SLvl\tSplit Level\n", "\t\n", "OverallQual: Rates the overall material and finish of the house\n", "\n", " 10\tVery Excellent\n", " 9\tExcellent\n", " 8\tVery Good\n", " 7\tGood\n", " 6\tAbove Average\n", " 5\tAverage\n", " 4\tBelow Average\n", " 3\tFair\n", " 2\tPoor\n", " 1\tVery Poor\n", "\t\n", "OverallCond: Rates the overall condition of the house\n", "\n", " 10\tVery Excellent\n", " 9\tExcellent\n", " 8\tVery Good\n", " 7\tGood\n", " 6\tAbove Average\t\n", " 5\tAverage\n", " 4\tBelow Average\t\n", " 3\tFair\n", " 2\tPoor\n", " 1\tVery Poor\n", "\t\t\n", "YearBuilt: Original construction date\n", "\n", "YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)\n", "\n", "RoofStyle: Type of roof\n", "\n", " Flat\tFlat\n", " Gable\tGable\n", " Gambrel\tGabrel (Barn)\n", " Hip\tHip\n", " Mansard\tMansard\n", " Shed\tShed\n", "\t\t\n", "RoofMatl: Roof material\n", "\n", " ClyTile\tClay or Tile\n", " 
CompShg\tStandard (Composite) Shingle\n", " Membran\tMembrane\n", " Metal\tMetal\n", " Roll\tRoll\n", " Tar&Grv\tGravel & Tar\n", " WdShake\tWood Shakes\n", " WdShngl\tWood Shingles\n", "\t\t\n", "Exterior1st: Exterior covering on house\n", "\n", " AsbShng\tAsbestos Shingles\n", " AsphShn\tAsphalt Shingles\n", " BrkComm\tBrick Common\n", " BrkFace\tBrick Face\n", " CBlock\tCinder Block\n", " CemntBd\tCement Board\n", " HdBoard\tHard Board\n", " ImStucc\tImitation Stucco\n", " MetalSd\tMetal Siding\n", " Other\tOther\n", " Plywood\tPlywood\n", " PreCast\tPreCast\t\n", " Stone\tStone\n", " Stucco\tStucco\n", " VinylSd\tVinyl Siding\n", " Wd Sdng\tWood Siding\n", " WdShing\tWood Shingles\n", "\t\n", "Exterior2nd: Exterior covering on house (if more than one material)\n", "\n", " AsbShng\tAsbestos Shingles\n", " AsphShn\tAsphalt Shingles\n", " BrkComm\tBrick Common\n", " BrkFace\tBrick Face\n", " CBlock\tCinder Block\n", " CemntBd\tCement Board\n", " HdBoard\tHard Board\n", " ImStucc\tImitation Stucco\n", " MetalSd\tMetal Siding\n", " Other\tOther\n", " Plywood\tPlywood\n", " PreCast\tPreCast\n", " Stone\tStone\n", " Stucco\tStucco\n", " VinylSd\tVinyl Siding\n", " Wd Sdng\tWood Siding\n", " WdShing\tWood Shingles\n", "\t\n", "MasVnrType: Masonry veneer type\n", "\n", " BrkCmn\tBrick Common\n", " BrkFace\tBrick Face\n", " CBlock\tCinder Block\n", " None\tNone\n", " Stone\tStone\n", "\t\n", "MasVnrArea: Masonry veneer area in square feet\n", "\n", "ExterQual: Evaluates the quality of the material on the exterior \n", "\t\t\n", " Ex\tExcellent\n", " Gd\tGood\n", " TA\tAverage/Typical\n", " Fa\tFair\n", " Po\tPoor\n", "\t\t\n", "ExterCond: Evaluates the present condition of the material on the exterior\n", "\t\t\n", " Ex\tExcellent\n", " Gd\tGood\n", " TA\tAverage/Typical\n", " Fa\tFair\n", " Po\tPoor\n", "\t\t\n", "Foundation: Type of foundation\n", "\t\t\n", " BrkTil\tBrick & Tile\n", " CBlock\tCinder Block\n", " PConc\tPoured Contrete\t\n", " Slab\tSlab\n", " 
Stone\tStone\n", " Wood\tWood\n", "\t\t\n", "BsmtQual: Evaluates the height of the basement\n", "\n", " Ex\tExcellent (100+ inches)\t\n", " Gd\tGood (90-99 inches)\n", " TA\tTypical (80-89 inches)\n", " Fa\tFair (70-79 inches)\n", " Po\tPoor (<70 inches\n", " NA\tNo Basement\n", "\t\t\n", "BsmtCond: Evaluates the general condition of the basement\n", "\n", " Ex\tExcellent\n", " Gd\tGood\n", " TA\tTypical - slight dampness allowed\n", " Fa\tFair - dampness or some cracking or settling\n", " Po\tPoor - Severe cracking, settling, or wetness\n", " NA\tNo Basement\n", "\t\n", "BsmtExposure: Refers to walkout or garden level walls\n", "\n", " Gd\tGood Exposure\n", " Av\tAverage Exposure (split levels or foyers typically score average or above)\t\n", " Mn\tMimimum Exposure\n", " No\tNo Exposure\n", " NA\tNo Basement\n", "\t\n", "BsmtFinType1: Rating of basement finished area\n", "\n", " GLQ\tGood Living Quarters\n", " ALQ\tAverage Living Quarters\n", " BLQ\tBelow Average Living Quarters\t\n", " Rec\tAverage Rec Room\n", " LwQ\tLow Quality\n", " Unf\tUnfinshed\n", " NA\tNo Basement\n", "\t\t\n", "BsmtFinSF1: Type 1 finished square feet\n", "\n", "BsmtFinType2: Rating of basement finished area (if multiple types)\n", "\n", " GLQ\tGood Living Quarters\n", " ALQ\tAverage Living Quarters\n", " BLQ\tBelow Average Living Quarters\t\n", " Rec\tAverage Rec Room\n", " LwQ\tLow Quality\n", " Unf\tUnfinshed\n", " NA\tNo Basement\n", "\n", "BsmtFinSF2: Type 2 finished square feet\n", "\n", "BsmtUnfSF: Unfinished square feet of basement area\n", "\n", "TotalBsmtSF: Total square feet of basement area\n", "\n", "Heating: Type of heating\n", "\t\t\n", " Floor\tFloor Furnace\n", " GasA\tGas forced warm air furnace\n", " GasW\tGas hot water or steam heat\n", " Grav\tGravity furnace\t\n", " OthW\tHot water or steam heat other than gas\n", " Wall\tWall furnace\n", "\t\t\n", "HeatingQC: Heating quality and condition\n", "\n", " Ex\tExcellent\n", " Gd\tGood\n", " TA\tAverage/Typical\n", " 
Fa\tFair\n", " Po\tPoor\n", "\t\t\n", "CentralAir: Central air conditioning\n", "\n", " N\tNo\n", " Y\tYes\n", "\t\t\n", "Electrical: Electrical system\n", "\n", " SBrkr\tStandard Circuit Breakers & Romex\n", " FuseA\tFuse Box over 60 AMP and all Romex wiring (Average)\t\n", " FuseF\t60 AMP Fuse Box and mostly Romex wiring (Fair)\n", " FuseP\t60 AMP Fuse Box and mostly knob & tube wiring (poor)\n", " Mix\tMixed\n", "\t\t\n", "1stFlrSF: First Floor square feet\n", " \n", "2ndFlrSF: Second floor square feet\n", "\n", "LowQualFinSF: Low quality finished square feet (all floors)\n", "\n", "GrLivArea: Above grade (ground) living area square feet\n", "\n", "BsmtFullBath: Basement full bathrooms\n", "\n", "BsmtHalfBath: Basement half bathrooms\n", "\n", "FullBath: Full bathrooms above grade\n", "\n", "HalfBath: Half baths above grade\n", "\n", "Bedroom: Bedrooms above grade (does NOT include basement bedrooms)\n", "\n", "Kitchen: Kitchens above grade\n", "\n", "KitchenQual: Kitchen quality\n", "\n", " Ex\tExcellent\n", " Gd\tGood\n", " TA\tTypical/Average\n", " Fa\tFair\n", " Po\tPoor\n", " \t\n", "TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)\n", "\n", "Functional: Home functionality (Assume typical unless deductions are warranted)\n", "\n", " Typ\tTypical Functionality\n", " Min1\tMinor Deductions 1\n", " Min2\tMinor Deductions 2\n", " Mod\tModerate Deductions\n", " Maj1\tMajor Deductions 1\n", " Maj2\tMajor Deductions 2\n", " Sev\tSeverely Damaged\n", " Sal\tSalvage only\n", "\t\t\n", "Fireplaces: Number of fireplaces\n", "\n", "FireplaceQu: Fireplace quality\n", "\n", " Ex\tExcellent - Exceptional Masonry Fireplace\n", " Gd\tGood - Masonry Fireplace in main level\n", " TA\tAverage - Prefabricated Fireplace in main living area or Masonry Fireplace in basement\n", " Fa\tFair - Prefabricated Fireplace in basement\n", " Po\tPoor - Ben Franklin Stove\n", " NA\tNo Fireplace\n", "\t\t\n", "GarageType: Garage location\n", "\t\t\n", " 2Types\tMore than one 
type of garage\n", " Attchd\tAttached to home\n", " Basment\tBasement Garage\n", " BuiltIn\tBuilt-In (Garage part of house - typically has room above garage)\n", " CarPort\tCar Port\n", " Detchd\tDetached from home\n", " NA\tNo Garage\n", "\t\t\n", "GarageYrBlt: Year garage was built\n", "\t\t\n", "GarageFinish: Interior finish of the garage\n", "\n", " Fin\tFinished\n", " RFn\tRough Finished\t\n", " Unf\tUnfinished\n", " NA\tNo Garage\n", "\t\t\n", "GarageCars: Size of garage in car capacity\n", "\n", "GarageArea: Size of garage in square feet\n", "\n", "GarageQual: Garage quality\n", "\n", " Ex\tExcellent\n", " Gd\tGood\n", " TA\tTypical/Average\n", " Fa\tFair\n", " Po\tPoor\n", " NA\tNo Garage\n", "\t\t\n", "GarageCond: Garage condition\n", "\n", " Ex\tExcellent\n", " Gd\tGood\n", " TA\tTypical/Average\n", " Fa\tFair\n", " Po\tPoor\n", " NA\tNo Garage\n", "\t\t\n", "PavedDrive: Paved driveway\n", "\n", " Y\tPaved \n", " P\tPartial Pavement\n", " N\tDirt/Gravel\n", "\t\t\n", "WoodDeckSF: Wood deck area in square feet\n", "\n", "OpenPorchSF: Open porch area in square feet\n", "\n", "EnclosedPorch: Enclosed porch area in square feet\n", "\n", "3SsnPorch: Three season porch area in square feet\n", "\n", "ScreenPorch: Screen porch area in square feet\n", "\n", "PoolArea: Pool area in square feet\n", "\n", "PoolQC: Pool quality\n", "\t\t\n", " Ex\tExcellent\n", " Gd\tGood\n", " TA\tAverage/Typical\n", " Fa\tFair\n", " NA\tNo Pool\n", "\t\t\n", "Fence: Fence quality\n", "\t\t\n", " GdPrv\tGood Privacy\n", " MnPrv\tMinimum Privacy\n", " GdWo\tGood Wood\n", " MnWw\tMinimum Wood/Wire\n", " NA\tNo Fence\n", "\t\n", "MiscFeature: Miscellaneous feature not covered in other categories\n", "\t\t\n", " Elev\tElevator\n", " Gar2\t2nd Garage (if not described in garage section)\n", " Othr\tOther\n", " Shed\tShed (over 100 SF)\n", " TenC\tTennis Court\n", " NA\tNone\n", "\t\t\n", "MiscVal: $Value of miscellaneous feature\n", "\n", "MoSold: Month Sold (MM)\n", "\n", "YrSold: Year 
Sold (YYYY)\n", "\n", "SaleType: Type of sale\n", "\t\t\n", " WD \tWarranty Deed - Conventional\n", " CWD\tWarranty Deed - Cash\n", " VWD\tWarranty Deed - VA Loan\n", " New\tHome just constructed and sold\n", " COD\tCourt Officer Deed/Estate\n", " Con\tContract 15% Down payment regular terms\n", " ConLw\tContract Low Down payment and low interest\n", " ConLI\tContract Low Interest\n", " ConLD\tContract Low Down\n", " Oth\tOther\n", "\t\t\n", "SaleCondition: Condition of sale\n", "\n", " Normal\tNormal Sale\n", " Abnorml\tAbnormal Sale - trade, foreclosure, short sale\n", " AdjLand\tAdjoining Land Purchase\n", " Alloca\tAllocation - two linked properties with separate deeds, typically condo with a garage unit\t\n", " Family\tSale between family members\n", " Partial\tHome was not completed when last assessed (associated with New Homes)\n", "\n", "Downloaded from openml.org.\n" ] } ], "source": [ "from sklearn.datasets import fetch_openml\n", "housing = fetch_openml(name=\"house_prices\", as_frame=True)\n", "print(housing.DESCR)" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape \\\n", "0 1.0 60.0 RL 65.0 8450.0 Pave None Reg \n", "1 2.0 20.0 RL 80.0 9600.0 Pave None Reg \n", "2 3.0 60.0 RL 68.0 11250.0 Pave None IR1 \n", "3 4.0 70.0 RL 60.0 9550.0 Pave None IR1 \n", "4 5.0 60.0 RL 84.0 14260.0 Pave None IR1 \n", "... ... ... ... ... ... ... ... ... \n", "1455 1456.0 60.0 RL 62.0 7917.0 Pave None Reg \n", "1456 1457.0 20.0 RL 85.0 13175.0 Pave None Reg \n", "1457 1458.0 70.0 RL 66.0 9042.0 Pave None Reg \n", "1458 1459.0 20.0 RL 68.0 9717.0 Pave None Reg \n", "1459 1460.0 20.0 RL 75.0 9937.0 Pave None Reg \n", "\n", " LandContour Utilities ... PoolArea PoolQC Fence MiscFeature MiscVal \\\n", "0 Lvl AllPub ... 0.0 None None None 0.0 \n", "1 Lvl AllPub ... 0.0 None None None 0.0 \n", "2 Lvl AllPub ... 0.0 None None None 0.0 \n", "3 Lvl AllPub ... 0.0 None None None 0.0 \n", "4 Lvl AllPub ... 0.0 None None None 0.0 \n", "... ... ... ... ... ... ... ... ... \n", "1455 Lvl AllPub ... 0.0 None None None 0.0 \n", "1456 Lvl AllPub ... 0.0 None MnPrv None 0.0 \n", "1457 Lvl AllPub ... 0.0 None GdPrv Shed 2500.0 \n", "1458 Lvl AllPub ... 0.0 None None None 0.0 \n", "1459 Lvl AllPub ... 0.0 None None None 0.0 \n", "\n", " MoSold YrSold SaleType SaleCondition SalePrice \n", "0 2.0 2008.0 WD Normal 208500.0 \n", "1 5.0 2007.0 WD Normal 181500.0 \n", "2 9.0 2008.0 WD Normal 223500.0 \n", "3 2.0 2006.0 WD Abnorml 140000.0 \n", "4 12.0 2008.0 WD Normal 250000.0 \n", "... ... ... ... ... ... 
\n", "1455 8.0 2007.0 WD Normal 175000.0 \n", "1456 2.0 2010.0 WD Normal 210000.0 \n", "1457 5.0 2010.0 WD Normal 266500.0 \n", "1458 4.0 2010.0 WD Normal 142125.0 \n", "1459 6.0 2008.0 WD Normal 147500.0 \n", "\n", "[1460 rows x 81 columns]" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "housing.frame" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }