DataCamp Scikit-Learn Cheat Sheet

Python For Data Science Cheat Sheet: Scikit-Learn
Learn Python for data science interactively at www.DataCamp.com

Scikit-learn is an open source Python library that implements a range of machine learning, preprocessing, cross-validation and visualization algorithms using a unified interface.

A Basic Example

>>> from sklearn import neighbors, datasets, preprocessing
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.metrics import accuracy_score
>>> iris = datasets.load_iris()
>>> X, y = iris.data[:, :2], iris.target
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)
>>> scaler = preprocessing.StandardScaler().fit(X_train)
>>> X_train = scaler.transform(X_train)
>>> X_test = scaler.transform(X_test)
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)
>>> knn.fit(X_train, y_train)
>>> y_pred = knn.predict(X_test)
>>> accuracy_score(y_test, y_pred)

Loading The Data (also see the NumPy & Pandas cheat sheets)

Your data needs to be numeric and stored as NumPy arrays or SciPy sparse matrices. Other types that are convertible to numeric arrays, such as Pandas DataFrames, are also acceptable.

>>> import numpy as np
>>> X = np.random.random((10, 5))
>>> y = np.array(['M','M','F','F','M','F','M','M','F','F'])
>>> X[X < 0.7] = 0

Training And Test Data

>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

Preprocessing The Data

Standardization
>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler().fit(X_train)
>>> standardized_X = scaler.transform(X_train)
>>> standardized_X_test = scaler.transform(X_test)

Normalization
>>> from sklearn.preprocessing import Normalizer
>>> scaler = Normalizer().fit(X_train)
>>> normalized_X = scaler.transform(X_train)
>>> normalized_X_test = scaler.transform(X_test)

Binarization
>>> from sklearn.preprocessing import Binarizer
>>> binarizer = Binarizer(threshold=0.0).fit(X)
>>> binary_X = binarizer.transform(X)

Encoding Categorical Features
>>> from sklearn.preprocessing import LabelEncoder
>>> enc = LabelEncoder()
>>> y = enc.fit_transform(y)

Imputing Missing Values
>>> from sklearn.impute import SimpleImputer   # sklearn.preprocessing.Imputer in older releases
>>> imp = SimpleImputer(missing_values=0, strategy='mean')
>>> imp.fit_transform(X_train)

Generating Polynomial Features
>>> from sklearn.preprocessing import PolynomialFeatures
>>> poly = PolynomialFeatures(5)
>>> poly.fit_transform(X)

Create Your Model

Supervised Learning Estimators

Linear Regression
>>> from sklearn.linear_model import LinearRegression
>>> lr = LinearRegression()

Support Vector Machines (SVM)
>>> from sklearn.svm import SVC
>>> svc = SVC(kernel='linear')

Naive Bayes
>>> from sklearn.naive_bayes import GaussianNB
>>> gnb = GaussianNB()

KNN
>>> from sklearn import neighbors
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)

Unsupervised Learning Estimators

Principal Component Analysis (PCA)
>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=0.95)

K Means
>>> from sklearn.cluster import KMeans
>>> k_means = KMeans(n_clusters=3, random_state=0)

Model Fitting

Supervised learning
>>> lr.fit(X, y)                             # Fit the model to the data
>>> knn.fit(X_train, y_train)
>>> svc.fit(X_train, y_train)

Unsupervised learning
>>> k_means.fit(X_train)                     # Fit the model to the data
>>> pca_model = pca.fit_transform(X_train)   # Fit to data, then transform it

Prediction

Supervised estimators
>>> y_pred = svc.predict(np.random.random((2, 5)))   # Predict labels
>>> y_pred = lr.predict(X_test)                      # Predict labels
>>> y_pred = knn.predict_proba(X_test)               # Estimate probability of a label

Unsupervised estimators
>>> y_pred = k_means.predict(X_test)                 # Predict labels in clustering algorithms
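The fragments above all share the same estimator API, which is what the "unified interface" in the introduction refers to. As a quick illustration (not part of the original sheet; it assumes scikit-learn 1.x and uses the full iris feature set via return_X_y=True rather than the two-column slice from the basic example), the three classifiers introduced above can be swapped behind identical fit/predict calls:

>>> from sklearn import datasets
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.svm import SVC
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.neighbors import KNeighborsClassifier
>>> from sklearn.metrics import accuracy_score
>>> X, y = datasets.load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
>>> for model in (SVC(kernel='linear'), GaussianNB(), KNeighborsClassifier(n_neighbors=5)):
...     model.fit(X_train, y_train)        # same call for every estimator
...     y_pred = model.predict(X_test)     # same call for every estimator
...     print(type(model).__name__, accuracy_score(y_test, y_pred))

Because every estimator exposes fit() and predict(), model selection code like this never needs to change when you try a different algorithm.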
Evaluate Your Model's Performance

Classification Metrics

Accuracy Score
>>> knn.score(X_test, y_test)                    # Estimator score method
>>> from sklearn.metrics import accuracy_score   # Metric scoring functions
>>> accuracy_score(y_test, y_pred)

Classification Report
>>> from sklearn.metrics import classification_report
>>> print(classification_report(y_test, y_pred))   # Precision, recall, f1-score and support

Confusion Matrix
>>> from sklearn.metrics import confusion_matrix
>>> print(confusion_matrix(y_test, y_pred))

Regression Metrics

Mean Absolute Error
>>> from sklearn.metrics import mean_absolute_error
>>> y_true = [3, -0.5, 2]
>>> mean_absolute_error(y_true, y_pred)

Mean Squared Error
>>> from sklearn.metrics import mean_squared_error
>>> mean_squared_error(y_test, y_pred)

R² Score
>>> from sklearn.metrics import r2_score
>>> r2_score(y_true, y_pred)

Clustering Metrics

Adjusted Rand Index
>>> from sklearn.metrics import adjusted_rand_score
>>> adjusted_rand_score(y_true, y_pred)

Homogeneity
>>> from sklearn.metrics import homogeneity_score
>>> homogeneity_score(y_true, y_pred)

V-measure
>>> from sklearn.metrics import v_measure_score
>>> v_measure_score(y_true, y_pred)

Cross-Validation
>>> from sklearn.model_selection import cross_val_score   # sklearn.cross_validation in older releases
>>> print(cross_val_score(knn, X_train, y_train, cv=4))
>>> print(cross_val_score(lr, X, y, cv=2))

Tune Your Model

Grid Search
>>> from sklearn.model_selection import GridSearchCV   # sklearn.grid_search in older releases
>>> params = {"n_neighbors": np.arange(1, 3),
...           "metric": ["euclidean", "cityblock"]}
>>> grid = GridSearchCV(estimator=knn, param_grid=params)
>>> grid.fit(X_train, y_train)
>>> print(grid.best_score_)
>>> print(grid.best_estimator_.n_neighbors)

Randomized Parameter Optimization
>>> from sklearn.model_selection import RandomizedSearchCV
>>> params = {"n_neighbors": range(1, 5),
...           "weights": ["uniform", "distance"]}
>>> rsearch = RandomizedSearchCV(estimator=knn,
...                              param_distributions=params,
...                              cv=4,
...                              n_iter=8,
...                              random_state=5)
>>> rsearch.fit(X_train, y_train)
>>> print(rsearch.best_score_)
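The grid-search fragment above assumes knn, X_train and y_train already exist. As a minimal end-to-end sketch (not from the original sheet; it assumes scikit-learn 1.x, uses iris again, and the parameter range np.arange(1, 10) is illustrative), the whole tuning loop looks like this:

>>> import numpy as np
>>> from sklearn import datasets
>>> from sklearn.model_selection import train_test_split, GridSearchCV
>>> from sklearn.neighbors import KNeighborsClassifier
>>> X, y = datasets.load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)
>>> params = {"n_neighbors": np.arange(1, 10),
...           "metric": ["euclidean", "cityblock"]}
>>> grid = GridSearchCV(estimator=KNeighborsClassifier(), param_grid=params, cv=4)
>>> grid.fit(X_train, y_train)               # cross-validated search, then refit on all of X_train
>>> print(grid.best_score_)                  # mean CV score of the best parameter combination
>>> print(grid.best_estimator_.n_neighbors)  # winning hyperparameter value
>>> print(grid.score(X_test, y_test))        # held-out accuracy of the refitted best model

Note that grid.score() delegates to the refitted best_estimator_, so the final line reports test-set accuracy for the tuned model rather than for the original, untuned one.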
