# python – Pipelines in scikit-learn. Algorithm construction

## Question:

A question about pipelines in `scikit-learn`. There are `PolynomialFeatures()`, `PCA()` and `LogReg()`. There is a training set `x_train, y_train` and a test set `x_test, y_test`. Let `x, y` denote `union(x_test, x_train)` and `union(y_test, y_train)`, respectively. I want to pull off the following trick:

Feed the training data into `x_poly = PolynomialFeatures(x_train)`. Apply `x_pca = PCA(x_poly)` for dimensionality reduction. Then build `x_union = concatenate(x_train, x_pca, axis=1)` and classify it with `LogReg()`.
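A minimal sketch of how this construction could be expressed with `FeatureUnion`, which concatenates transformer outputs column-wise (the toy data, polynomial degree, component count, and the reading of `LogReg()` as `LogisticRegression` are all assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, PolynomialFeatures

# Toy data standing in for x_train/x_test (assumption: any numeric matrix works).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

# FeatureUnion stacks its transformers' outputs side by side, which is
# exactly concatenate(x_train, x_pca, axis=1):
#   - FunctionTransformer() with no func is the identity, keeping x_train as-is;
#   - the inner Pipeline produces the PCA-compressed polynomial features.
union = FeatureUnion([
    ("original", FunctionTransformer()),
    ("poly_pca", Pipeline([
        ("poly", PolynomialFeatures(degree=2)),
        ("pca", PCA(n_components=5)),
    ])),
])

model = Pipeline([
    ("features", union),
    ("clf", LogisticRegression(max_iter=1000)),  # the "LogReg()" step
])

model.fit(x_train, y_train)
print(model.score(x_test, y_test))
```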

Questions:

• A `Pipeline` has a `fit(X, [y])` method. I understand that `[y]` is used only if the corresponding algorithm needs it, i.e. in `PCA.fit()` `y` will not be used, while in `LogReg().fit()` it will.

• How is this handled for `PolynomialFeatures()`, whose `fit()` method takes two arguments: `fit(X, y=None)`?
• What does `y` stand for here in the documentation?
• In which cases will `y` actually be used by the algorithms in the pipeline, and in which not? (A small demonstration follows this list.)
• I need to combine `x_pca` with `x_train`. How can this be done if `numpy` cannot be used directly? And if `numpy` can be used, then how?
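A small sketch illustrating the point about `y` (the toy arrays are assumptions): a `Pipeline` passes `y` to every step's `fit`, but transformers such as `PolynomialFeatures` simply ignore it, so the result is the same with or without it:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(6).reshape(3, 2)
y = np.array([0, 1, 0])

# y is accepted (the Pipeline hands it to every step's fit) but ignored
# by the transformer: both calls produce identical output.
poly = PolynomialFeatures(degree=2)
print(np.allclose(poly.fit_transform(X, y), poly.fit_transform(X)))  # True
```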

• Is it possible to use conditions in a `Pipeline`? For instance, having reached a certain stage of the algorithm, say just before `PCA()`, I compute the variance of the largest component. If it is `> 0.5`, I use `LogReg()`; if it is less, I use `SVM()`. Can such functionality be implemented?
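`Pipeline` itself has no built-in branching, but one way to get this behavior is a small custom meta-estimator. The class name, the threshold semantics (explained-variance *ratio* of the leading PCA component), and the choice of `SVC` for `SVM()` below are all assumptions, not the library's API:

```python
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

class VarianceSwitchClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical meta-estimator: fits PCA, then picks the final
    classifier at fit time based on how much variance the leading
    component explains (one reading of 'variance over the maximum
    component')."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def fit(self, X, y):
        self.pca_ = PCA().fit(X)
        if self.pca_.explained_variance_ratio_[0] > self.threshold:
            self.clf_ = LogisticRegression(max_iter=1000)  # the "LogReg()" branch
        else:
            self.clf_ = SVC()                              # the "SVM()" branch
        self.clf_.fit(self.pca_.transform(X), y)
        return self

    def predict(self, X):
        return self.clf_.predict(self.pca_.transform(X))
```

Because it implements `fit`/`predict`, such a class can sit as the last step of a `Pipeline`, after `PolynomialFeatures()` or any other transformers.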

## Answer:

To understand what is happening in `PCA` and `PolynomialFeatures`, you need to look into the code: `y` is not used there. `y=None` is needed so that the signature is the same across all the methods a `Pipeline` calls, with `None` as the default value.
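A quick way to see this uniform signature for yourself (a small sketch; the exact printed form may vary between scikit-learn versions):

```python
import inspect
from sklearn.decomposition import PCA
from sklearn.preprocessing import PolynomialFeatures

# Both transformers expose the same uniform signature, so a Pipeline can
# always call step.fit(X, y); expect something like "(self, X, y=None)".
print(inspect.signature(PCA.fit))
print(inspect.signature(PolynomialFeatures.fit))
```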