Question:
I have a question about graph theory.
I would like to find a method or representation that describes the positional relationship between vertices (nodes) in an invariant way.
As a concrete example, consider the molecule of a compound.
For each atom in the molecule (a vertex in the graph), three-dimensional coordinates (a 3D vector) are given.
From these coordinates you can calculate the distance between vertices (the length of an edge in the graph).
Now, viewed from a certain atom a_i (the i-th atom) in the compound, I would like a method or representation that describes the positions of the other atoms a_j, a_k, a_m, … in three-dimensional space and is invariant under operations such as rotation.
For example, if you calculate the distances between atoms from the given 3D coordinates, you can express how far the other atoms are from a_i, but the information about how they are arranged in three-dimensional space is lost.
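To make this point concrete, here is a minimal sketch (the coordinates and the helper names `rotate_z` and `distances_from` are made up for illustration). It shows that the distances from a_i do not change under a rotation, but also that two clearly different arrangements can give the same list of distances.

```python
import math

def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def distances_from(coords, i):
    """Sorted distances from atom i to every other atom."""
    return sorted(math.dist(coords[i], p) for j, p in enumerate(coords) if j != i)

# Hypothetical coordinates, chosen only for illustration.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
rotated = [rotate_z(p, 0.7) for p in coords]

# A different arrangement: all atoms on one line.
collinear = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

def same(a, b):
    return all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b))

invariant_under_rotation = same(distances_from(coords, 0), distances_from(rotated, 0))
arrangement_lost = same(distances_from(coords, 0), distances_from(collinear, 0))
```

Both flags come out true: the distance list survives rotation, but it cannot tell the tetrahedral-like arrangement from the collinear one.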
My idea is that if I keep the set of all triangles formed by a_i and any two other atoms a_j and a_k, the arrangement of the other atoms in three-dimensional space as seen from a_i is preserved in an invariant form.
That is, instead of the single distance d_ij between two atoms, I use the three distances (d_ij, d_ik, d_jk) among three atoms, i.e. a triangle, and this should retain the positional relationship in three-dimensional space.
(Of course, the triangles are distinguished by the types of the atoms at their vertices.)
I would then calculate these triangles for every atom a_i.
But I'm not sure whether this is correct.
I'm especially unsure about implementing it programmatically, because the enantiomers need to be distinguished (while the representation stays invariant to rotation).
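As a rough check of the enantiomer concern, the sketch below (hypothetical coordinates and helper names) builds the triangle set seen from a_0 for a structure and for its mirror image. Because a reflection also preserves all distances, the two triangle sets are identical. The signed volume of the tetrahedron spanned by four atoms is shown only as one possible extra quantity, my own assumption rather than part of the idea above: it is unchanged by rotation but flips sign under reflection.

```python
import math
from itertools import combinations

def triangles_from(coords, i):
    """Sorted (d_ij, d_ik, d_jk) triples for all pairs of other atoms j, k."""
    others = [j for j in range(len(coords)) if j != i]
    return sorted(
        (math.dist(coords[i], coords[j]),
         math.dist(coords[i], coords[k]),
         math.dist(coords[j], coords[k]))
        for j, k in combinations(others, 2)
    )

def signed_volume(a, b, c, d):
    """Scalar triple product (b-a) . ((c-a) x (d-a)); its sign encodes handedness."""
    u = [b[n] - a[n] for n in range(3)]
    v = [c[n] - a[n] for n in range(3)]
    w = [d[n] - a[n] for n in range(3)]
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return sum(u[n] * cross[n] for n in range(3))

# Hypothetical coordinates; the mirror image negates x.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
mirror = [(-x, y, z) for x, y, z in coords]

tri_a = triangles_from(coords, 0)
tri_b = triangles_from(mirror, 0)
triangles_identical = all(
    math.isclose(p, q, abs_tol=1e-9) for s, t in zip(tri_a, tri_b) for p, q in zip(s, t)
)
vol_a = signed_volume(*coords)
vol_b = signed_volume(*mirror)
```

Here `triangles_identical` is true while `vol_a` and `vol_b` have opposite signs, so a purely distance-based triangle set would need some additional signed quantity to separate mirror images.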
Could someone please advise me?
Postscript
My goal is to represent the positional relationship of the atoms in a data structure that preserves certain properties, and then to use machine learning to predict the toxicity of compounds. Machine learning is, after all, function approximation, so data that mean essentially the same thing should be given as the same input. However, if the raw coordinates are entered as they are, data with essentially the same meaning (the same arrangement of atoms) are entered as different data, and that is the problem I am concerned about.
Answer:
Even if the questioner implements the logic he has in mind, I think the n! combinatorial problem of "which coordinate corresponds to which coordinate" will remain unsolved. (Maybe I just don't understand it well enough, but …)
Since I couldn't think of a good general solution, I will describe the approach I would take myself.
I understand the goal to be: given two sets of 3D coordinates, determine whether they coincide after a rotation.
First: Think in terms of relative coordinates from the center of gravity
Since we want a reference point, calculate the average (center of gravity) of each coordinate set and express every coordinate relative to it. Then you only need to determine whether the two sets of relative coordinates coincide under a rotation.
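A minimal sketch of this centering step, assuming the "center of gravity" is the plain unweighted average of the coordinates (atomic masses are ignored; the function name `center` and the sample coordinates are made up):

```python
def center(coords):
    """Return the coordinates relative to their unweighted average."""
    n = len(coords)
    centroid = tuple(sum(p[k] for p in coords) / n for k in range(3))
    return [tuple(p[k] - centroid[k] for k in range(3)) for p in coords]

# Hypothetical coordinates and a translated copy of the same structure.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in coords]

centered_a = center(coords)
centered_b = center(shifted)
```

After centering, the original and the translated copy give the same relative coordinates (up to rounding), so only the rotation remains to be dealt with.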
Determine the rotation of the coordinates
This is hard to do directly, so I would determine the rotation from some characteristic feature of the target. Perhaps this is what the questioner means by a rotation invariant.
For example, is the target coordinate set (compound) linear, so that the coordinate (atom) farthest from the center of gravity is uniquely determined? If it has such a characteristic, you can use it.
Also, you said the type of atom should be considered; do you know of an atom that is characteristic of the compound? (For example, an atom type that appears only once.)
Or, even if no feature points can be derived from the coordinate set as a whole, they may be easier to find if you consider only the coordinates of a specific atom type.
If you know the conditions of the compounds you want to compare, calculate two feature points with the above in mind. With two feature points plus the center of gravity, the rotation in three dimensions can be determined, provided the three points do not lie on one line.
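One way to turn two feature points into a rotation is to build an orthonormal frame from them and rewrite every coordinate in that frame. The sketch below does this with a Gram-Schmidt step; it is only an illustration under assumptions: the feature atoms are passed in as known indices `f1` and `f2` (how to pick them depends on the compound), and all function names are made up.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = math.sqrt(dot(v, v))
    if n < 1e-12:
        raise ValueError("feature points are degenerate (collinear with the centroid)")
    return tuple(x / n for x in v)

def center(coords):
    n = len(coords)
    c = tuple(sum(p[k] for p in coords) / n for k in range(3))
    return [tuple(p[k] - c[k] for k in range(3)) for p in coords]

def canonical(coords, f1, f2):
    """Express centered coordinates in a frame fixed by feature atoms f1 and f2."""
    pts = center(coords)
    e1 = normalize(pts[f1])                      # first axis: toward feature 1
    proj = dot(pts[f2], e1)
    e2 = normalize(tuple(pts[f2][k] - proj * e1[k] for k in range(3)))  # Gram-Schmidt
    e3 = cross(e1, e2)                           # completes a right-handed frame
    return [(dot(p, e1), dot(p, e2), dot(p, e3)) for p in pts]

def rotate(p, a, b):
    """Rotate about z by a, then about x by b (only used to build a test case)."""
    x, y, z = p
    x, y = math.cos(a) * x - math.sin(a) * y, math.sin(a) * x + math.cos(a) * y
    y, z = math.cos(b) * y - math.sin(b) * z, math.sin(b) * y + math.cos(b) * z
    return (x, y, z)

# Hypothetical coordinates; atoms 3 and 2 play the role of the two feature points.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
rotated = [rotate(p, 0.7, 1.1) for p in coords]

canon_a = canonical(coords, 3, 2)
canon_b = canonical(rotated, 3, 2)
frames_agree = all(
    math.isclose(u, v, abs_tol=1e-9) for p, q in zip(canon_a, canon_b) for u, v in zip(p, q)
)
```

In this example the original and the rotated copy end up with the same canonical coordinates. With real data the feature points carry measurement error, so the result is only an approximate alignment.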
Determine identity after rotation
Since the coordinates are floating-point numbers from real data, they will never match exactly; some error will always remain. So find the corresponding coordinates and compare distances with the following logic.
For a tolerance e, divide the space containing the compound into a grid with cell size e, and prepare an associative array (hereinafter, Hash) of {3D grid cell -> array of candidate matching coordinates}.
For one of the compounds to be compared, do the following:
For the grid cell corresponding to each coordinate a_i:
    For the cell itself and its neighbors (3*3*3 = 27 cells in total):
        Hash[cell] << a_i   # add a_i to Hash
Then for the other compound:
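For the grid cell corresponding to each coordinate a_j:
    For the cell itself and its neighbors (3*3*3 = 27 cells in total):
        among the a_i coordinates stored in Hash,
        find the coordinate a_i closest to a_j.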
Then compute the distance between a_i and a_j; if it is less than or equal to the tolerance e, the coordinates are considered to match. The sum of the distances over all matched coordinates can serve as a measure of how well the two coordinate sets agree.
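The matching logic above could look roughly like the following sketch. It makes one simplification that is my own assumption: each a_i is stored only in its own cell, and the 27 neighboring cells are scanned at lookup time, which is enough to find every point within distance e when the cell size is e. The names `cell_of`, `build_hash` and `match` are made up, and the sketch does not enforce a one-to-one correspondence between atoms.

```python
import math
from collections import defaultdict
from itertools import product

def cell_of(p, e):
    """Integer grid cell containing point p for cell size e."""
    return tuple(math.floor(c / e) for c in p)

def build_hash(coords, e):
    """Map each grid cell to the list of coordinates that fall into it."""
    grid = defaultdict(list)
    for p in coords:
        grid[cell_of(p, e)].append(p)
    return grid

def match(coords_a, coords_b, e):
    """Return (all_matched, total_distance) for coords_b against coords_a."""
    grid = build_hash(coords_a, e)
    total = 0.0
    for q in coords_b:
        cx, cy, cz = cell_of(q, e)
        best = None
        # Scan the cell of q and its 26 neighbors for the nearest a_i.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for p in grid.get((cx + dx, cy + dy, cz + dz), ()):
                d = math.dist(p, q)
                if best is None or d < best:
                    best = d
        if best is None or best > e:
            return False, total   # no a_i within the tolerance
        total += best
    return True, total

# Hypothetical data: b is a slightly perturbed, reordered copy of a.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
b = [(0.01, 2.0, 0.0), (0.01, 0.0, 0.0), (1.01, 0.0, 0.0)]
c = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 5.0)]

ok_ab, total_ab = match(a, b, 0.05)
ok_ac, total_ac = match(a, c, 0.05)
```

Here `b` matches `a` with a small total distance, while `c` is rejected because one of its points has no partner within the tolerance.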
In practice
The accuracy of the rotation in particular may need trial and error. In that case, run the above matching algorithm repeatedly while fine-tuning the rotation, and look for a rotation that reduces the sum of distances. If the total distance falls below a certain threshold, the sets can be considered to match, but I think this threshold has to be found by trial and error.