Thanks to these results, one can now derive a z-score for each motif and hence rank the motifs according to their exceptionality. We then worked on modelling the whole distribution of the count of a coloured motif in an Erdős-Rényi (ER) random graph model. To this end, we performed a large number of simulations, using different colour frequencies for the motif and different numbers of vertices and edges for the graph. We could establish that the Poisson distribution was not appropriate, whereas the Pólya-Aeppli distribution was a very good approximation, and a better one than the commonly used Gaussian distribution. The choice of a Pólya-Aeppli distribution was driven by the following facts: (i) motif occurrences overlap in a network, as shown in Figure 1; (ii) compound Poisson distributions are particularly well suited to modelling counts of clumping events; (iii) Pólya-Aeppli approximations are efficient for the count of words in letter sequences.
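As a rough illustration of how a z-score and a Pólya-Aeppli P-value could be obtained from a motif count, the sketch below assumes the standard parametrisation of the distribution (a Poisson(`lam`) number of clumps, each of geometric size with parameter `1 - theta`) and fits both parameters by matching a given mean and variance. The function names and the moment-matching step are illustrative assumptions, not necessarily the procedure used in this work.

```python
import math


def z_score(observed, mean, var):
    """Standardised deviation of an observed count from its expectation."""
    return (observed - mean) / math.sqrt(var)


def polya_aeppli_pmf(x, lam, theta):
    """P(X = x) for a Polya-Aeppli (geometric Poisson) variable.

    X is the sum of N ~ Poisson(lam) i.i.d. clump sizes, each with
    P(size = j) = (1 - theta) * theta**(j - 1), j >= 1.
    Direct summation; intended for moderate counts only.
    """
    if x == 0:
        return math.exp(-lam)
    total = 0.0
    for k in range(1, x + 1):  # k = number of clumps making up the x occurrences
        total += (
            lam ** k / math.factorial(k)
            * math.comb(x - 1, k - 1)
            * theta ** (x - k)
            * (1 - theta) ** k
        )
    return math.exp(-lam) * total


def fit_by_moments(mean, var):
    """Assumed fitting step: match mean = lam / (1 - theta) and
    var = lam * (1 + theta) / (1 - theta)**2."""
    if var < mean:
        raise ValueError("Polya-Aeppli requires var >= mean")
    theta = (var - mean) / (var + mean)
    lam = mean * (1 - theta)
    return lam, theta


def upper_tail(x_obs, lam, theta):
    """P(X >= x_obs), usable as a P-value for over-representation."""
    return 1.0 - sum(polya_aeppli_pmf(j, lam, theta) for j in range(x_obs))
```

Setting `theta = 0` makes every clump a single occurrence, so the distribution reduces to a plain Poisson distribution; a positive `theta` inflates the variance relative to the mean, which is the behaviour expected when occurrences overlap.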
These results can in turn be used to derive a P-value for each motif and, therefore, to introduce a cut-off for deciding which motifs should be selected for further analysis. To our knowledge, there has been no previous work on the significance of coloured motifs in random graphs. This is why we started by focusing on the simpler random graph model that is readily available. We are aware that this may not be the most suitable model for describing network structure.

Coloured Random Graph Model. We consider a random graph $G$ with $n$ vertices $V_1, \ldots, V_n$. We assume that random edges are independent and distributed according to a Bernoulli distribution with parameter $p \in (0, 1)$.
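The edge model just described can be simulated in a few lines. The following is a minimal sketch, assuming vertices are indexed 0 to n - 1 and the graph is undirected without self-loops; the function name `sample_er_edges` and the optional `rng` argument are illustrative choices rather than part of the original method.

```python
import random
from itertools import combinations


def sample_er_edges(n, p, rng=None):
    """Draw the edge set of an ER random graph on vertices 0..n-1.

    Each of the n*(n-1)/2 unordered vertex pairs is kept independently
    with probability p (one Bernoulli(p) trial per pair).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    rng = rng or random.Random()
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]
```

Passing a seeded generator, for example `sample_er_edges(200, 0.1, random.Random(1))`, makes a simulation run reproducible.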
In addition, vertices are randomly and independently coloured as follows. Let $C$ be a finite set of $r$ different colours and $f$ a probability measure on $C$; $f(c)$ is then the probability for a vertex to be coloured with $c \in C$. In a metabolic network, the colours of reaction vertices can represent classes of chemical transformations; in regulation networks, the colours of gene vertices can represent functional classes. For defining these classes, the EC number hierarchy is classically used.

Coloured Motif. We consider motifs as introduced in Lacroix et al.: a motif $m$ of size $k$ is a multiset of $k$ colours $\{m_1, \ldots, m_k\} \in C^k$. The colours of a motif need not be distinct, that is, one may have $m_i = m_j$ for some $1 \le i, j \le k$. We then denote by $s_m(c)$ the multiplicity of the colour $c$ in $m$. When there is no ambiguity, $s_m(c)$ will simply be denoted by $s(c)$. The notion of multiplicity of a single colour in $m$ will be extended to a multiset of colours in Section 3.2.

Figure 2: Example of a graph and a motif. The motif $m$ occurs three times in the graph.

Motif Occurrences. We now define an occurrence of such a coloured motif. To this end, we introduce the following notation.