Get the unique set of states along with their log score.
Value
A list with entries:
state: List of unique states.
log_evidence_state: Numeric value with the log of the evidence estimated from the unique states.
log_state_score: Vector with the log scores for each state.
log_sampling_prob: Vector with the log of the probability for each state estimated using the MCMC sampling frequency.
Details
This gets the unique set of states in cia_chain(s), referred to as
objects (\(o\)), and then estimates the posterior probability of each state
using two methods. The log_sampling_prob is the MCMC sampling frequency
estimate of the posterior probability.
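For intuition, a minimal sketch of this frequency estimate (the state_keys vector and its labels below are illustrative and not part of the package API):

# Illustrative only: log posterior estimate from MCMC sampling frequencies.
state_keys <- c("s1", "s2", "s1", "s1", "s3", "s2")  # hypothetical state labels, one per sample
counts <- table(state_keys)                          # visits to each unique state
log_sampling_prob <- log(as.numeric(counts)) - log(length(state_keys))
names(log_sampling_prob) <- names(counts)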
An alternative method estimates the posterior probability for each state
from the state score; this is recorded in log_norm_state_score. This
approach estimates the log of the normalisation constant assuming
\(\tilde{Z}_O = \sum_{s=1}^{S} p(o_s) p(D | o_s)\) where
\(O = \{o_1, o_2, o_3, ..., o_S\}\) is
the set of unique objects in the chain. This assumes that you have captured the
most probable objects, such that \(\tilde{Z}_O\) is approximately equal to
the true evidence \(Z = \sum_{g \in G} p(g) p(D | g)\), where the
sum is over all possible DAGs (\(G\)). This also assumes
that the exponential of the score is proportional to the posterior
probability, such that
$$p(g|D) \propto p(g)p(D | g) = \prod_i \exp(\text{score}(X_i, \text{Pa}_g(X_i) | D))$$
where \(\text{Pa}_g(X_i)\) is the parent set of node \(X_i\) given the
graph \(g\).
After the normalisation constant has been estimated, we estimate the log probability of each object as $$\log(p(o | D)) = \log(p(o)p(D | o)) - \log(\tilde{Z}_O).$$
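As a rough sketch of this score-based normalisation (the log_state_score values are hypothetical, and the package may implement this differently):

# Illustrative only: normalise unnormalised log state scores in log space.
log_state_score <- c(-10.2, -11.5, -9.8)                # hypothetical log p(o_s)p(D | o_s) values
m <- max(log_state_score)
log_evidence <- m + log(sum(exp(log_state_score - m)))  # log-sum-exp estimate of log Z_O
log_norm_state_score <- log_state_score - log_evidence  # estimated log p(o_s | D)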
Preliminary analysis suggests that the sampling frequency approach is more consistent across chains when estimating marginalised edge probabilities, and it is therefore our preferred method. However, more work needs to be done here.
Examples
# Load example data and construct an initial partition from a random DAG.
data <- bnlearn::learning.test
dag <- UniformlySampleDAG(colnames(data))
partitioned_nodes <- DAGtoPartition(dag)

# Set up the scorer used to evaluate sampled structures.
scorer <- CreateScorer(
  scorer = BNLearnScorer,
  data = data
)

# Run partition MCMC and collect the unique sampled states with their scores.
results <- SampleChains(100, partitioned_nodes, PartitionMCMC(), scorer)
collection <- CollectUniqueObjects(results)
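As a follow-up sketch (assuming the entry names listed under Value, and that log_state_score holds unnormalised scores), the two posterior estimates could be compared with:

# Compare the two posterior probability estimates for the sampled states.
exp(collection$log_sampling_prob)                                # sampling frequency estimate
exp(collection$log_state_score - collection$log_evidence_state)  # score-based estimate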