Get the unique set of states along with their log score.
Value
A list with entries:
state: List of unique states.
log_evidence_state: Numeric value with the log of the evidence estimated from the unique states.
log_state_score: Vector with the log scores for each state.
log_sampling_prob: Vector with the log of the probability for each state estimated using the MCMC sampling frequency.
Details
This gets the unique set of states in cia_chain(s), referred to as
objects (\(o\)), and then estimates the posterior probability of each state
using two methods. The log_sampling_prob is the MCMC sampling frequency
estimate of the posterior probability.
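For intuition, a minimal sketch of this frequency estimate (the state_keys vector and its labels below are illustrative and not part of the package API):

# Illustrative only: log posterior estimate from MCMC sampling frequencies.
state_keys <- c("s1", "s2", "s1", "s1", "s3", "s2")  # hypothetical state labels, one per sample
counts <- table(state_keys)                          # visits to each unique state
log_sampling_prob <- log(as.numeric(counts)) - log(length(state_keys))
names(log_sampling_prob) <- names(counts)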
An alternative method estimates the posterior probability for each state
from the state score; this is recorded in log_norm_state_score. This
approach estimates the log of the normalisation constant assuming
\(\tilde{Z}_O = \sum_{s=1}^{S} p(o_s) p(D | o_s)\) where
\(O = \{o_1, o_2, o_3, ..., o_S\}\) is
the set of unique objects in the chain. This assumes that you have captured the
most probable objects, such that \(\tilde{Z}_O\) is approximately equal to
the true evidence \(Z = \sum_{g \in G} p(g) p(D | g)\), where the
sum is over all possible DAGs (\(G\)). This also assumes
that the exponential of the score is proportional to the posterior
probability, such that
$$p(g|D) \propto p(g)p(D | g) = \prod_i \exp(\text{score}(X_i, \text{Pa}_g(X_i) | D))$$
where \(\text{Pa}_g(X_i)\) is the parent set of node \(X_i\) given the
graph \(g\).
After the normalisation constant has been estimated, we estimate the log probability of each object as $$\log(p(o | D)) = \log(p(o)p(D | o)) - \log(\tilde{Z}_O).$$
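As a rough sketch of this score-based normalisation (the log_state_score values are hypothetical, and the package may implement this differently):

# Illustrative only: normalise unnormalised log state scores in log space.
log_state_score <- c(-10.2, -11.5, -9.8)                # hypothetical log p(o_s)p(D | o_s) values
m <- max(log_state_score)
log_evidence <- m + log(sum(exp(log_state_score - m)))  # log-sum-exp estimate of log Z_O
log_norm_state_score <- log_state_score - log_evidence  # estimated log p(o_s | D)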
Preliminary analysis suggests that the sampling frequency approach is more consistent across chains when estimating marginalised edge probabilities, and it is therefore our preferred method. However, more work needs to be done here.
Examples
# Load example data and construct an initial partition from a random DAG.
data <- bnlearn::learning.test
dag <- UniformlySampleDAG(colnames(data))
partitioned_nodes <- DAGtoPartition(dag)

# Set up the scorer used to evaluate sampled structures.
scorer <- CreateScorer(
  scorer = BNLearnScorer,
  data = data
)

# Run partition MCMC and collect the unique sampled states with their scores.
results <- SampleChains(100, partitioned_nodes, PartitionMCMC(), scorer)
collection <- CollectUniqueObjects(results)
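As a follow-up sketch (assuming the entry names listed under Value, and that log_state_score holds unnormalised scores), the two posterior estimates could be compared with:

# Compare the two posterior probability estimates for the sampled states.
exp(collection$log_sampling_prob)                                # sampling frequency estimate
exp(collection$log_state_score - collection$log_evidence_state)  # score-based estimate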