TrainableAssistedPlayerB¶
Role: Trainable Player B wrapper that consumes Player A "previous" tensors plus local measurements to decide whether to shoot, while tracking the last action log-probability.
Location: Q_Sea_Battle.trainable_assisted_player_b.TrainableAssistedPlayerB
Constructor¶
| Parameter | Type | Description |
|---|---|---|
| `game_layout` | `Any`, not specified, shape N/A | Object providing `field_size` and `comms_size` attributes used to derive \(n2 = field\_size^2\) and \(m = comms\_size\). |
| `model_b` | `LinTrainableAssistedModelB`, not specified, shape N/A | Trainable model called as `model_b([gun_batch, comm_batch, prev_meas_batch, prev_out_batch])` to produce a shoot logit. |
Preconditions
- `game_layout` should expose `field_size` and `comms_size` attributes convertible to `int`; otherwise behavior is not specified (will likely raise at runtime).
- `model_b` must be callable and return a Tensor compatible with shape `(1, 1)` when invoked from `decide()`.
Postconditions
- `self.game_layout` is set to `game_layout`.
- `self.model_b` is set to `model_b`.
- `self.parent` is set to `None`.
- `self.last_logprob_shoot` is set to `None`.
- `self.explore` is set to `False`.
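The constructor postconditions above amount to plain attribute assignments. A minimal sketch of that contract (attribute names are from this page; the class body itself is illustrative, not the real implementation):

```python
class TrainableAssistedPlayerB:
    """Sketch of the documented constructor contract, not the real class body."""

    def __init__(self, game_layout, model_b):
        # Postconditions: plain assignments, no validation performed here.
        self.game_layout = game_layout    # must expose field_size / comms_size
        self.model_b = model_b            # callable returning a (1, 1)-compatible logit
        self.parent = None                # set externally before decide() is called
        self.last_logprob_shoot = None    # populated by decide(), cleared by reset()
        self.explore = False              # per-call override via decide(..., explore=...)
```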
Errors
- Not specified by constructor code.
Example
```python
from Q_Sea_Battle.trainable_assisted_player_b import TrainableAssistedPlayerB

# game_layout must provide .field_size and .comms_size;
# model_b must be a LinTrainableAssistedModelB
player_b = TrainableAssistedPlayerB(game_layout=layout, model_b=model_b)
```
Public Methods¶
decide(gun, comm, supp=None, explore=None)¶
Decide whether to shoot (0 or 1) based on gun + comm + parent.previous tensors.
Parameters
- `gun`: `np.ndarray`, dtype int, values in `{0,1}`, shape `(n2,)`, where \(n2 = field\_size^2\).
- `comm`: `np.ndarray`, dtype not specified, shape `(m,)`, where \(m = comms\_size\); values are validated for shape only (docstring notes ints in `{0,1}` or floats in `[0,1]` for DRU).
- `supp`: `Any | None`, ignored, shape N/A.
- `explore`: `bool | None`, if not `None` overrides `self.explore`, shape N/A.
Returns
- `int`, constraints `{0,1}`, shape `()`.
Preconditions
- `self.parent` is not `None` and `self.parent.previous` is not `None`.
- `getattr(self.game_layout, "field_size")` and `getattr(self.game_layout, "comms_size")` exist and are convertible to `int`.
- `gun.shape == (n2,)` and `gun` contains only `0`/`1`.
- `comm.shape == (m,)`.
- `self.parent.previous` is a tuple-like `(prev_meas_list, prev_out_list)` where each is a `list` (enforced before later normalization).
- `len(prev_meas_list) >= 1` and `len(prev_out_list) >= 1`.
Postconditions
- Computes `shoot_logit = self.model_b([gun_batch, comm_batch, prev_meas_batch, prev_out_batch])`, where `gun_batch` has shape `(1, n2)` and `comm_batch` has shape `(1, m)`.
- Sets `self.last_logprob_shoot` to the log-probability (Python `float`) of the returned action under the computed logits.
- Returns `shoot` greedily (`shoot_prob >= 0.5`) if not exploring, else samples via `Uniform(0,1) < shoot_prob`.
Errors
- `ValueError`: if `gun` shape mismatches `(n2,)`.
- `ValueError`: if `gun` contains values other than `0`/`1`.
- `ValueError`: if `comm` shape mismatches `(m,)`.
- `RuntimeError`: if `self.parent is None` or `self.parent.previous is None`.
- `TypeError`: if `self.parent.previous` is not `(list, list)` at the initial type check.
- `ValueError`: if either list in `self.parent.previous` has length `< 1`.
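The validation, action-selection, and log-probability bookkeeping described above can be sketched with plain numpy. This is an illustration of the documented contract, not the implementation: the model call and tensor handling are simplified (the sketch passes the previous-tensor lists through unbatched and treats the logit as a plain `float`), and `decide_sketch` is a hypothetical standalone function.

```python
import numpy as np

def decide_sketch(player, gun, comm, supp=None, explore=None):
    """Illustrative stand-in for decide(): mirrors the documented checks
    and the greedy-vs-sampled selection, using plain numpy."""
    n2 = int(player.game_layout.field_size) ** 2
    m = int(player.game_layout.comms_size)

    # --- Precondition checks (error types as documented above) ---
    if gun.shape != (n2,):
        raise ValueError(f"gun must have shape ({n2},), got {gun.shape}")
    if not np.isin(gun, (0, 1)).all():
        raise ValueError("gun must contain only 0/1 values")
    if comm.shape != (m,):
        raise ValueError(f"comm must have shape ({m},), got {comm.shape}")
    if player.parent is None or player.parent.previous is None:
        raise RuntimeError("parent.previous must be populated before decide()")
    prev_meas, prev_out = player.parent.previous
    if not (isinstance(prev_meas, list) and isinstance(prev_out, list)):
        raise TypeError("parent.previous must be (list, list)")
    if len(prev_meas) < 1 or len(prev_out) < 1:
        raise ValueError("parent.previous lists must be non-empty")

    # --- Model call: batched inputs, scalar shoot logit (simplified) ---
    gun_batch = gun.reshape(1, n2).astype(np.float64)
    comm_batch = np.asarray(comm, dtype=np.float64).reshape(1, m)
    logit = float(player.model_b([gun_batch, comm_batch, prev_meas, prev_out]))

    shoot_prob = 1.0 / (1.0 + np.exp(-logit))           # sigmoid(logit)
    do_explore = player.explore if explore is None else explore
    if do_explore:
        shoot = int(np.random.uniform() < shoot_prob)   # sample Bernoulli
    else:
        shoot = int(shoot_prob >= 0.5)                  # greedy

    # Bernoulli log-prob of the action actually taken.
    p = shoot_prob if shoot == 1 else 1.0 - shoot_prob
    player.last_logprob_shoot = float(np.log(p))
    return shoot
```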
Example
```python
import numpy as np

# Assume player_b.parent has been set and player_a has already populated parent.previous
n2 = int(player_b.game_layout.field_size) ** 2
m = int(player_b.game_layout.comms_size)
gun = np.zeros((n2,), dtype=int)
comm = np.zeros((m,), dtype=int)
shoot = player_b.decide(gun=gun, comm=comm, explore=True)
logp = player_b.get_log_prob()
```
get_log_prob()¶
Return log-probability of the last taken shoot action (as set by decide()).
Parameters
- None.
Returns
- `float`, constraints not specified (log-probability), shape `()`.
Preconditions
- `self.last_logprob_shoot` is not `None` (i.e., `decide()` has been called since the last `reset()`).
Errors
- `RuntimeError`: if `self.last_logprob_shoot is None`.
reset()¶
Reset internal state.
Parameters
- None.
Returns
- `None`, shape N/A.
Postconditions
- `self.last_logprob_shoot` is set to `None`.
Errors
- Not specified.
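The `get_log_prob()` / `reset()` lifecycle documented above is small enough to sketch in full. These are hypothetical standalone functions mirroring the stated contract, not the real methods:

```python
def get_log_prob_sketch(player):
    """Mirrors the documented get_log_prob() contract: the last log-prob
    is only available after decide() and before the next reset()."""
    if player.last_logprob_shoot is None:
        raise RuntimeError("decide() has not been called since the last reset()")
    return float(player.last_logprob_shoot)

def reset_sketch(player):
    """Mirrors the documented reset() postcondition."""
    player.last_logprob_shoot = None
```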
Data & State¶
- `game_layout`: `Any`, constraints not specified, shape N/A; must provide `field_size` and `comms_size` attributes used by `decide()`.
- `model_b`: `LinTrainableAssistedModelB`, constraints not specified, shape N/A; called by `decide()` to produce a shoot logit tensor.
- `parent`: `Any | None`, constraints not specified, shape N/A; expected (by `decide()`) to provide `.previous` containing prior tensors from Player A.
- `last_logprob_shoot`: `float | None`, constraints not specified, shape `()`; updated by `decide()`, cleared by `reset()`.
- `explore`: `bool`, constraints `{False, True}`, shape `()`; default `False`, optionally overridden per-call via `decide(..., explore=...)`.
Planned (design-spec)¶
- Not specified (no design notes provided).
Deviations¶
- The docstring claims `parent` is of type `TrainableAssistedPlayers` and is set by `TrainableAssistedPlayers.players()`, but the module only types it as `Any | None` and does not define or enforce this contract.
- `parent.previous` is initially required to be `(list, list)` via an explicit `isinstance(..., list)` check, yet later code contains normalization for non-list/tuple values ("linear case: single tensor → list of length 1") that is unreachable if the earlier check fails; these two behaviors conflict.
Notes for Contributors¶
- Symbols used: \(n2 = field\_size^2\) and \(m = comms\_size\) are derived inside `decide()` from `self.game_layout`.
- `decide()` expects `self.parent.previous` to be populated before it is called; ensure Player A executes first in the calling sequence.
- `bernoulli_log_prob_from_logits` may be imported from `.logit_utils` or fall back to a local implementation; changing either affects `last_logprob_shoot` semantics.
Related¶
- `Q_Sea_Battle.trainable_assisted_player_b.bernoulli_log_prob_from_logits` (imported if available; otherwise locally defined fallback)
- `Q_Sea_Battle.trainable_assisted_player_b.LinTrainableAssistedModelB` (dependency)
- `Q_Sea_Battle.trainable_assisted_player_b.PlayerB` (base class, imported if available; otherwise fallback)
Changelog¶
- 0.1: Initial version (per module docstring).