Class BroadcastableClusterInfo

  • All Implemented Interfaces:
    java.io.Serializable, IBroadcastableClusterInfo

    public final class BroadcastableClusterInfo
    extends java.lang.Object
    implements IBroadcastableClusterInfo
    Broadcastable wrapper for a single cluster, with ZERO transient fields to optimize Spark broadcasting.

    Only essential fields are broadcast; executors reconstruct CassandraClusterInfo to fetch other data from Sidecar.

    Why having ZERO transient fields matters:
    Spark's SizeEstimator uses reflection to estimate object sizes before broadcasting. Each transient field forces SizeEstimator to inspect the field's type hierarchy, which is expensive. Logger references are particularly costly due to their deep object graphs (appenders, layouts, contexts). By eliminating ALL transient fields and Logger references, we:

    • Minimize SizeEstimator reflection overhead during broadcast preparation
    • Reduce broadcast variable serialization size
    • Avoid accidental serialization of non-serializable objects
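
The zero-transient-field rule above can be checked mechanically. The following is a minimal sketch, not the real class: `SlimClusterInfo`, its two fields, and the helper names are hypothetical stand-ins chosen for illustration. It counts transient instance fields by reflection and round-trips the object through standard Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class ZeroTransientDemo {

    // Hypothetical stand-in for a broadcastable wrapper: only small,
    // serializable, essential fields; no Logger, no transient state.
    static final class SlimClusterInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String clusterId;
        private final String partitionerName;

        SlimClusterInfo(String clusterId, String partitionerName) {
            this.clusterId = clusterId;
            this.partitionerName = partitionerName;
        }

        String clusterId() { return clusterId; }
        String partitionerName() { return partitionerName; }
    }

    // Counts non-static fields declared directly on the type that are marked transient.
    static int countTransientFields(Class<?> type) {
        int count = 0;
        for (Field field : type.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            if (!Modifier.isStatic(modifiers) && Modifier.isTransient(modifiers)) {
                count++;
            }
        }
        return count;
    }

    // Serializes and deserializes the object in memory, as a broadcast would do across JVMs.
    static SlimClusterInfo roundTrip(SlimClusterInfo original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SlimClusterInfo) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        SlimClusterInfo copy = roundTrip(new SlimClusterInfo("cluster-a", "Murmur3Partitioner"));
        System.out.println("transient fields: " + countTransientFields(SlimClusterInfo.class));
        System.out.println("clusterId after round trip: " + copy.clusterId());
    }
}
```

A check like `countTransientFields` could live in a unit test so that a later change adding a transient field or a Logger fails fast.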
    See Also:
    Serialized Form
    • Method Detail

      • from

        public static BroadcastableClusterInfo from(@NotNull
                                                    ClusterInfo source,
                                                    @NotNull
                                                    BulkSparkConf conf)
        Creates a BroadcastableClusterInfo from a ClusterInfo (typically a CassandraClusterInfo) by extracting only the essential fields. Executors then reconstruct the CassandraClusterInfo and fetch the remaining data from Sidecar.
        Parameters:
        source - the source ClusterInfo (typically CassandraClusterInfo)
        conf - the BulkSparkConf needed to connect to Sidecar on executors
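
The factory pattern described for `from` can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `SourceInfo`, `Conf`, and `Slim` are hypothetical stand-ins for ClusterInfo, BulkSparkConf, and BroadcastableClusterInfo, and the field choices are guesses based on the accessors documented on this page.

```java
import java.io.Serializable;
import java.util.Objects;

public class FromFactorySketch {

    // Hypothetical stand-in for the driver-side ClusterInfo; not the real interface.
    interface SourceInfo {
        String partitionerName();
        String clusterId(); // may be null when writing to a single cluster
    }

    // Hypothetical stand-in for BulkSparkConf; assumed to be serializable here.
    static final class Conf implements Serializable {
        private static final long serialVersionUID = 1L;
        final String sidecarContactPoints;

        Conf(String sidecarContactPoints) {
            this.sidecarContactPoints = sidecarContactPoints;
        }
    }

    // Hypothetical slim wrapper: copies only what executors need to rebuild the full object.
    static final class Slim implements Serializable {
        private static final long serialVersionUID = 1L;
        final String partitionerName;
        final String clusterId;
        final Conf conf;

        private Slim(String partitionerName, String clusterId, Conf conf) {
            this.partitionerName = partitionerName;
            this.clusterId = clusterId;
            this.conf = conf;
        }

        // Mirrors the @NotNull contract on both parameters, then extracts the essentials.
        static Slim from(SourceInfo source, Conf conf) {
            Objects.requireNonNull(source, "source");
            Objects.requireNonNull(conf, "conf");
            return new Slim(source.partitionerName(), source.clusterId(), conf);
        }
    }

    public static void main(String[] args) {
        SourceInfo source = new SourceInfo() {
            @Override public String partitionerName() { return "Murmur3Partitioner"; }
            @Override public String clusterId() { return null; }
        };
        Slim slim = Slim.from(source, new Conf("sidecar-1:9043"));
        System.out.println(slim.partitionerName + " / clusterId=" + slim.clusterId);
    }
}
```

The private constructor plus static factory keeps the extraction logic in one place, so callers cannot build a wrapper that skips a required field.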
      • getPartitioner

        public org.apache.cassandra.spark.data.partitioner.Partitioner getPartitioner()
        Specified by:
        getPartitioner in interface IBroadcastableClusterInfo
        Returns:
        the partitioner used by the cluster
      • clusterId

        @Nullable
        public java.lang.String clusterId()
        Description copied from interface: IBroadcastableClusterInfo
        ID string that can uniquely identify a cluster. When writing to a single cluster, this may be null. When in coordinated write mode (writing to multiple clusters), this must return a unique string.
        Specified by:
        clusterId in interface IBroadcastableClusterInfo
        Returns:
        cluster id string, null if absent
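
The nullability contract for `clusterId` can be illustrated with a small validation helper. This is a hedged sketch only: `validate` and its `coordinatedWrite` flag are hypothetical and are not part of the documented API. It accepts null in single-cluster mode and requires non-null, unique IDs in coordinated write mode.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ClusterIdContractSketch {

    // Hypothetical helper: enforces the documented clusterId contract.
    static void validate(List<String> clusterIds, boolean coordinatedWrite) {
        if (!coordinatedWrite) {
            return; // single-cluster write: a null clusterId is allowed
        }
        Set<String> seen = new HashSet<>();
        for (String id : clusterIds) {
            if (id == null) {
                throw new IllegalStateException("clusterId must not be null in coordinated write mode");
            }
            if (!seen.add(id)) {
                throw new IllegalStateException("duplicate clusterId: " + id);
            }
        }
    }

    public static void main(String[] args) {
        // Single cluster: null is acceptable.
        validate(Collections.singletonList(null), false);
        // Coordinated write: distinct, non-null IDs pass.
        validate(Arrays.asList("cluster-a", "cluster-b"), true);
        System.out.println("contract checks passed");
    }
}
```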
      • reconstruct

        public ClusterInfo reconstruct()
        Description copied from interface: IBroadcastableClusterInfo
        Reconstructs a full ClusterInfo instance from this broadcastable data on executors. Each implementation knows how to reconstruct itself into the appropriate ClusterInfo type. This allows adding new broadcastable types without modifying the reconstruction logic in AbstractBulkWriterContext.
        Specified by:
        reconstruct in interface IBroadcastableClusterInfo
        Returns:
        reconstructed ClusterInfo (CassandraClusterInfo or CassandraClusterInfoGroup)
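
The reason `reconstruct` lives on the interface can be shown with a short sketch of the dispatch pattern. All types here (`FullInfo`, `Broadcastable`, `SingleCluster`, `ClusterGroup`) are hypothetical stand-ins, and the real reconstruction contacts Sidecar, which this example does not attempt. The point is only that the executor-side caller never branches on the concrete type.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReconstructSketch {

    // Hypothetical stand-in for the full, executor-side ClusterInfo.
    interface FullInfo {
        String describe();
    }

    // Hypothetical stand-in for IBroadcastableClusterInfo.
    interface Broadcastable extends Serializable {
        FullInfo reconstruct();
    }

    // Rebuilds into the single-cluster flavor of the full object.
    static final class SingleCluster implements Broadcastable {
        private static final long serialVersionUID = 1L;
        private final String clusterId;

        SingleCluster(String clusterId) {
            this.clusterId = clusterId;
        }

        @Override
        public FullInfo reconstruct() {
            return () -> "single:" + clusterId;
        }
    }

    // Rebuilds into the group flavor; a new type needs no change in the caller.
    static final class ClusterGroup implements Broadcastable {
        private static final long serialVersionUID = 1L;
        private final ArrayList<SingleCluster> members;

        ClusterGroup(List<SingleCluster> members) {
            this.members = new ArrayList<>(members);
        }

        @Override
        public FullInfo reconstruct() {
            return () -> "group:" + members.size();
        }
    }

    // Executor-side caller: no instanceof checks, just polymorphic dispatch.
    static String rebuildOnExecutor(Broadcastable broadcast) {
        return broadcast.reconstruct().describe();
    }

    public static void main(String[] args) {
        System.out.println(rebuildOnExecutor(new SingleCluster("cluster-a")));
        System.out.println(rebuildOnExecutor(new ClusterGroup(
            Arrays.asList(new SingleCluster("cluster-a"), new SingleCluster("cluster-b")))));
    }
}
```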