Abstract:
Objective This study addresses the difficulty of directly using raw monitoring data for AI training in prognostics and health management (PHM) of marine equipment.
Methods An AI-ready time-series sample repository is constructed from four components: (1) a sample information model that integrates multi-level annotation, full provenance, and multidimensional quality description; (2) a three-level quality assessment framework covering the channel, sample, and dataset levels; (3) task-oriented learning strategies for scarce labels, noisy labels, temporal misalignment, and class imbalance; and (4) an edge-cloud collaborative workflow.
Results In a ship main-engine bearing fault diagnosis case, the proposed model organized multi-source heterogeneous data whose sampling rates span five orders of magnitude. Agreement between automatic quality assessment and expert review reached a Cohen’s kappa of 0.886. Performance retention on degraded datasets was 92.1%. The edge-cloud architecture reduced average daily bandwidth consumption by 98.7%, and edge-side inference latency was 78 ms.
Conclusion This study establishes an AI-ready time-series sample repository framework for marine-equipment PHM and provides a practical reference for data organization, quality governance, and service implementation in related scenarios.
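For readers unfamiliar with the agreement statistic reported in the Results, the following is a minimal sketch of how Cohen's kappa can be computed for two sets of categorical quality labels. It is an illustration of the standard definition, kappa = (p_o - p_e) / (1 - p_e), and not the study's own implementation; the function name `cohens_kappa` and the pass/fail labels are hypothetical and were chosen only for this example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: fraction of items on which the two raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum over labels of the product of marginal frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels (assumption): automatic assessment vs. expert review.
auto_labels = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
expert_labels = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "pass"]

kappa = cohens_kappa(auto_labels, expert_labels)
print(f"kappa = {kappa:.3f}")
```

In this toy example the raters agree on 7 of 8 items (p_o = 0.875) and the chance agreement is p_e = 0.5625, so kappa is well below the raw agreement rate. Kappa corrects for agreement that would be expected by chance alone, which is why it is a stricter measure than simple percent agreement.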