A Cache-Aware Offloading Strategy for Timely Generative AI Services in IIoT Networks
In this paper, the inference freshness of generative artificial intelligence (gen-AI) services in industrial Internet-of-Things (IIoT) networks is investigated. A freshness metric termed the peak age of inference (PAoIF) is proposed to quantify inference freshness by accounting for the peak age of information and the delays incurred in transmitting inference requests and results. A cache-aware offloading (CAO) strategy that employs multi-access edge computing in IIoT networks is also proposed for timely inference delivery. Leveraging novel closed-form expressions for the PAoIF violation probability under the proposed CAO and benchmark strategies for IIoT, this study analyzes the impact of the PAoIF violation age, the gen-AI service request rate, and the average transmission rate on inference freshness. We identify scenarios where the proposed CAO strategy exhibits a lower PAoIF violation probability than the benchmark strategies under consideration. Furthermore, we show that the PAoIF violation probability under the proposed CAO strategy is minimized by optimizing the average transmission rate in IIoT networks employing servers with limited computing resources. The analysis therefore shows that the proposed CAO strategy is a viable technique for enabling inference freshness for gen-AI services in IIoT networks.
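To make the PAoIF notion concrete, the following is a minimal Monte-Carlo sketch, not the paper's analytical model: it assumes exponential request inter-arrival, request/result transmission, and inference service times (all rates and the violation threshold are hypothetical parameters), and estimates the PAoIF violation probability as the fraction of cycles whose total age exceeds the threshold.

```python
import random

def simulate_paoif_violation(n=100_000, req_rate=2.0, tx_rate=5.0,
                             service_rate=4.0, threshold=1.5, seed=0):
    """Monte-Carlo estimate of P(PAoIF > threshold) under illustrative
    assumptions: exponentially distributed request inter-arrival times,
    request/result transmission delays, and inference service times.
    All rates are per second; `threshold` is the PAoIF violation age."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(n):
        inter_arrival = rng.expovariate(req_rate)   # gap to next request
        req_delay = rng.expovariate(tx_rate)        # request transmission
        inference = rng.expovariate(service_rate)   # edge inference time
        res_delay = rng.expovariate(tx_rate)        # result transmission
        # PAoIF sketch: peak age plus request- and result-delivery delays
        paoif = inter_arrival + req_delay + inference + res_delay
        if paoif > threshold:
            violations += 1
    return violations / n
```

Under this sketch, sweeping `tx_rate` reproduces the qualitative trade-off discussed in the abstract: the violation probability falls as the average transmission rate grows, motivating its optimization when server computing resources are limited.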