RFMO-02 - Rapid fire session from selected oral abstracts


Exploring The Use Of Machine Learning To Categorise Online Forum Posts Relating To Self-medication By International Travellers

  • By: PRATAMA, Antonius Nugraha Widhi (Sydney Pharmacy School, Faculty Of Medicine And Health, The University Of Sydney, Australia)
  • Co-author(s): Mr Antonius Nugraha Widhi Pratama (Sydney Pharmacy School, Faculty of Medicine And Health, The University of Sydney, Camperdown, Australia)
    Dr Ardalan Mirzaei
    Dr Jack Collins
    Associate Professor Brahmaputra Marjadi
    Prof Rebekah Moles
    Dr Carl Schneider
  • Abstract:

    Background information
    International travellers commonly engage in self-medication and often use the internet to seek initial health advice on recognising symptoms or choosing products to prevent, treat or manage a health condition. Online discussion fora are one such resource; they focus on specific topics and allow a higher degree of anonymity than other social networking platforms. Self-medication advice on these fora may not come from people with relevant qualifications, which may lead to inappropriate practice. Pharmacists have a role in promoting responsible self-medication, but monitoring online conversations is challenging due to the volume, speed and variety of posts. Machine learning may help address this challenge.

    To describe the development and performance of three machine learning models to classify whether an online forum post refers to self-medication.

    We used Indonesia-related fora as part of a larger study on self-medication among travellers in Indonesia. Text data were curated from four online fora: Balipod, Living in Indonesia Forum, Expat Indo Forum and the TripAdvisor forum for ‘Indonesia’ topics. These fora were free to access and were the most active fora used by international travellers visiting or planning to visit Indonesia between 7 November 2002 and 15 December 2022. A 1% sample (n=1180 posts) of the retrieved posts was randomly selected for labelling. Two pharmacist reviewers independently labelled each post as relating to self-medication or not (‘yes’ or ‘no’). Three machine learning models with different architectures were developed. Data were cleaned of noise, such as unrecognised characters, before a 60:20:20 train-validation-test split was applied. Word embeddings, which place words with similar meanings close together, were applied to the data. Each model was evaluated before and after word embeddings using the area under the receiver operating characteristic curve (AUC-ROC).
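
    The split and evaluation steps described above can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the function names stratified_split and auc_roc, the use of stratification, the random seed and the toy labels and scores are all assumptions made for this example. The abstract does not state which library or splitting strategy was used.

```python
import random


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split indices into train/validation/test lists (hypothetical helper).

    Splitting is done per class so that a rare positive class keeps a
    similar share in each subset. Stratification is an assumption here.
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen positive post
    receives a higher score than a randomly chosen negative post
    (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): 10 positive and 90 negative labels.
toy_labels = [1] * 10 + [0] * 90
train_idx, val_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))

# Toy scores (illustrative only): one positive is ranked below one negative.
print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

    An AUC-ROC of 0.5 corresponds to ranking posts at random and 1.0 to ranking every self-medication post above every other post, which is why the metric remains informative when one class is rare.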

    Cohen’s kappa for data labelling was 0.622. Merging the labels from both raters resulted in the removal of 71 posts. Of the remaining 1109 posts, 6.04% were labelled as relating to self-medication. As expected, the performance of all models improved after word embeddings were applied. The best model produced an average AUC-ROC score of 0.79 (SD ±0.05), which is considered acceptable performance.
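
    For readers unfamiliar with the agreement statistic, Cohen's kappa compares the observed agreement p_o between two raters with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is an illustration under stated assumptions: the function name cohens_kappa and the four toy labels are hypothetical and are not the study's data. Only the post counts in the final lines come from the abstract.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        count_a[c] * count_b[c] for c in set(count_a) | set(count_b)
    ) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Toy labels (illustrative only): the raters disagree on one of four posts.
rater_1 = ["yes", "yes", "no", "no"]
rater_2 = ["yes", "no", "no", "no"]
print(cohens_kappa(rater_1, rater_2))

# Counts reported in the abstract: 1180 sampled posts, 71 removed.
remaining = 1180 - 71
approx_positive = round(0.0604 * remaining)  # approximate, derived from 6.04%
print(remaining, approx_positive)
```

    With roughly 67 of 1109 posts labelled as self-medication, the classes are strongly imbalanced, so a chance-corrected statistic such as kappa is more informative than raw percentage agreement.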

    This work demonstrates that machine learning technology can help classify whether an online forum post pertains to self-medication. The use of machine learning can potentially enable pharmacists to respond to such posts, thereby affording the opportunity for moderation or intervention.