Leveraging advanced natural language processing to enhance insights into the cancer patient journey
Apr 3rd, 2025

Understanding the complexities of the cancer patient journey is paramount for pharmaceutical companies seeking to optimize patient support, refine educational resources, and enhance engagement strategies. Traditional methodologies, such as claims data analysis and patient surveys, often provide an incomplete and biased perspective due to their structured nature.
Cancer patients and caregivers frequently engage in online discourse to share experiences, seek guidance, and articulate concerns. These digital interactions form a vast repository of real-world patient insights that remain largely untapped due to limitations in traditional research methodologies.
Claims data, while valuable for understanding treatment transitions, lacks insight into the underlying rationale behind patient decisions. Similarly, structured surveys introduce biases by constraining responses to predefined questions, failing to capture the spontaneity and depth of patient sentiment.
To address these challenges, we applied state-of-the-art natural language processing (NLP) methods to extract high-resolution insights from online patient narratives, aiming for a more authentic and comprehensive representation of the patient journey.
This study explored the application of cutting-edge NLP techniques, including transformer-based architectures, deep contextual embeddings, and knowledge graph construction, to analyze unstructured, patient-generated content from online communities. Our approach provides a more comprehensive, unbiased, and dynamic understanding of the challenges faced by cancer patients and caregivers.
Our findings were recently presented at Pharma SOS 2025 in the session titled “Advancing healthcare and pharmaceutical insights generation: Groundbreaking solutions using NLP.” The discussion sparked valuable conversations on how pharmaceutical companies can leverage these insights to develop more effective, patient-centric strategies.
Below, we outline how our approach shed light on the needs of patients and caregivers and how these insights can guide pharma in enhancing engagement strategies.
Leveraging NLP for unbiased, authentic patient perspectives
A pharmaceutical client sought to gain a comprehensive understanding of the disease journey for rare cancer patients from the perspective of both patients and caregivers. To achieve this, they required a scalable, unbiased solution capable of identifying key challenges in navigating the healthcare system, determining areas where additional support could drive meaningful impact, and generating insights to inform commercial, medical affairs, and field sales and marketing strategies.
We addressed this need by implementing an advanced NLP-driven approach to analyze patient-generated content across various digital platforms, including blogs, videos, and patient forums. Our analysis encompassed over 5.5 million words from 50,000 posts across 30 publicly available sources, capturing data on approximately 1,200 patients. Strict ethical guidelines governed our data collection process, ensuring that only openly accessible content was used, in accordance with each platform's terms of service.
In contrast to traditional surveys that rely on structured, pre-defined questions, our methodology enabled the extraction of unbiased, spontaneous patient narratives, providing an authentic, data-driven representation of the patient journey without external influence.
Transforming patient conversations into actionable insights with NLP
To extract meaningful insights from unstructured patient narratives, we employed an advanced NLP pipeline designed to move beyond simple keyword matching and into context-aware, deep linguistic analysis. Our approach incorporated several cutting-edge methodologies to ensure a nuanced understanding of patient concerns, emotional states, and emerging trends.
Contextual understanding beyond keywords
Rather than relying on traditional keyword-based text mining, we leveraged transformer-based models such as BERT, GPT, and BioBERT to capture contextual meaning, sentiment shifts, and implicit themes within patient discussions. This distinction was critical because a word like "marker" could refer to a biomarker, a tumor marker, or even a physical symptom, depending on context.
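To make the "marker" example concrete, here is a minimal Python sketch of context-dependent sense selection. It does not use a transformer: it substitutes a simple Lesk-style overlap between the words surrounding "marker" and a hand-written set of cue words per sense. The sense names, cue lists, and the `disambiguate_marker` function are hypothetical and only illustrate why context, not the keyword alone, decides the meaning.

```python
import re
from typing import Optional

# Hypothetical cue words for each sense of "marker". A contextual model such as
# BERT learns these associations from data; here they are hand-written for illustration.
SENSE_CUES = {
    "biomarker": {"mutation", "genetic", "testing", "targeted", "expression", "gene"},
    "tumor_marker": {"ca-125", "cea", "psa", "level", "levels", "blood", "rising"},
    "physical_symptom": {"bruise", "rash", "lump", "skin", "spot", "swelling"},
}


def tokenize(text: str) -> list:
    """Lowercase and split into word-like tokens (hyphens kept, e.g. 'ca-125')."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def disambiguate_marker(sentence: str, window: int = 6) -> Optional[str]:
    """Pick the sense of 'marker' whose cue words overlap most with nearby tokens."""
    tokens = tokenize(sentence)
    positions = [i for i, t in enumerate(tokens) if t in ("marker", "markers")]
    if not positions:
        return None  # the sentence does not mention the ambiguous word
    idx = positions[0]
    # Context = tokens within `window` positions on either side of the first mention.
    context = set(tokens[max(0, idx - window): idx + window + 1])
    scores = {sense: len(context & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

For example, "My tumor marker levels kept rising after chemo" resolves to `tumor_marker`, while "I noticed a dark marker on my skin near the lump" resolves to `physical_symptom`. A contextual model replaces the cue lists with learned representations, which is what lets it handle phrasings that no hand-written list anticipates.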
Entity recognition and relationship extraction
We utilized named entity recognition (NER), enhanced with domain-specific embeddings, to standardize mentions of diseases, treatment regimens, side effects, and diagnostic procedures. Beyond entity recognition, we applied dependency parsing and relationship extraction models to map how different aspects of the patient journey interact—for example, linking a chemotherapy regimen with specific side effects and corresponding patient emotions.
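The sketch below illustrates the shape of entity standardization and relationship linking in Python, under loud simplifying assumptions: a small hypothetical lexicon stands in for a trained NER model with domain-specific embeddings, and sentence-level co-occurrence stands in for dependency parsing and learned relation extraction. The `LEXICON` entries and the `extract_entities` and `link_relations` functions are illustrative, not part of the pipeline described above.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical lexicon: surface form -> (canonical name, entity type).
# Several surface forms map to one canonical name (the standardization step).
LEXICON = {
    "folfox": ("FOLFOX", "REGIMEN"),
    "chemo": ("chemotherapy", "REGIMEN"),
    "chemotherapy": ("chemotherapy", "REGIMEN"),
    "nausea": ("nausea", "SIDE_EFFECT"),
    "nauseous": ("nausea", "SIDE_EFFECT"),
    "fatigue": ("fatigue", "SIDE_EFFECT"),
    "exhausted": ("fatigue", "SIDE_EFFECT"),
    "scared": ("fear", "EMOTION"),
    "afraid": ("fear", "EMOTION"),
    "frustrated": ("frustration", "EMOTION"),
}
ENTITY_TYPES = ("REGIMEN", "SIDE_EFFECT", "EMOTION")


def extract_entities(sentence):
    """Return the set of (canonical name, type) pairs found in a sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {LEXICON[t] for t in tokens if t in LEXICON}


def link_relations(posts):
    """Count (regimen, side effect, emotion) triples co-occurring within one sentence."""
    triples = Counter()
    for post in posts:
        for sentence in re.split(r"[.!?]+", post):
            entities = extract_entities(sentence)
            by_type = {
                label: sorted(name for name, etype in entities if etype == label)
                for label in ENTITY_TYPES
            }
            # product() yields nothing if any entity type is missing from the sentence.
            for triple in product(*(by_type[label] for label in ENTITY_TYPES)):
                triples[triple] += 1
    return triples
```

Mapping "chemo" and "chemotherapy" to one canonical name is the standardization step; counting (regimen, side effect, emotion) triples per sentence is the linking step. Because this heuristic only looks within a sentence, "The chemo left me nauseous. I felt frustrated." yields no triple, which is exactly the kind of cross-sentence link a parser-based or learned model is meant to recover.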
Sentiment and emotion detection with explainability
To quantify the emotional landscape of patient discussions, we implemented a multi-layered sentiment analysis framework, integrating transformer-based classifiers and affective computing models. Unlike basic sentiment detection, this approach:
- Distinguished between primary and secondary emotions (e.g., frustration due to treatment delays vs. fear of disease progression).
- Provided explainability metrics, justifying why a post was classified under a particular emotion, ensuring transparency in interpretation.
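A minimal Python sketch of the primary/secondary distinction and per-post explanations follows. It assumes a hypothetical weighted emotion lexicon in place of the transformer classifiers and affective computing models described above: each emotion's score is the sum of its matched token weights, the two highest scores give the primary and secondary emotions, and the matched tokens are returned as the evidence that justifies the label. `EMOTION_LEXICON` and `classify_emotions` are illustrative names.

```python
import re
from collections import defaultdict

# Hypothetical emotion lexicon: token -> (emotion, weight).
EMOTION_LEXICON = {
    "waiting": ("frustration", 1.0),
    "delayed": ("frustration", 1.5),
    "again": ("frustration", 0.5),
    "ridiculous": ("frustration", 2.0),
    "scared": ("fear", 2.0),
    "spread": ("fear", 1.5),
    "worried": ("fear", 1.0),
    "grateful": ("gratitude", 2.0),
}


def classify_emotions(post):
    """Score emotions by summed token weights and keep the tokens as evidence."""
    scores = defaultdict(float)
    evidence = defaultdict(list)
    for token in re.findall(r"[a-z]+", post.lower()):
        if token in EMOTION_LEXICON:
            emotion, weight = EMOTION_LEXICON[token]
            scores[emotion] += weight
            evidence[emotion].append((token, weight))  # why this emotion scored
    ranked = sorted(scores, key=lambda e: scores[e], reverse=True)
    return {
        "primary": ranked[0] if ranked else None,
        "secondary": ranked[1] if len(ranked) > 1 else None,
        "scores": dict(scores),
        "evidence": {e: evidence[e] for e in ranked},
    }
```

For the post "Treatment got delayed again and I'm scared it has spread. This is ridiculous.", frustration scores 4.0 and fear 3.5, so frustration is primary and fear secondary, with "delayed", "again", and "ridiculous" cited as evidence. Model-based explainability would typically attribute a prediction to input tokens with techniques such as attention inspection or gradient-based attribution rather than a lookup table.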
Temporal and thematic analysis of patient concerns
We applied topic modeling techniques, including latent Dirichlet allocation (LDA) and dynamic embeddings, to cluster conversations into thematic groups and track how discussions evolved over time. By combining recurrent neural networks (RNNs) with attention-based architectures, we also modeled longitudinal trends in patient concerns, identifying shifts in perception regarding therapies, support systems, and healthcare accessibility.
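Once each post carries a theme label (from LDA, embedding clustering, or another topic model, assumed here to run upstream), tracking how discussions evolve reduces to comparing theme shares across time windows. The Python sketch below shows only that aggregation step, bucketing posts by calendar quarter. It is a simple stand-in for the RNN and attention-based longitudinal modeling described above, and the `theme_share_by_period` and `biggest_shift` helpers are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def quarter(d: date) -> str:
    """Label a date with its calendar quarter, e.g. '2024-Q2'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def theme_share_by_period(posts):
    """posts: iterable of (date, theme) pairs. Returns {period: {theme: share}}."""
    counts = defaultdict(Counter)
    for d, theme in posts:
        counts[quarter(d)][theme] += 1
    shares = {}
    for period in sorted(counts):
        total = sum(counts[period].values())
        shares[period] = {t: n / total for t, n in counts[period].items()}
    return shares


def biggest_shift(shares, start, end):
    """Return (theme, change in share) with the largest absolute change between periods."""
    themes = set(shares[start]) | set(shares[end])
    # A theme absent from a period counts as a share of 0.
    deltas = {t: shares[end].get(t, 0.0) - shares[start].get(t, 0.0) for t in themes}
    return max(deltas.items(), key=lambda kv: abs(kv[1]))
```

Calendar quarters are an arbitrary choice here: shorter windows surface faster shifts but become noisy when a theme has only a few posts per window.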
By implementing this NLP-driven approach, we transformed fragmented patient conversations into a structured, data-rich landscape, enabling pharmaceutical companies to extract high-fidelity, real-world insights that inform strategic decision-making, optimize patient engagement, and enhance support programs.
Uncovering core issues from patient and caregiver conversations
Our analysis identified key thematic clusters encapsulating the multidimensional challenges faced by patients and caregivers, spanning emotional, physical, and logistical domains. These insights highlight the systemic barriers impeding access to adequate care, ranging from concerns about treatment efficacy and adverse effects to navigational complexities within the healthcare ecosystem.
By leveraging unsupervised topic modeling and sentiment analysis, we quantified the most pressing patient and caregiver concerns, revealing critical pain points that often go unaddressed in traditional surveys and claims data analyses. The table below presents the primary subthemes of frustration, providing a structured view of the predominant challenges influencing the patient journey.
Subthemes around the frustration expressed by patients and their caregivers
| Theme | Percentage of posts |
|---|---|
| Treatment modalities | 31% |
| Hematological management | 18% |
| Navigating the healthcare ecosystem | 15% |
| Constraints in knowledge and information | 14% |
| Obstacles in support and communication | 11% |
| Personal toll and symptomatic burden | 7% |
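The table does not state exactly how its percentages were computed. A minimal Python sketch of one plausible calculation, assuming each post has already been labeled with an emotion and a single theme (the dictionary schema and the `theme_percentages` helper are hypothetical) and assuming the denominator is the set of frustration posts:

```python
from collections import Counter


def theme_percentages(posts, emotion="frustration"):
    """posts: iterable of dicts with 'emotion' and 'theme' keys (hypothetical schema).

    Returns {theme: whole-number percentage of posts with the given emotion},
    ordered from most to least common theme.
    """
    themes = [p["theme"] for p in posts if p["emotion"] == emotion]
    if not themes:
        return {}
    counts = Counter(themes)
    total = len(themes)
    return {t: round(100 * n / total) for t, n in counts.most_common()}
```

Counting each post under a single theme keeps the shares directly comparable; if posts can carry several themes, the choice of denominator needs to be stated explicitly alongside the table.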
Building upon these thematic insights, we conducted a granular subtheme analysis to uncover the specific pain points patients and caregivers frequently encounter. Within treatment-related frustrations, several critical concerns emerged. Patients frequently expressed uncertainty regarding drug effectiveness, with many highlighting persistent or worsening symptoms despite prolonged treatment. Concerns over debilitating side effects, including fatigue, nausea, and cognitive impairment, were commonly discussed. Many patients explored experimental therapies, integrative medicine, or off-label drug use, often encountering limited access, lack of physician support, and conflicting information.
Delays in therapy initiation were another major frustration, with insurance approvals, administrative bottlenecks, and provider scheduling conflicts contributing to significant treatment delays. Patients also voiced concerns over the restricted range of available therapies, particularly in rare cancers, which limited personalized treatment options and often forced reliance on outdated regimens. Continued dependence on blood transfusions and palliative interventions due to disease progression further underscored the lack of viable alternatives, with many patients expressing dissatisfaction with their quality of life.
Additionally, inadequate pain and symptom management emerged as a critical issue. Many patients reported undertreated pain, difficulty accessing appropriate palliative care, and inconsistent symptom control, highlighting gaps in holistic treatment approaches. These insights provide a high-resolution view of patient struggles, offering pharmaceutical stakeholders targeted areas where intervention and support strategies could be optimized.
Where pharmaceutical companies can make a meaningful impact
These insights present a significant opportunity for pharmaceutical companies to address patient needs holistically across the entire disease journey. While our analysis does not dictate specific interventions, it identifies key areas where pharma can drive meaningful impact, from enhancing healthcare navigation to bridging critical knowledge gaps that often leave patients and caregivers struggling for clarity.
One avenue for improvement lies in strengthening collaboration between manufacturers and healthcare providers to streamline treatment access. By fostering clearer pathways, pharma can support initiatives that reduce delays in care, improve coordination between specialists, and simplify the often-complex insurance approval process. Patient advocacy programs could also take a more active role in alleviating logistical and financial challenges by providing transparent information on insurance coverage, treatment center options, and available financial assistance programs, thereby easing the burden on patients and caregivers.
Equally crucial is the need to combat misinformation and ensure that patients have access to reliable, comprehensible, and actionable medical information. Given the widespread difficulty in finding trustworthy resources, pharma could develop tailored educational content on treatment options, side effect management, and disease progression.
One promising approach could involve implementing virtual patient assistance systems: AI-powered platforms capable of delivering real-time, evidence-based responses to patient inquiries. By offering personalized, on-demand support, these systems could help patients navigate their treatment journey with greater confidence and make informed decisions based on accurate, up-to-date information.
How NLP can transform patient engagement in pharma
Our analysis not only highlighted the frustrations of patients and caregivers but also uncovered deeper emotional experiences, critical information gaps, and unmet needs—areas that traditional engagement surveys often overlook. By leveraging advanced language models, we extracted valuable insights that can drive more patient-centered initiatives, from streamlining care pathways to enhancing support systems and refining commercial strategies.
Presenting these findings at Pharma SOS was an opportunity to spark important discussions on how the industry can transform these insights into meaningful action. As the conversation continues, one thing remains clear: for companies striving to make a real impact, the first step is to truly listen to what patients are saying.
Enhance patient insights with Definitive Healthcare
Discover how Definitive Healthcare can help pharma companies gain a deeper understanding of the patient journey. With our comprehensive data and insights, you can make more informed decisions, enhance patient support, and refine your commercial strategies. Start your journey today—sign up for a free trial and experience firsthand how our platform empowers you with the insights you need to fully understand the patient journey.