| HOME | HELP | CONTACT US | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
aDepartment of Internal Medicine, Division of Hematology/Oncology, University of Michigan, Ann Arbor, Michigan, USA; bDepartment of Sarcoma Medical Oncology, University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA; cBristol-Myers Squibb, Wallingford, Connecticut, USA
Key Words. Sarcoma • Clinical trials • Response • Imaging
Correspondence: Scott Schuetze, M.D., Ph.D., Department of Internal Medicine, Division of Hematology/Oncology, 1500 E. Medical Center Drive, C409 MIB, Ann Arbor, Michigan 48109-5843, USA. Telephone: 734-936-0453; Fax: 734-747-8792; e-mail: scotschu{at}umich
Received August 20, 2007; accepted for publication October 27, 2007.
Disclosure: R.B. has acted as a consultant to Novartis. R.C. is an employee of Bristol-Myers Squibb and owns stock in Bristol-Myers Squibb and Zimmer. S.S. has acted as a consultant to Sanofi-Aventis. No other potential conflicts of interest were reported by the authors, planners, reviewers, or staff managers of this article.
| ABSTRACT |
|---|
2.5 mm, allowing for more reproducible and accurate measurement of smaller lesions. Combination of imaging techniques, such as positron emission tomography with fluorine-18-fluorodeoxyglucose (18FDG-PET) and CT can provide investigators and clinicians with both anatomical and functional information regarding tumors, and there is now a large body of evidence demonstrating the effectiveness of PET/CT and other newer imaging methods for the detection and staging of tumors as well as early determination of responses to therapy. The application of newer imaging methods has the potential to decrease both the sample sizes required for, and duration of, clinical trials by providing an early indication of therapeutic response that is well correlated with clinical outcomes, such as time to tumor progression or overall survival. The results summarized in this review support the conclusion that the RECIST and the WHO criteria for evaluation of response in solid tumors need to be modernized. In addition, there is a current need for prospective trials to compare new response criteria with established endpoints and to validate imaging-based response rates as surrogate endpoints for clinical trials of new agents for sarcoma and other solid tumors.
| INTRODUCTION |
|---|
Assessment of new therapies for sarcomas requires agreement on, and consistent use of, endpoints sensitive to the effects of these treatments. Longer survival is the generally accepted gold standard for demonstrating clinical benefit of an oncologic therapy. However, a wide range of surrogate endpoints has been employed as the basis for approving new therapeutic agents [4], and considerable controversy exists regarding which endpoints may be most appropriate for specific tumors [4]. This review article analyzes endpoints in oncology clinical trials, with a focus on sarcomas. This issue is timely, because there has been considerable evolution in approaches for assessment of these tumors and endpoints employed in clinical studies.
| CURRENT ISSUES IN CONSIDERING ENDPOINTS |
|---|
Overall Survival
Overall survival (OS) has traditionally been the gold standard as the primary endpoint of phase III trials of cancer therapies [8]. When a randomized trial clearly demonstrates that an experimental drug produces a longer OS time than with standard therapy, approval is likely [9]. A number of advantages are associated with the use of OS as an endpoint for clinical trials of cancer therapies. It is objective and free from ambiguities in interpretation. However, it may be confounded by deaths from causes other than the patient's cancer [8, 10]. The survival difference should not only be statistically significant but also regarded as clinically significant.
Several additional problems may arise with the use of OS as the primary endpoint in an oncology study. The current availability of multiple effective lines of systemic or local therapy, as well as treatment switching, may obscure the impact of the agent under study upon survival in an intent-to-treat analysis. In addition, the interval from the end of recruitment to primary efficacy analysis for OS is protracted, such that subsequent studies taking the best therapy from the previous trial cannot begin until years after the previous trial has completed recruitment. Trials based on OS, which require a minimum of 5 years to complete, are inevitably lengthy and expensive [10, 11] and inhibit drug development, especially in uncommon cancers such as sarcoma.
Time to Progression and Progression-Free Survival
Time to progression (TTP) and progression-free survival (PFS) time (defined as the time from randomization to death or progression, whichever comes first) are also used extensively as endpoints in clinical trials of cancer therapies. These endpoints are similar except that death is included in PFS [8, 10]. Both PFS and TTP are correlated with OS in patients with rectal cancer and can be considered as surrogates for it in this setting [11]. Perhaps the most important advantage associated with the use of PFS and TTP is that they permit smaller sample sizes and shorter study durations. Progression often occurs months to years before death, and differences in the efficacies of new drugs or therapeutic regimens can be detected with shorter follow-up. Another advantage of these endpoints is that TTP and PFS do not require shrinkage of the tumor mass for detection of differences between treatments, which may make them highly suitable for measuring the benefits of cytostatic agents. In addition, disease progression is often the basis for a change in therapy and thus has high applicability to clinical practice [9–11]. Most importantly, TTP or PFS after a single line of therapy is a direct measure of the benefit from that treatment and is not confounded by subsequent events.
However, both PFS and TTP also have disadvantages as surrogate endpoints for OS in clinical trials. The clinical significance of small differences in TTP or PFS may be unclear (as in OS), especially when one is evaluating toxic treatments, and careful assessment of progression at frequent intervals can be costly and labor-intensive. There are also concerns about ascertainment bias in unblinded trials and questions about the reliability of modest differences in TTP or PFS that are often observed in such studies. Because OS is not the primary endpoint in studies employing either PFS or TTP as surrogates, a design to cross over to the investigational treatment from the standard or placebo arm when tumor progression occurs can also be considered. A crossover design has the potential to improve patient enrollment and patient benefit. However, it is also a weakness in that it can dilute the contribution of a survival benefit [10, 11].
Response Rates
Response rates are also used to assess efficacy in trials of cancer therapies. Response measures have been variously defined. However, a complete response (CR) generally indicates tumor disappearance, a partial response (PR) indicates a >50% reduction in the tumor cross product (the product of the maximum tumor diameter in the axial plane and the largest diameter perpendicular to it on the same image), and stable disease (SD) indicates a <50% reduction to <25% increase in the cross product [12]. Advantages associated with the use of these measures include smaller sample sizes, shorter study durations, the lack of a dilution effect, and the fact that tumor shrinkage is clearly and solely dependent on a therapeutic intervention, as spontaneous regressions are quite rare. Disadvantages include potential for bias in unblinded studies, variability in results across studies and centers within an individual trial secondary to differences in criteria and methods of assessment, and the fact that a response to treatment may not necessarily equate with clinical benefit. Response to treatment may or may not be a true surrogate for survival [10].
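The bidimensional thresholds described above can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function names are hypothetical, the category for an increase of 25% or more is taken to be progressive disease (PD), and the text leaves the exact 50% boundary unspecified, so the sketch arbitrarily assigns it to PR.

```python
def cross_product(longest_mm: float, perpendicular_mm: float) -> float:
    """Bidimensional cross product: longest diameter times its perpendicular."""
    return longest_mm * perpendicular_mm


def who_style_response(baseline_product: float, followup_product: float) -> str:
    """Classify a change in cross product using the thresholds quoted in the text.

    Assumptions (not from the source): an increase of 25% or more is labeled
    PD, and a reduction of exactly 50% is labeled PR.
    """
    if baseline_product <= 0:
        raise ValueError("baseline cross product must be positive")
    if followup_product == 0:
        return "CR"  # tumor disappearance
    change_pct = (followup_product - baseline_product) / baseline_product * 100
    if change_pct <= -50:
        return "PR"  # reduction of 50% or more in the cross product
    if change_pct < 25:
        return "SD"  # less than 50% reduction and less than 25% increase
    return "PD"


baseline = cross_product(40, 30)  # 1200 mm^2
print(who_style_response(baseline, cross_product(20, 20)))  # PR
print(who_style_response(baseline, cross_product(40, 40)))  # PD
```

A real assessment would sum the products over all measured lesions and apply confirmation intervals, which this sketch omits.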
| THE NEED FOR REDEFINING RESPONSE |
|---|
Three major problems with these definitions gradually became apparent with their use in clinical trials [15, 16]. Methods of integrating the change in tumor size into response assessments varied among research groups, minimum lesion size and number of lesions documented varied from one study to the next, and what constituted progressive disease (PD) was based on the change in size of a single lesion by some researchers and a change in the overall tumor load (including measurements of all lesions) by others. The advent of new technologies, particularly computed tomography (CT) and magnetic resonance imaging (MRI), further confused matters with respect to the relevance of volumetric and three-dimensional measurements versus bidimensional measures in response assessments. The combination of all these factors resulted in a situation in which response criteria were no longer comparable among research organizations. This was the circumstance that the original WHO publication had aimed to avoid.
Response Evaluation Criteria in Solid Tumors
The Response Evaluation Criteria in Solid Tumors (RECIST) were developed in response to problems with the WHO criteria. The RECIST were published in 2000 and are a simplification of four other methods of assessing solid tumor responses. The RECIST are generally similar to the criteria set forth by the 1979 WHO handbook, with the major change being that the RECIST employ unidimensional measurements of the sum of the longest diameters of tumors in the axial plane instead of the conventional bidimensional WHO method of the product of the longest diameter and that perpendicular to it, summed over all measured tumors. The RECIST response categories are: CR, disappearance of tumor sustained for at least 4 weeks; PR, ≥30% decrease in tumor sustained for at least 4 weeks; SD, neither PR nor PD criteria met; and PD, ≥20% increase with no CR, PR, or SD documented before increase of disease [12]. A detailed comparison of the WHO criteria and the RECIST and their associated guidelines is provided in Table 1 [17].
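As a rough illustration of the unidimensional approach, the sketch below classifies a follow-up assessment from the sum of the longest diameters. It is a simplified sketch, not the published algorithm: the function and parameter names are hypothetical, the 4-week confirmation requirement and nontarget or new lesions are ignored, and the optional `nadir_sum_mm` argument (the smallest sum recorded on study, used as the reference for progression) is an assumption not spelled out in the text above.

```python
from typing import Optional, Sequence


def recist_style_response(
    baseline_diameters_mm: Sequence[float],
    followup_diameters_mm: Sequence[float],
    nadir_sum_mm: Optional[float] = None,
) -> str:
    """Classify response from sums of longest diameters (simplified sketch)."""
    baseline_sum = sum(baseline_diameters_mm)
    followup_sum = sum(followup_diameters_mm)
    if baseline_sum <= 0:
        raise ValueError("baseline sum of diameters must be positive")
    if followup_sum == 0:
        return "CR"  # disappearance of all measured tumor
    # Assumption: progression is judged against the nadir when one is given,
    # otherwise against baseline.
    reference = baseline_sum if nadir_sum_mm is None else nadir_sum_mm
    if (followup_sum - reference) / reference >= 0.20:
        return "PD"  # increase of 20% or more
    if (baseline_sum - followup_sum) / baseline_sum >= 0.30:
        return "PR"  # decrease of 30% or more from baseline
    return "SD"  # neither PR nor PD criteria met


print(recist_style_response([50, 30], [30, 20]))  # PR (80 mm -> 50 mm)
print(recist_style_response([50, 30], [55, 45]))  # PD (80 mm -> 100 mm)
```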
The RECIST are predicated on unidimensional and bidimensional measurements being comparable and assume metastases are spherical and change proportionally. Application of the WHO criteria and RECIST to the same patients in 14 studies with a wide range of cancers indicated very similar results for all response categories. Results from this analysis indicated that 91.9% of patients evaluated had the same date of disease progression with the WHO criteria and RECIST; 7.3% had earlier disease progression with the WHO criteria and 0.9% had earlier disease progression with the RECIST (Table 2) [12]. This change is important to PFS since the PFS time by the RECIST will be longer than by the WHO criteria.
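The spherical assumption makes the relation between the two sets of thresholds easy to check. The short derivation below is an illustrative sketch, where the scale factor k (the ratio of follow-up to baseline diameter) is a symbol introduced here and not part of the original text. It suggests why the PR thresholds are nearly equivalent while the PD threshold of the RECIST is reached later than that of the WHO criteria.

```latex
% Assumption: a spherical lesion whose diameter scales uniformly by a factor k
\[
d' = k\,d \quad\Longrightarrow\quad d_1' d_2' = k^{2}\, d_1 d_2, \qquad V' = k^{3}\, V
\]
% Partial response: a 30% decrease in diameter
\[
k = 0.70 \;\Longrightarrow\; k^{2} = 0.49, \qquad k^{3} \approx 0.34
\]
% Progression: a 20% increase in diameter versus a 25% increase in the product
\[
k = 1.20 \;\Longrightarrow\; k^{2} = 1.44, \qquad
k^{2} = 1.25 \;\Longrightarrow\; k = \sqrt{1.25} \approx 1.12
\]
```

Under these assumptions, a 30% decrease in diameter corresponds to a 51% decrease in the cross product, close to the 50% WHO threshold. A 20% increase in diameter, however, corresponds to a 44% increase in the cross product, whereas the 25% WHO threshold is already reached at a diameter increase of about 12%.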
There are also limitations of the RECIST with respect to determination of disease progression. Response assessment as measured by the RECIST has been shown to have some discrepancies with WHO-determined responses. These appear to occur most often at the PR–SD and SD–PD "borders." This difference may be problematic when new experimental therapies are compared with conventional agents whose response rates have been established in historical trials. The apparent lower rate of disease progression with the RECIST may mean that more patients remain on therapy, and the percentages of patients with SD thus need to be interpreted with caution [25].
The RECIST also ignore the fact that changes in tumor size may not be directly correlated with disease progression in all therapeutic situations. Qualitative changes in tumors (e.g., myxoid degeneration in GIST) may not be reflected in tumor measurements, and this can result in erroneous classification of the response to treatment. Standard anatomic imaging techniques are often inadequate for evaluating malignancies, particularly when monitoring treatment responses for agents that do not cause tumor shrinkage (i.e., cytostatic agents) or for slow-progressing cancers or those malignancies that metastasize diffusely [26]. Thus, morphologic evaluation based solely on one- or two-dimensional measurements may not directly reflect biological changes in tumors associated with either the disease itself or its treatment [27]. Moreover, anatomical changes in the tumor as described by the RECIST may be detected later than functional changes in some circumstances (e.g., in GISTs treated with imatinib) [18]. The use of a primary tumor for response assessment, if the tumor is localized in a hollow organ (e.g., the esophagus), also makes measurements based on the RECIST difficult [18].
Finally, it is important to remember that the RECIST were developed on the basis of discussions carried out in the 1990s and published in 2000. As a result, they do not reflect many advances in imaging technology that have occurred over the past decade. Newer imaging and image-processing modalities may allow changes not considered in the RECIST to be included in revised response criteria [28]. For example, a comparison of relative values of manual unidimensional measurements and automated volumetry with multidetector-row computed tomography (MDCT) for longitudinal treatment response assessment in patients with pulmonary metastases indicated that MDCT provided better reproducibility of response evaluation and should be preferred over manual measurements in these patients [28]. The following section further explores the application of newer imaging technologies in assessing the efficacy of therapies for solid tumors.
| IMAGING-BASED EVALUATION OF RESPONSE TO CANCER THERAPY |
|---|
CT
As noted above, the RECIST for evaluating responses to treatment have been criticized because they do not reflect biological changes in solid tumors induced by new targeted therapies and thus may provide misleading results. Modified objective criteria using a combination of tumor size and density on CT have shown promise in early response evaluation and in predicting long-term outcomes in patients with advanced GISTs treated with imatinib. Results from Choi and colleagues indicated that tumor size determined using the sum of the longest dimensions and the RECIST definitions for a significant change were not reliable and underestimated the tumor response to imatinib during the early post-treatment stage in patients with metastatic GISTs. The mean tumor density, however, decreased significantly 2 months after treatment compared with pretreatment values. Moreover, evaluation using a combination of tumor size, tumor density, and absence or presence of tumor nodules and tumor vessels was a better indicator of the tumor response to imatinib than tumor density alone [27].
Choi and associates evaluated a series of 40 patients treated with imatinib for recurrent or metastatic GISTs who had undergone both PET and CT evaluation to determine the CT findings that could differentiate those who had a good response by PET and those who did not [30]. They found that a decrease in tumor size of ≥10% or a decrease in tumor density of ≥15% identified 97% of good responders by PET and none of the seven poor responders [30]. They also demonstrated that response defined by these new CT criteria was correlated with longer TTP, whereas response by the RECIST was not [30].
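The size-or-density rule reported by Choi and associates can be written as a simple check. The sketch below is illustrative only: the function name and arguments are hypothetical, size is taken as the sum of longest dimensions in millimeters, density as mean CT attenuation in Hounsfield units, and a positive baseline density is assumed.

```python
def choi_style_good_response(
    baseline_size_mm: float,
    followup_size_mm: float,
    baseline_density_hu: float,
    followup_density_hu: float,
) -> bool:
    """Return True if size fell by >=10% or density fell by >=15% (sketch)."""
    if baseline_size_mm <= 0 or baseline_density_hu <= 0:
        raise ValueError("baseline size and density must be positive")
    size_decrease = (baseline_size_mm - followup_size_mm) / baseline_size_mm
    density_decrease = (
        baseline_density_hu - followup_density_hu
    ) / baseline_density_hu
    return size_decrease >= 0.10 or density_decrease >= 0.15


# A lesion that shrinks only 5% but loses 20% of its density still counts
# as a good response under this rule.
print(choi_style_good_response(100, 95, 60, 48))  # True
print(choi_style_good_response(100, 95, 60, 57))  # False
```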
Benjamin and associates confirmed the observations of Choi et al. [30] in a separate group of 58 patients and then evaluated all 98 patients by the RECIST and the Choi criteria. All patients had pretreatment and follow-up CT scans. Disease-specific survival (DSS) and TTP were analyzed by response category. There were 45 (46%) good responders and 53 (54%) poor responders by the RECIST. In contrast, there were 81 (83%) good responders and 17 (17%) poor responders by the Choi criteria [31]. Despite the almost doubling of the response rate when patients were assessed by the Choi criteria versus the RECIST, patients with good responses by the Choi criteria on CT at 8 weeks after the start of treatment had equivalent DSS to that of patients with a CR or PR at any time by the RECIST. In addition, TTP and DSS were significantly correlated with the Choi response group, but not with the response group by the RECIST [31]. These results support the conclusion that the Choi response criteria, which incorporate tumor density and small changes in tumor size on CT, are more sensitive and accurate than the RECIST in assessing the response of GISTs to imatinib treatment. These results have been reproduced at other institutions; however, further validation needs to be completed [32].
Advances in CT technology are likely to further increase its usefulness for the evaluation of cancer therapies. Greater numbers of detectors in CT scanners offer better three-dimensional reconstruction and volumetric measurement [33], but the lack of a sufficient number of centers with appropriate scanners to process data limits the organization of large-scale, multicenter clinical trials. Automated collection and analysis of CT data are vital but not widely available, and manual data collection and analysis are expensive and time-consuming [34]. Another important limitation of CT methods is that heterogeneity of tumors (e.g., hypoxic regions) can confound volumetric measurements [35].
18FDG-PET
18FDG-PET can assess tumor glucose use with high reproducibility. Following therapy, the decrease in glucose uptake correlates with a reduction in viable tumor cells. In contrast to CT, MRI, or ultrasound, PET imaging allows identification of responding and nonresponding tumors early in the course of therapy. PET imaging can easily demonstrate changes in metabolic activity and indicate, sometimes within hours of the first treatment, whether or not a patient will respond to a particular therapy. 18FDG-PET has demonstrated efficacy for monitoring therapeutic response in a wide range of cancers, including breast, esophageal, lung, and head and neck cancers, and lymphoma [36, 37].
Effectiveness of 18FDG-PET
18FDG-PET is useful for determining the responses of GISTs to treatment with imatinib and may be superior to standard anatomic criteria for early evaluation of the responses of GISTs to targeted molecular therapies [38]. As noted above, evaluation using the RECIST may be poorly suited to these tumors because they may have a strong positive response to treatment (e.g., decreased FDG uptake, clinical improvement) without major shrinkage [27].
18FDG-PET scanning has also been shown to be a useful method for prediction of outcomes in patients with high-grade extremity soft tissue sarcomas treated with chemotherapy. Schuetze and colleagues evaluated 46 patients with high-grade localized sarcomas with 18FDG-PET. The maximum standardized uptake value (SUVmax) of tumors was measured before neoadjuvant chemotherapy and again prior to surgery. Resected specimens were examined for residual viable tumors. Patients with a baseline tumor SUVmax ≥6 and a <40% decrease in 18FDG uptake were at high risk for systemic disease recurrence, estimated to be 90% at 4 years from the time of initial diagnosis. Patients whose tumors had a ≥40% decline in SUVmax in response to chemotherapy were at a significantly lower risk for recurrent disease and death after complete resection and adjuvant radiotherapy (Fig. 1) [39]. 18FDG-PET results have also been shown to correlate closely with histologic responses of tumors to chemotherapy. Results from 36 patients with osteosarcoma or Ewing's sarcoma–family tumors who received neoadjuvant therapy indicated that a good 18FDG-PET response was concordant with a histologic response in 68%–69% of patients. In addition, a lower SUVmax after neoadjuvant chemotherapy (SUV2) was associated with better long-term outcomes in this small study cohort (the 4-year PFS rate was 72% for SUV2 <2.5 versus 27% for SUV2 ≥2.5; p = .01 for all patients) [40].
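The SUVmax thresholds reported by Schuetze and colleagues amount to a two-part rule, sketched below for illustration. The function names are hypothetical, and the sketch simply restates the published cutoffs; it is not a validated risk model.

```python
def suv_percent_decline(baseline_suvmax: float, post_suvmax: float) -> float:
    """Percentage decline in SUVmax from baseline to post-chemotherapy scan."""
    if baseline_suvmax <= 0:
        raise ValueError("baseline SUVmax must be positive")
    return (baseline_suvmax - post_suvmax) / baseline_suvmax * 100


def high_recurrence_risk(baseline_suvmax: float, post_suvmax: float) -> bool:
    """Flag the high-risk group: baseline SUVmax >= 6 and < 40% decline."""
    decline = suv_percent_decline(baseline_suvmax, post_suvmax)
    return baseline_suvmax >= 6 and decline < 40


print(high_recurrence_risk(10.0, 8.0))  # True: 20% decline from a high baseline
print(high_recurrence_risk(10.0, 4.0))  # False: 60% decline
```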
New positronic substrates will likely expand the utility of PET [44]. The most widely used PET tracer for osteosarcoma is 18FDG. The other clinical PET tracer with reported utility for osteosarcoma imaging is 18F-fluoride ion. 18F-labeled monoclonal antibodies, 18F-fluoromisonidazole, 18F-labeled arginine–glycine–aspartic acid (RGD)-containing glycopeptide, 3H-thymidine, 13N-methionine, and PET of p53 transcriptional activity in osteosarcoma are all being investigated [44].
18FDG-PET Issues
While 18FDG-PET has clearly been demonstrated as an important advance for staging, prognosis, and evaluation of treatment responses in patients with solid tumors, this technique does have significant limitations. These include inability to characterize lesions <1 cm in diameter, difficulty in distinguishing benign post-therapy or unrelated inflammatory responses from the effects of treatment on tumors, and variability in signal acquisition across instruments and in interpretation and analysis among readers [26]. Multi-institutional studies using 18FDG-PET need to consider the potential impact of this variability on overall results. It has also been noted that 18FDG-PET evaluations can be very subjective, and that even SUV is only a semi-quantitative measure [45]. In addition, static 18FDG uptake indices alone may not enable adequate differentiation between benign and malignant lesions. While quantitative dynamic imaging may provide more helpful information, it is more labor-intensive and costly. Another limitation of 18FDG-PET is variable 18FDG uptake in normal structures and sites of inflammation caused by infection or foreign bodies, leading to false-positive results [46].
MRI
Ongoing efforts are developing more powerful methods for automated classification of MRI spectra, based on the acquisition of large datasets of tumor spectra and use of diffusion- and perfusion-weighted imaging. These methods are useful for distinguishing between tumors and abscesses and for predicting responses to radiotherapy, respectively [47].
Magnetic resonance spectroscopy may predict response in a manner analogous to PET [48]. Both of these methods permit imaging of the entire body and combine functional and anatomical information. 18FDG-PET and MRI spectroscopy are valuable techniques for monitoring tumor response in patients undergoing chemo- and radiotherapy, particularly when evaluating early responses. In contrast, MRI is particularly useful for assessing metastasis and infiltration of bone marrow and the central nervous system.
Dynamic contrast-enhanced MRI (DCE-MRI) is a new imaging method for assessing the physiological state of tumor vascularity in vivo. This method uses available imaging techniques and contrast agents and assays the kinetics of tumor enhancement during bolus i.v. contrast administration [49]. DCE-MRI has been shown to be useful for detecting microvascular changes in tumors in response to isolated limb perfusion within 24 hours of treatment in experimental animals [50] and for correctly predicting tumor responses to therapy in a small cohort of 12 patients with histologically proven high-grade soft tissue sarcoma [51]. DCE-MRI correctly predicted tumor response in 8 of 10 evaluable patients. Early rapidly progressive enhancement was correlated histologically with residual viable tumors, and late and gradual, or absence of, enhancement was correlated with necrosis, predominantly centrally located, or granulation tissue [51]. These preliminary results show that DCE-MRI offers the potential for noninvasive monitoring of responses to isolated limb perfusion in soft tissue sarcomas.
| FUTURE DIRECTIONS/CONSENSUS |
|---|
| REFERENCES |
|---|
| THE ONCOLOGIST | STEM CELLS | CME | ALPHAMED PRESS JOURNALS |