ir.cs.georgetown.edu

Learning the Relationships between Drug, Symptom, and Medical Condition Mentions in Social Media

Information Retrieval Lab, Department of Computer Science, Georgetown University

Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

We consider the general problem of learning relationships between drugs, symptoms, and medical conditions mentioned on Twitter, with the goal of estimating probability distributions to reduce the difficulties presented by social media's incomplete picture. If a user mentions taking a drug and experiencing several unexpected symptoms, for example, are the symptoms associated with that drug or is it more likely that the symptoms are associated with an unmentioned underlying condition? We describe a model for learning from and utilizing such relationships. We demonstrate that our approach identifies drugs that are similar based on their associated symptoms (or conditions), identifies conditions that are similar based on their associated symptoms, and can determine whether a symptom is caused by a medical condition or by a drug (i.e., a drug side effect).

Social media data are subject to many biases, and sampled data sources such as Twitter's streaming API compound the problem. There is no guarantee that users will mention all details that are relevant to the mining task being performed, and even when users do provide complete information, the public Twitter API's sampling may prevent all of the tweets containing relevant information from being collected.

We reduce the difficulty of mining health-related data with incomplete information by modeling the relationships between drugs, symptoms, and medical conditions. If a user mentions suffering from a medical condition and experiencing a symptom, for example, the symptom may be caused by the medical condition or it may be caused by an unmentioned drug the user is taking. Our model addresses this problem by assessing the probability that a user who mentioned a condition and symptom is also associated with an unmentioned drug. Similarly, we demonstrate how our model estimates the similarity between drugs, symptoms, and conditions and determines whether symptom mentions are more likely to be associated with a condition mention or with a drug mention (i.e., drug side effects).

Many have considered the problem of modeling health-related latent topics in social media. Chen et al. (2014) use a temporal topic model to model users' flu infection statuses (i.e., healthy, exposed, or infected) over time based on the users' tweets. Paul and Dredze (2011) propose the Ailment Topic Aspect Model (ATAM+) to associate treatment, symptom, and general terms with latent ailment topics. While our goal of learning the associations between drug, symptom, and condition mentions is similar to modeling health-related topics, we differ in that we do not try to discover latent topics. We use concept extraction and medical thesauri to identify mentions rather than training a topic model to discover topics (i.e., term categories). We envision our model as augmenting health-related mining tasks such as discovering drug side effects or estimating the prevalence of a disease.

We model relationships between symptoms, drugs, and medical condition mentions in tweets as a Bayesian network. A user's medical conditions determine what drugs the user takes. The user's conditions and drugs determine what symptoms the user experiences; a symptom may be caused by the condition or it may be a side effect of a drug the user is taking. We compute a joint probability distribution over symptoms, drugs, and conditions and use it to compute conditional probability distributions among them. We identify symptoms, drugs, and conditions in tweets using the CRF method described in (Yates, Goharian, and Frieder 2015).

Let S be a random variable over all symptoms, D be a random variable over all drugs, and C be a random variable over all conditions. Let $\mathrm{Count}_{S,D,C;U}$ be the number of times S, D, and C are mentioned by the Twitter user U during a d-day window. We use nil values in cases where a d-day window contains only one or two types of random variable; that is, if a user mentions one or more symptoms or conditions during a window but does not mention any drugs, we set D = ∅ for that window. The joint probability mass function over S, D, and C is computed from these counts (Eq. 1).

Conditional probabilities between any two of the random variables are then computed by marginalizing out the third variable. The conditional probability of extracting the symptom S given a drug D, for example, is Pr(S | D) (Eq. 2).

Conditional probability distributions can be inspected to identify associations between symptoms, drugs, and conditions. The Kullback–Leibler divergence between distributions can be used to compare the similarity of random variables, such as comparing the similarity of symptoms associated with two drugs D1 and D2:

$$D_{\mathrm{KL}}\big(\Pr(S \mid D_1) \,\|\, \Pr(S \mid D_2)\big) \qquad (3)$$

Finally, we can identify symptoms that are either more highly associated with a condition than a drug (i.e., are symptoms of the condition) or more highly associated with a drug than a condition (i.e., are drug side effects) by taking the difference of the given drug's and condition's conditional probabilities (Eq. 4).

Our dataset consists of thesauri containing symptom, drug, and condition terms and a Twitter corpus collected between November 2013 and November 2015. Rather than using Twitter's streaming 1% sample API, we used Twitter's statuses/filter API and queried for tweets geolocated within the United States and Canada to maximize our per-user coverage. Our Twitter corpus contains approximately 1.5 billion tweets written by 11 million users, of which about 18 million (1.2%) tweets contained a term from our final thesauri (i.e., were health-related tweets). Most health-related tweets mentioned at least one symptom (57%) or condition (37%), with only 7.6% of health-related tweets mentioning a drug. We use the SIDER (Kuhn et al. 2016) drug database to identify drug terms in our corpus. We use the MedSyn thesaurus (Yates and Goharian 2013), a thesaurus containing both lay person and expert terminology derived from the Unified Medical Language System (Bodenreider 2004), to identify health-related terms that may express symptom or condition concepts. Additionally, we manually verified every symptom, drug, and condition term that occurred at least 400 times in our corpus and removed ambiguous terms from our thesauri.

Drug, symptom, and condition associations

To evaluate how well we learn associations between drugs, symptoms, and conditions, we compute the conditional probability (Eq. 2) of symptoms given NSAIDs (nonsteroidal anti-inflammatory drugs), which are a common class of over-the-counter painkilling drugs (i.e., Pr(S | D)). The top ten symptoms for each drug are:

Advil: headache (0.11), confused (0.08), sleepy (0.07), throw up (0.06), pass out (0.04), cough, cramps, feel sick, hangover & fever (0.03)
Aleve: headache (0.10), sleepy (0.06), confused (0.06), gynecomastia (0.04), cramps (0.04), throw up (0.04), feel sick (0.03), ache (0.03), fever (0.03) & cough (0.02)
Aspirin: nasal polyps (0.08), polyps (0.08), tumor (0.05), swelling (0.05), swollen (0.04), headache (0.03), salivary glands (0.03), runny nose (0.03), fever & apoptosis (0.02)
Tylenol: asthma (0.06), headache (0.06), fever (0.04), confused (0.04), hard of hearing, throw up, hearing loss, tumor, sleepy & migraine (0.03)

Symptoms that NSAIDs are commonly taken to relieve, such as headaches and fevers, are associated with every drug. Symptoms of underlying conditions that NSAIDs do not cause or treat also appear, however, such as cough, sleepy, and runny nose. This illustrates that while conditional probability can be used to find associations between drugs and symptoms, the association may be an indirect association that also involves an underlying condition. We demonstrate in a later section that Eq. 4 can be used to separate symptoms caused by an underlying condition from symptoms caused by a drug (i.e., drug side effects).

Table 1 shows the symptoms and drugs most strongly associated with four common conditions (i.e., those symptoms and drugs with the highest conditional probabilities given one of the conditions). Many of the symptoms are clearly symptoms of the condition: migraine, headache, sneeze, swollen, and cough are symptoms of allergies; tumors, polyps, and weight loss are symptoms of breast cancer; nausea (feeling sick), fever, and headache are flu symptoms; and shoulder pain is commonly associated with strokes. The relationships between conditions and drugs are less clear, with alcohol, cocaine, and testosterone commonly appearing. Allergies are correctly associated with allergy medications (i.e., zyrtec, prednisone, benadryl, and histamine), however, as well as a drug that alleviates an allergy symptom (imitrex). Tamoxifen, one of the most common breast cancer drugs, ranks highly for the breast cancer condition. Tylenol, which ranks highly for the flu, is not associated with flu treatment but is used to alleviate some flu symptoms. The model's difficulty identifying drugs associated with breast cancer, the flu, and strokes may be caused by the fact that no drugs are commonly taken for these conditions; the flu is a common condition with no cure. Drugs are used in the treatment of breast cancer and strokes, but these are relatively rare conditions so their associated drugs are less likely to be mentioned on Twitter.

Table 1: The symptoms and drugs most strongly associated with four common conditions. Many symptoms (e.g., migraine, headache, sneeze, etc.) and drugs (e.g., zyrtec, prednisone, benadryl) are correctly associated with the allergy condition. The other conditions are correctly associated with many symptoms, but not with many drugs.
Surviving Table 1 entries (row and column alignment lost): nasal polyps (0.03), shoulder pain (0.05), lose weight (0.02), hemorrhage (0.02), prednisone (0.03), testosterone (0.02), testosterone (0.04), amoxicillin (0.02), prednisone (0.02), clarithromycin (0.01).

Similarities between drugs

To evaluate how well our model can be used to measure the similarity between two drugs, we compute the KL divergence (Eq. 3) between NSAIDs (nonsteroidal anti-inflammatory drugs). NSAIDs are commonly referred to both by their brand names and generic names, making them ideal for evaluating our drug-similarity metric. The KL divergences between pairs of NSAIDs are shown in Table 2. The corresponding brand name or generic name for each drug is shown in parentheses in the first column. Lower numbers indicate higher degrees of similarity. KL divergence is not symmetric, so we compare drugs D1 and D2 by taking the mean of the KL divergences of their drug-symptom distributions. This approach can also be used to measure the similarity between distributions conditioned on two conditions or two symptoms.

Our model correctly indicates that Ibuprofen is the most similar drug to Advil and that Acetaminophen is the most similar drug to Tylenol. It incorrectly indicates that Naproxen and Aleve are more similar to Ibuprofen and Advil than the two drugs are to each other. This error may be caused by a much lower number of tweets mentioning Aleve and Naproxen; these drug terms occur in our corpus approximately 20% and 8% as often as the next most infrequent term (Acetaminophen), respectively.

Table 2: KL divergences between nonsteroidal anti-inflammatory drugs derived from drug-symptom distributions. Lower numbers indicate a higher similarity. Each drug's brand name and generic name was treated as a unique drug for the purpose of evaluating the drug similarities. Our model correctly identifies Acetaminophen and Tylenol and Ibuprofen and Advil as similar drugs, but fails to identify Naproxen and Aleve as being similar. The relative results do not change when drug-condition distributions are instead used to compute the similarity.

Condition symptoms vs. drug side effects

We evaluate how well our model can be used to distinguish symptoms caused by a condition from symptoms caused by a drug (i.e., drug side effects) by comparing conditional probabilities as shown in Eq. 4. The drugs and conditions to compare were chosen by selecting the five most frequently occurring drugs with a clearly associated condition; some drugs such as morphine, adderall, aspirin, and benadryl occurred more frequently, but were not strongly associated with any condition (as determined by Pr(C | D)). Such drugs either belonged to more general classes of drugs that are often used for symptom relief (i.e., NSAIDs and antihistamines) or were drugs that are known to be commonly abused (e.g., morphine, adderall, xanax, etc.). The top five drugs and their associated conditions are: prednisone (allergies), lipitor (diabetes), prozac (depression), zoloft (depression), and paxil (depression).

The top ten symptoms attributed to each drug and condition are shown in Table 3. Symptoms attributed to the drug are shown in the D rows (top half) and symptoms attributed to the condition are shown in the C rows (bottom half). The symptoms associated with depression (D=Prozac, D=Zoloft, and D=Paxil) are strikingly similar, with only one entry differing between the Prozac and Paxil columns (i.e., cramps vs. feel sick) and two entries unique to the Zoloft column (i.e., hangover and inflammation). The symptoms associated with the drugs that treat depression are less accurate, with several terms that do not appear to be related at all (e.g., clubfoot, sneeze, tumor, irritable bowel, etc.). The drug symptoms caries, tooth decay, weight gain, incontinence, and diarrhea are known side effects.

Similarly, many of the symptoms associated with Prednisone (i.e., swelling, mood changes, nausea, and exhaustion) and Lipitor (i.e., headache, migraine, weight gain, muscle soreness, and stomach pain) are known side effects of those drugs. Many of the symptoms associated with allergies are allergy symptoms, such as sneeze, headache, cough, and runny nose, whereas fewer symptoms appear to be correctly associated with diabetes (i.e., exhaustion, weight loss, and nausea). These results illustrate that while we differentiate between symptoms caused by conditions and symptoms caused by drugs (i.e., drug side effects), identifying causal relationships is difficult and should be handled with care.

Table 3: Symptoms most strongly associated with drugs (D row) and conditions (C row) for each drug and condition pair. Many drug side effects are correctly identified (e.g., nausea with Prednisone, weight gain with Prozac and Paxil, etc.). Similarly, there is high agreement among the conditions associated with depression with no more than two entries differing between any pair. Note the following terms were abbreviated: embolism, high blood pressure, incontinence, suicidal thoughts, and thrombosis.
Surviving Table 3 entries (row and column alignment lost): pulmonary emb. (.03), stomach ache (.02), suicidal th. (.06), suicidal th. (.04), gynecomastia (.03), mood swings (.02), liver damage (.02), tooth decay (.04), deep vein thromb. (.03), gain weight (.06), deep vein thromb. (.02), gain weight (.03), tooth decay (.04), pulmonary emb. (.02), irritability (.02), suicidal th. (.01), urinary incont. (.03), gain weight (.01), low testosterone (.02), asthma attack (.01), muscle soreness (.01), irritable bowel (.02), suicidal th. (.01), inflammation (.02), vein thromb. (.01), irritability (.02), lose weight (.01), lose weight (.02), lose weight (.02), inflammation (.01), lose weight (.01).

We described a model for learning associations between mentions of drugs, symptoms, and medical conditions in Twitter, and investigated its ability (1) to learn associations between drugs, symptoms, and conditions, (2) to identify conditions or drugs that are similar based on their associated symptoms, and (3) to differentiate between symptoms caused by drugs (i.e., drug side effects) and symptoms caused by a condition that a drug is being taken to treat. We find that our approach is often able to correctly identify equivalent drugs as similar and to correctly separate a condition's symptoms from drug side effects. We envision incorporating our approach with health-related text mining systems to improve their accuracy. Systems for discovering expected and unexpected drug side effects could benefit from our method for differentiating between conditions' symptoms and drug side effects, for example, and our drug similarity and condition similarity measures could be used to help identify drug and condition synonyms.

References

Bodenreider, O. 2004. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research 32(Database issue):D267–70.
Chen, L.; Hossain, K. S. M. T.; Butler, P.; Ramakrishnan, N.; and Prakash, B. A. 2014. Flu gone viral: Syndromic surveillance of flu on Twitter using temporal topic models. In IEEE ICDM'14.
Kuhn, M.; Letunic, I.; Jensen, L. J.; and Bork, P. 2016. The SIDER database of drugs and side effects. Nucleic Acids Research 44(Database issue).
Paul, M., and Dredze, M. 2011. You are what you tweet: Analyzing Twitter for public health. In AAAI ICWSM'11.
Yates, A., and Goharian, N. 2013. ADRTrace: Detecting Expected and Unexpected Adverse Drug Reactions from User Reviews on Social Media Sites. In ECIR'13.
Yates, A.; Goharian, N.; and Frieder, O. 2015. Extracting Adverse Drug Reactions from Social Media. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI'15).
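The quantities the model relies on (a joint distribution over symptom, drug, and condition mentions, conditionals obtained by marginalizing out the third variable, the mean of the two KL divergences, and the drug-versus-condition difference) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: the joint distribution is taken to be the window counts normalized by their total, nil values are represented as `None` and dropped from the target variable, the KL divergence uses a small additive smoothing constant `eps` so that it stays finite when supports differ, and the difference is taken as Pr(S | D) minus Pr(S | C). All function names and the toy counts are hypothetical and do not come from the paper's data.

```python
from collections import defaultdict
from math import log

S, D, C = 0, 1, 2  # positions of symptom, drug, condition in each key


def joint_distribution(counts):
    """Normalize (symptom, drug, condition) window counts into a joint pmf (assumed form)."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def conditional(joint, target, given, value):
    """Pr(target | given == value), marginalizing out the remaining variable.

    Assumption: nil (None) target values are dropped before renormalizing.
    """
    mass = defaultdict(float)
    for key, p in joint.items():
        if key[given] == value and key[target] is not None:
            mass[key[target]] += p
    z = sum(mass.values())
    return {k: v / z for k, v in mass.items()} if z > 0 else {}


def kl_divergence(p, q, eps=1e-9):
    """D_KL(p || q) over the union of supports, with additive smoothing (assumed)."""
    support = set(p) | set(q)
    zp = sum(p.get(k, 0.0) + eps for k in support)
    zq = sum(q.get(k, 0.0) + eps for k in support)
    total = 0.0
    for k in support:
        pk = (p.get(k, 0.0) + eps) / zp
        qk = (q.get(k, 0.0) + eps) / zq
        total += pk * log(pk / qk)
    return total


def mean_kl(p, q):
    """Symmetric comparison: the mean of the two directed KL divergences."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def attribution_scores(joint, drug, condition):
    """Pr(S | drug) - Pr(S | condition) per symptom (assumed sign convention).

    Positive scores lean toward a drug side effect, negative scores toward
    a symptom of the condition.
    """
    p_drug = conditional(joint, S, D, drug)
    p_cond = conditional(joint, S, C, condition)
    symptoms = set(p_drug) | set(p_cond)
    return {s: p_drug.get(s, 0.0) - p_cond.get(s, 0.0) for s in symptoms}


# Hypothetical toy counts, invented for illustration only.
counts = {
    ("headache", "advil", None): 6,
    ("sleepy", "advil", None): 2,
    ("headache", "ibuprofen", None): 5,
    ("sleepy", "ibuprofen", None): 2,
    ("polyps", "aspirin", None): 4,
    ("headache", "aspirin", None): 1,
    ("nausea", "prednisone", "allergies"): 3,
    ("sneeze", "prednisone", "allergies"): 1,
    ("sneeze", None, "allergies"): 8,
}

joint = joint_distribution(counts)
p_advil = conditional(joint, S, D, "advil")
p_ibuprofen = conditional(joint, S, D, "ibuprofen")
p_aspirin = conditional(joint, S, D, "aspirin")
scores = attribution_scores(joint, "prednisone", "allergies")

print("advil vs ibuprofen:", mean_kl(p_advil, p_ibuprofen))
print("advil vs aspirin:", mean_kl(p_advil, p_aspirin))
print("prednisone vs allergies:", scores)
```

In this toy example the two drugs with near-identical symptom distributions receive a much smaller mean KL divergence than the pair with different symptoms, and the symptom that co-occurs mostly with the drug receives a positive score while the one that co-occurs mostly with the condition receives a negative score. The smoothing constant strongly affects the magnitude of the divergence when supports differ, so only the relative ordering should be read from such a sketch.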

Source: http://ir.cs.georgetown.edu/downloads/icwsm16_adr.pdf
