Simon Fraser University (SFU) - CZSaw
Dustin Dunsmuir, Simon Fraser University, email@example.com
Saba Alimadadi, Simon Fraser University, firstname.lastname@example.org
Victor Chen, Simon Fraser University, email@example.com
Eric Lee, Simon Fraser University, firstname.lastname@example.org
Cheryl Qian, Purdue University, email@example.com
John Dill, Simon Fraser University, firstname.lastname@example.org
Chris Shaw, Simon Fraser University, email@example.com
Robert Woodbury, Simon Fraser University, firstname.lastname@example.org
CZSaw is a visual analytics tool for sense-making across text documents with extracted entities, with a focus on the analysis process. It uses a variety of flexible data visualizations to give different perspectives on the networks of people, places, dates, and other entities. It records the analysis process and the analysis model, and visualizes them in a history view and a dependency graph, respectively. The history view provides quick access to past states of the analysis, and the dependency graph allows parts of the analysis process to be quickly rerun on new data. To make this possible, all semantically meaningful interactions are captured in a script language, which an expert user can edit for fine control of the analysis process.
CZSaw was developed in the School of Interactive Arts and Technology at Simon Fraser University by Victor Chen, Dustin Dunsmuir, Eric Lee, Nazanin Kadivar, Cheryl Qian, John Dill, Chris D. Shaw, and Rob Woodbury. A paper entitled “Capturing and Supporting the Analysis Process” was presented at VAST 2009, and the presentation materials can be found on the VGTC website. For more information on CZSaw, see CZSaw's webpage.
Maxim Roy of the Natural Language Lab at SFU applied entity extraction to the XML file that was then loaded into CZSaw. He used Alias-i’s LingPipe system for named-entity extraction. For numerical entities, he trained a new named-entity model on the MUC-7 news data corpus (MUC7, 1996, Message Understanding Conference (MUC) 7, LDC Catalog ID LDC2001T02).
MC1.1: Summarize the activities that happened in each country with respect to illegal arms deals based on a synthesis of the information from the different report types and sources. State the situation in each country at the end of the period (i.e. the end of the information you have been given) with respect to illegal arms deals being pursued. Present a hypothesis about the next activities you expect to take place, with respect to the people, groups, and countries.
Because this mini challenge’s components were related, parts of the data preparation are described in MC1.2, and the results described here were informed by MC1.2’s process. First, we transformed the Word files into one XML file using a custom program. Maxim Roy of SFU’s Natural Language Lab then ran software to extract entities (people, places, etc.). After the refinement described in MC1.2, the resulting file was loaded into CZSaw.
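The conversion step is not described in detail. The following rough sketch assumes the text of each Word file has already been extracted (for example with a .docx reader, not shown). It combines the documents into a single XML tree with Python's standard `xml.etree.ElementTree` module. The `corpus`/`document`/`text` element names, the attributes and the sample documents are all hypothetical. The schema our custom program actually used is not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_corpus_xml(docs):
    """Combine (doc_id, report_type, text) tuples into one XML tree.

    Hypothetical schema:
    <corpus><document id=".." type=".."><text>..</text></document>...</corpus>
    """
    root = ET.Element("corpus")
    for doc_id, report_type, text in docs:
        # One <document> element per source Word file.
        doc = ET.SubElement(root, "document", id=doc_id, type=report_type)
        body = ET.SubElement(doc, "text")
        body.text = text  # ElementTree escapes &, <, > when serializing
    return root


# Hypothetical sample input; not taken from the challenge data set.
docs = [
    ("doc-001", "email", "Meeting in Dubai next month."),
    ("doc-002", "phone", "Celik will send the farming equipment."),
]
xml_str = ET.tostring(build_corpus_xml(docs), encoding="unicode")
print(xml_str)
```

The single-file output is what makes a one-step load into the entity extractor and then into CZSaw convenient.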
We used CZSaw’s semantic zoom view (SZV) to examine documents at several levels of detail: overview, entities in the document, and detailed text. We used a clustering algorithm that places documents with many entities in common close to each other, as seen in Figure 1.
Figure 1- SZV with all 103 documents laid out. Colored highlighting is applied based on date ranges in the side bar.
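The clustering algorithm behind Figure 1 is not specified here. One plausible input to such a layout is a pairwise similarity score. As an assumed stand-in, the sketch below uses Jaccard similarity: the number of shared entities divided by the number of distinct entities in the two documents. A force-directed layout could then use these scores as attraction strengths. The document IDs and entity sets are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of the two documents' entities that they share (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_pairs(doc_entities):
    """Return {(doc_a, doc_b): similarity} for every pair sharing an entity."""
    pairs = {}
    for (da, ea), (db, eb) in combinations(sorted(doc_entities.items()), 2):
        s = jaccard(ea, eb)
        if s > 0:
            pairs[(da, db)] = s
    return pairs


# Hypothetical documents and extracted entities.
doc_entities = {
    "call-01": {"Celik", "Hakan", "Turkey"},
    "call-02": {"Celik", "Baltasar", "Syria", "Turkey"},
    "email-07": {"Ngoki", "Nigeria", "Dubai"},
}
pairs = similarity_pairs(doc_entities)
print(pairs)
```

Here the two phone calls share two of five distinct entities, giving a score of 0.4, while the Nigerian email shares nothing with either call and gets no pair.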
This layout led us to the most tightly clustered documents: telephone calls between the same few people in Turkey and Syria. We viewed the documents by semantically zooming into them (Figure 2), which distorts the overall layout like a fisheye, and then zooming back out, which restores the original layout.
Figure 2- SZV with a couple of documents zoomed in part way.
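The fisheye-like distortion can be illustrated with Sarkar and Brown's graphical fisheye transform. It maps a normalized distance x from the focus to (d + 1)x / (dx + 1) for a distortion factor d. This is an assumed analogue, not CZSaw's own zoom code. The function returns new positions and never modifies the stored layout, so zooming back out only means drawing the original positions again. The parameter names, values and layout coordinates are hypothetical.

```python
import math


def fisheye(point, focus, radius, distortion=3.0):
    """Radially magnify the area around `focus` (Sarkar-Brown style).

    Points at or beyond `radius` from the focus are returned unchanged,
    so the rest of the layout keeps its place. distortion=0 is the identity.
    """
    dx, dy = point[0] - focus[0], point[1] - focus[1]
    r = math.hypot(dx, dy)
    if r == 0 or r >= radius:
        return point
    x = r / radius                      # normalized distance in (0, 1)
    g = (distortion + 1) * x / (distortion * x + 1)
    scale = g * radius / r              # move the point out to distance g * radius
    return (focus[0] + dx * scale, focus[1] + dy * scale)


# Hypothetical stored layout; only the `view` copy is distorted.
layout = {"call-01": (1.0, 0.0), "call-02": (0.0, 2.0), "email-07": (9.0, 9.0)}
focus = (0.0, 0.0)
view = {doc: fisheye(p, focus, radius=4.0) for doc, p in layout.items()}
print(view)
```

Nodes near the focus are pushed outward, giving zoomed documents room. Nodes outside the radius stay put, which keeps the rest of the overview recognizable.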
We grouped related documents using the manual group function, and similarly grouped the other documents clustered around the same people and countries (Figure 3). Scanning each document’s text and entities and grouping the results enabled us to create mutually exclusive groups covering all 103 documents in approximately two hours. Brushing and searching aided this categorization. Brushing an entity in a document or group highlighted all documents containing the same entity (Figure 3), and the search feature searched within the text of all documents.
Figure 3- The SZV with all documents grouped showing the different tabs of a group and brushing done on the Nahid entity.
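Brushing can be modelled as an inverted index from each entity to the documents that contain it, and search as a text match over every document. The sketch below assumes exactly that. A case-insensitive substring match stands in for whatever matching CZSaw's search actually performs. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def build_index(doc_entities):
    """Map each entity to the set of documents that mention it."""
    index = defaultdict(set)
    for doc, entities in doc_entities.items():
        for entity in entities:
            index[entity].add(doc)
    return index


def brush(index, entities):
    """Documents to highlight when the given entities are brushed."""
    hits = set()
    for entity in entities:
        hits |= index.get(entity, set())
    return hits


def search(doc_text, query):
    """Case-insensitive substring search over the full text of each document."""
    q = query.lower()
    return {doc for doc, text in doc_text.items() if q in text.lower()}


# Hypothetical documents.
doc_entities = {"d1": {"Nahid", "Kenya"}, "d2": {"Nahid", "Dubai"}, "d3": {"Celik"}}
doc_text = {
    "d1": "Nahid called from Nairobi.",
    "d2": "Nahid flies to Dubai.",
    "d3": "Celik ordered textbooks.",
}
index = build_index(doc_entities)
print(brush(index, {"Nahid"}), search(doc_text, "dubai"))
```

Building the index once makes each brush a few set lookups, which suits the interactive, hover-driven use described above.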
While creating groups from clusters, it became apparent that almost all documents were related to arms dealing in one of the countries, so we had to read them all. Four team members worked in parallel to summarize each group. SZV groups have tabs representing different views of the contained documents. Team members used the text tab to read each contained document and the entity tab to see the combined set of entities and perform brushing.
Based on a series of emails, we are highly confident George Ngoki of Nigeria developed a fake government contract to arrange to buy weapons. Ngoki and many others planned to travel to Dubai as outlined in Figure 4.
A set of phone conversations revealed that someone named Baltasar in Syria is working with Celik and Hakan in Turkey to purchase “textbooks” for a school, while at the same time Celik is purchasing some “farming equipment”. Combining these two plans, we can say with moderate confidence that this group is really purchasing weapons.
Muhammad Kasem, leader of the Martyrs Front of Judea, is in a conflict with Israel in the Gaza/West Bank area. From a variety of sources, we can say with high confidence that the group purchased weapons from an outside country.
In Pakistan, the Lashkar-e-Jhangvi terrorist group uses weapons such as explosives. The group’s leadership almost certainly includes Azeem Bhutani and Maulana Haq Bukhari. Several factors make it likely that Lashkar-e-Jhangvi is buying arms: bank transactions from an account believed to be owned by Bukhari, suspicious packages showing up at Bukhari’s door, and travel plans made for Dubai.
In South America, phone conversations and message board posts give us high confidence that a group of people in Carabobo and Barcelona, Venezuela, are planning a weapons purchase. They are using Jhon, who is based in Medellín, Colombia, to organize this purchase. Bank transactions in Nov 2008 support this hypothesis.
Arms Dealers: Intermediates
From newspapers and blogs, we have moderate confidence that the Ministry of Police (MP) in Kenya is involved in shipping weapons from its own stores to forces in Sudan. Weapons are transported to the MP from Ukraine, and one such shipment was hijacked and held by pirates from Oct 2008 to March 2009. Based on their arrest and a phone call, Thabiti Otieno and his wife Nahid Owiti are likely involved in organizing the shipments, first by boat and then onward to Sudan. They arranged to be in Dubai in April 2009 and then died in Kenya on May 1st.
It is reported that Saleh Ahmed is an arms dealer in Yemen and Saudi Arabia. Based on his phone calls, we confirm this with high confidence and hypothesize that he is obtaining weapons from outside the country. We have high confidence that he planned to be in Dubai in April 2009 to purchase weapons. He then died in a hospital in Yemen on May 3rd.
Arms Dealers: Source
Based on the many meetings set for Dubai for the week of April 18th, we are almost certain that the central arms dealers supplying everyone are located in Russia, Ukraine and Thailand. Based on their phone calls, we have high confidence that suspected arms dealers Nicolai Kuryakin and Boonmee Khemkhaeng are selling the weapons. Boonmee is based in Thailand and almost certainly acted as a middleman, setting up meetings with Nicolai in April 2009 for weapons purchases. Mikhail Dombrovski is based in Moscow and is also very likely involved in selling weapons to everyone else. Arkadi Borodinski of Ukraine, an associate of Nicolai, likely attempted to transport illegal weapons to Iran. He likely arranged for Sattari Khurshid of Iran to meet Nicolai. Leonid Minsky of Ukraine was involved in illegal arms dealing until his death in February 2009. Figure 4 shows the meetings planned for Dubai in April 2009, as recorded in CZNotes, our under-development note-taking facility.
Figure 4- Meetings and events related to Dubai involving arms dealers.
After examining the groups of documents, the existence of several threads became apparent. We next investigated the social network connecting these threads (Figure 5).
Figure 5- Social network of main people involved in arms dealing.
MC1.2: Illustrate the associations among the players in the arms dealing through a social network. If there are linkages among countries, please highlight these as well in the social network. Our analysts are interested in seeing different views of the social network that might help them in counterintelligence activities (people, places, activities, communication patterns that are key to the network).
Our analysis process is a sense-making loop in which we extract and visualize entities, discover linkages, and generate high-level hypotheses. The dependency graph speeds up this loop by automatically synchronizing data and views and propagating changes (e.g. assigning a new value to a variable) through the whole structure.
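The propagation behaviour can be illustrated with a small reactive graph. Each node either holds a value or computes one from its inputs. Assigning a new value recomputes every downstream node in topological order, so each node is evaluated only after all of its inputs. The `Node` class below is a hypothetical sketch of the idea. It is not CZSaw's script language or its actual dependency-graph implementation.

```python
class Node:
    """A value in a dependency graph, optionally computed from input nodes."""

    def __init__(self, name, value=None, compute=None, inputs=()):
        self.name = name
        self.value = value
        self.compute = compute
        self.inputs = list(inputs)
        self.dependents = []
        for node in self.inputs:
            node.dependents.append(self)
        if compute is not None:
            self.recompute()

    def recompute(self):
        self.value = self.compute(*(node.value for node in self.inputs))

    def set(self, value):
        """Assign a new value and propagate the change downstream."""
        self.value = value
        for node in downstream_order(self):
            node.recompute()


def downstream_order(source):
    """Nodes reachable from `source`, in topological (inputs-first) order."""
    order, seen = [], set()

    def visit(node):
        for dep in node.dependents:
            if dep not in seen:
                seen.add(dep)
                visit(dep)
                order.append(dep)  # post-order: descendants are appended first

    visit(source)
    order.reverse()  # reversed post-order is a topological order
    return order


# Hypothetical pipeline: a document set feeds a count, which feeds a label.
documents = Node("documents", value=["d1", "d2"])
count = Node("doc_count", compute=len, inputs=[documents])
label = Node("label", compute=lambda n: f"{n} documents", inputs=[count])
documents.set(["d1", "d2", "d3"])
print(label.value)
```

Recomputing in topological order matters when two paths meet at one node, such as a view that depends on both a filtered entity set and the raw data. That node must see both updated inputs before it is redrawn.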
Entity refinement and data cleaning
Automatic entity extraction is never 100% accurate. We made extensive use of CZSaw’s capabilities for interactively refining entities during the analysis process (e.g. merging misspelled names or linking phone numbers to individuals).
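Merging can be modelled as an alias table, followed by a rewrite of each document's entity set. The table maps each variant to one canonical entity, for example a misspelling or a phone number already linked to a person. The sketch assumes such a table, follows alias chains, and guards against cycles. The table entries, including the phone number and the misspellings, are made up for illustration.

```python
def canonical(name, aliases):
    """Follow alias links (e.g. misspelling -> correct name) to the final form."""
    seen = set()
    while name in aliases and name not in seen:
        seen.add(name)  # guard against accidental alias cycles
        name = aliases[name]
    return name


def merge_entities(doc_entities, aliases):
    """Rewrite every document's entity set using canonical names."""
    return {
        doc: {canonical(e, aliases) for e in entities}
        for doc, entities in doc_entities.items()
    }


# Hypothetical aliases: a misspelling, a chained variant, and a phone number.
aliases = {
    "Nicolay Kuryakin": "Nicolai Kuryakin",
    "N. Kuryakin": "Nicolay Kuryakin",
    "555-0101": "Nicolai Kuryakin",
}
doc_entities = {
    "d1": {"Nicolay Kuryakin", "Dubai"},
    "d2": {"555-0101", "Borodinski"},
    "d3": {"N. Kuryakin"},
}
merged = merge_entities(doc_entities, aliases)
print(merged)
```

After merging, all three documents mention the same entity. Any view built from the entity sets then connects them.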
We began by clustering and categorizing documents based on content, refining and cleaning as we went. We say two documents are related if they have common entities. Based on this, the graph view (Figure 6) clusters groups of documents.
Figure 6- Document network after initial automatic entity extraction.
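The relatedness rule is simple enough to state as code. The sketch below links every pair of documents that share at least one entity. As an assumed simplification, it treats the connected components of that graph as clusters; CZSaw's graph view shows a spatial layout rather than computing components. The sketch also shows why cleaning matters. One very common entity can merge clusters, and an unmerged misspelling can split them. The document IDs and entities are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def related_pairs(doc_entities):
    """Pairs of documents that have at least one entity in common."""
    return [
        (a, b)
        for a, b in combinations(sorted(doc_entities), 2)
        if doc_entities[a] & doc_entities[b]
    ]


def clusters(doc_entities):
    """Connected components of the 'shares an entity' document graph."""
    adjacency = defaultdict(set)
    for a, b in related_pairs(doc_entities):
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, groups = set(), []
    for start in sorted(doc_entities):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], set()
        while stack:  # iterative depth-first search
            doc = stack.pop()
            group.add(doc)
            for nxt in adjacency[doc]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(group)
    return groups


doc_entities = {
    "d1": {"Celik", "Turkey"},
    "d2": {"Turkey", "Baltasar"},
    "d3": {"Ngoki", "Nigeria"},
}
groups = clusters(doc_entities)
print(groups)
```

In this example, adding one shared entity such as "Turkey" to d3 would pull all three documents into a single cluster.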
The document view enables reading individual documents. While reading, and during other parts of the analysis, we merged entities and did other “cleaning”. CZSaw’s dependency graph and propagation system automatically updated the visualizations (Figure 7).
Figure 7- Document network after entity refinement. Documents are clustered, leading us to examine each cluster.
A Different View of the Social Network
Co-citation, that is, two people being mentioned in the same document, indicates a connection between them. People such as Dombrovski and Nicolai (Russia), Ahmed (Yemen), Borodinski (Ukraine), and Otieno/Owiti (Kenya) play important roles in the network by connecting different groups of people (Figure 8).
Figure 8- A different view of the social network.
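Co-citation can be computed by counting, for each pair of people, the documents in which both appear. The sketch below also flags people who connect otherwise separate groups. It uses articulation points found by brute force: a person counts as a connector if removing them increases the number of connected components. Figure 8 was read visually, so this measure is an illustrative substitute. The documents are toy examples that loosely echo names from the text and are not the challenge data.

```python
from collections import Counter
from itertools import combinations


def cocitation(doc_people):
    """Count, for each pair of people, the documents that mention both."""
    weights = Counter()
    for people in doc_people.values():
        for a, b in combinations(sorted(people), 2):
            weights[(a, b)] += 1
    return weights


def count_components(nodes, edges):
    """Number of connected components in an undirected graph."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, count = set(), 0
    for n in nodes:
        if n in seen:
            continue
        count += 1
        seen.add(n)
        stack = [n]
        while stack:
            for m in adjacency[stack.pop()]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
    return count


def connectors(weights):
    """People whose removal splits the co-citation network (articulation points)."""
    nodes = {p for pair in weights for p in pair}
    base = count_components(nodes, weights)
    return {
        p for p in nodes
        if count_components(nodes - {p}, [e for e in weights if p not in e]) > base
    }


# Toy documents loosely echoing names from the text; not the challenge data.
doc_people = {
    "d1": {"Otieno", "Owiti"},
    "d2": {"Owiti", "Nicolai"},
    "d3": {"Nicolai", "Boonmee"},
    "d4": {"Ngoki", "Dombrovski"},
    "d5": {"Dombrovski", "Nicolai"},
}
weights = cocitation(doc_people)
print(sorted(connectors(weights)))
```

The brute-force check is quadratic but fine for a few dozen people. Tarjan's linear-time articulation-point algorithm would be the usual choice for larger networks.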
Several clusters need study, along with possible connections between them. For readability and clarity in the graphs, we examine connections through only one or a few entity types at a time. We start with ‘country’ and ‘organization’ entities (Figure 9). The important countries are: Russia, Ukraine, North Korea, Nigeria, Kenya, Venezuela, Colombia, Yemen, Saudi Arabia, UAE, Israel, Lebanon, Pakistan, and Turkey.
Figure 9- Cluster of people connected by countries and organizations.
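Restricting the graph to certain entity types can be thought of as projecting the document-entity data onto people. Two people are linked when each co-occurs, in some document, with the same entity of a kept type. The sketch assumes that rule, a hand-made `entity_type` table and hypothetical documents. The shared city "Dubai" is ignored because only countries and organizations are kept.

```python
from collections import defaultdict
from itertools import combinations


def project_people(doc_entities, entity_type, keep_types):
    """Link people who co-occur with a common entity of one of `keep_types`.

    Returns {(person_a, person_b): set of linking entities}.
    """
    people_by_entity = defaultdict(set)
    for entities in doc_entities.values():
        people = {e for e in entities if entity_type.get(e) == "person"}
        for e in entities:
            if entity_type.get(e) in keep_types:
                people_by_entity[e] |= people
    links = defaultdict(set)
    for entity, people in people_by_entity.items():
        for a, b in combinations(sorted(people), 2):
            links[(a, b)].add(entity)
    return dict(links)


# Hypothetical type table and documents.
entity_type = {
    "Ngoki": "person", "Dombrovski": "person", "Nicolai": "person",
    "Nigeria": "country", "Russia": "country", "Dubai": "city",
}
doc_entities = {
    "d1": {"Ngoki", "Nigeria", "Dubai"},
    "d2": {"Dombrovski", "Russia", "Dubai"},
    "d3": {"Nicolai", "Russia"},
}
links = project_people(doc_entities, entity_type, keep_types={"country", "organization"})
print(links)
```

Changing `keep_types` to other types, such as equipment or money, gives the other filtered views without rebuilding the underlying data.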
Our next step is to check whether any arms dealing happens between clusters. The documents, especially the conversations, use terms such as “textbooks” and “pliers”. We hypothesize that these are code names for illegal arms and refer to them as “equipment”. Figure 10 shows the equipment that is related to at least two people.
Figure 10- People connected with equipment and money entities.
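The filter behind Figure 10 can be sketched as follows. For each suspected code term, collect the people who appear in the same documents, then keep the terms linked to at least two people. Document co-occurrence is assumed as the linking rule. The term list, people and documents below are hypothetical, although the terms themselves are words mentioned in the text.

```python
def shared_equipment(doc_entities, equipment_terms, people, min_people=2):
    """Map each suspected code term to the people it co-occurs with.

    Only terms linked to at least `min_people` people are kept, mirroring
    the 'related to at least two people' filter.
    """
    linked = {term: set() for term in equipment_terms}
    for entities in doc_entities.values():
        present = entities & people  # people named in this document
        for term in entities & equipment_terms:
            linked[term] |= present
    return {term: ps for term, ps in linked.items() if len(ps) >= min_people}


# Hypothetical inputs.
equipment_terms = {"textbooks", "farm", "pliers", "car parts"}
people = {"Baltasar", "Celik", "Hakan", "Jhon"}
doc_entities = {
    "d1": {"Baltasar", "Celik", "textbooks"},
    "d2": {"Celik", "Hakan", "farm"},
    "d3": {"Jhon", "car parts"},
}
result = shared_equipment(doc_entities, equipment_terms, people)
print(result)
```

In this toy data, Celik is the only person linked to both kept terms, which is the pattern that marks a connecting node between groups.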
In Figure 10, some equipment is related to only one group, while some connects separate clusters. For example, “farm” connects the Turkey/Syria group to the central farm group, with Celik acting as the connecting node. We can also see two weapon shipments, the M/V Tanya and the IL-76 cargo plane, that connect a few clusters of people. Further investigation of the Tanya shows that it is an illegal arms shipment to Sudan, with Kenya acting as an intermediate country.
Next, we show money and account entities in the graph view and track the flow of money between bank accounts. The final destination of the wired money is an account that we hypothesize belongs to Dombrovski, based on other documents about his relations with South America.
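Tracing the money means following directed transfer edges until reaching an account with no further outgoing transfer. The sketch below assumes transfers are available as (source account, target account) pairs, and the account IDs are hypothetical. The ownership hypothesis for the final account still has to come from reading other documents, as described above.

```python
from collections import defaultdict


def final_destinations(transfers, start):
    """Follow wire transfers from `start` and return the accounts where the
    money stops (accounts with no outgoing transfer)."""
    outgoing = defaultdict(list)
    for source, target in transfers:
        outgoing[source].append(target)
    sinks, seen, stack = set(), {start}, [start]
    while stack:
        account = stack.pop()
        targets = outgoing.get(account, [])
        if not targets:
            sinks.add(account)
        for target in targets:
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return sinks


# Hypothetical account IDs; the real account numbers are not reproduced here.
transfers = [
    ("acct-VE-1", "acct-CO-1"),
    ("acct-VE-2", "acct-CO-1"),
    ("acct-CO-1", "acct-MOSCOW-1"),
]
print(final_destinations(transfers, "acct-VE-1"))
```

The `seen` set keeps the traversal finite even if the transfers form a loop, such as money sent back and forth between two accounts.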
From reading some of the documents, we found that meetings will be held in Dubai during the week starting April 15th, 2009. See Figure 4 in MC1.1 for the dates and the people involved in these events. Only one of the related reports (USGovIntel-25) mentions travel to Dubai on April 18th, 2008. Given the date the document was written and how closely its contents are coupled with this set of documents, we assume this is a mistake and that the flight actually takes place in April 2009.
An IL-76 cargo plane carrying illegal weapons from North Korea, by arrangement of Borodinski (Ukraine), had stopped in the UAE and was planning to go to Sri Lanka. The plane was scheduled to arrive in Iran on Feb 12th, 2008, but was seized in Thailand on Feb 11th.
There will be a set of meetings in Dubai from April 15th, 2009 to April 22nd, 2009, between known or suspected arms dealers and their customers. Some of the meetings will be held at the Burj Al Arab hotel. The most important meetings and the related people and events are summarized in Figure 4 of MC1.1. Here, we briefly describe some of the important people and their connections, based on both document text and our inferences. We assume that a group of main illegal arms suppliers, including people from Russia, Ukraine and Thailand, arranged the meetings in Dubai during this period.
George Ngoki (Nigeria) is involved in a deal with Mikhail Dombrovski (Russia) for purchasing arms with a value of $30.6M.
Thabiti Otieno and his wife Nahid Owiti (Kenya) transport firearms to Sudan through Kenya. These include the arms on the cargo ship Tanya, supplied by Nicolai Kuryakin, a known arms dealer.
Arms dealers Boonmee Khemkhaeng (Thailand), Nicolai Kuryakin (Russia) and Arkadi Borodinski (Ukraine) will meet.
Muhammad Kasem (head of MFJ) is buying arms for the group’s “May operation” from a Russian source that Abdllah Khouri has found.
Baltasar and his friends (Turkey and Syria) want to buy “textbooks / farm equipment” from Russia through a Bosnian salesman.
Azeem Bhutani and Maulana Haq Bukhari from Lashkar-e-Jhangvi (Pakistan) are transferring money to an account in Moscow. They are flying to Dubai during this time period and the money is probably for the payment of an illegal arms deal.
Saleh Ahmed (a Yemeni arms dealer who supplies weapons to countries neighbouring Saudi Arabia) is going to meet Mikhail Dombrovski and Nicolai Kuryakin, two major arms dealers from Russia.
Nicolai Kuryakin (Russia), Arkadi Borodinski (Ukraine) and Sattari Khurshid (Iran) will meet. The first two are known arms dealers, and the third has worked with Borodinski since the IL-76 plane incident.
Vwhombre also wants to buy “car parts” from Joe Tomski (Russia) through Jhon (Colombia), and there is a money transfer to an account in Moscow. Since Dombrovski uses the email address Joetomsk@au.ru, we believe that “jtomski” in this car parts deal is Dombrovski.