[Nov-2024] Pass Google Professional-Data-Engineer Exam in First Attempt Guaranteed! [Q108-Q123]

Full Professional-Data-Engineer Practice Test and 373 unique questions with explanations waiting just for you, get it now!

NEW QUESTION # 108
Your company's customer and order databases are often under heavy load. This makes performing analytics against them difficult without harming operations. The databases are in a MySQL cluster, with nightly backups taken using mysqldump. You want to perform analytics with minimal impact on operations. What should you do?

A. Connect an on-premises Apache Hadoop cluster to MySQL and perform ETL.
B. Add a node to the MySQL cluster and build an OLAP cube there.
C. Use an ETL tool to load the data from MySQL into Google BigQuery.
D. Mount the backups to Google Cloud SQL, and then process the data using Google Cloud Dataproc.

Answer: A

NEW QUESTION # 109
Your company has hired a new data scientist who wants to perform complicated analyses across very large datasets stored in Google Cloud Storage and in a Cassandra cluster on Google Compute Engine. The scientist primarily wants to create labelled data sets for machine learning projects, along with some visualization tasks.
She reports that her laptop is not powerful enough to perform her tasks and it is slowing her down. You want to help her perform her tasks. What should you do?

A. Host a visualization tool on a VM on Google Compute Engine.
B. Run a local version of Jupyter on the laptop.
C. Grant the user access to Google Cloud Shell.
D. Deploy Google Cloud Datalab to a virtual machine (VM) on Google Compute Engine.

Answer: D

NEW QUESTION # 110
You have spent a few days loading data from comma-separated values (CSV) files into the Google BigQuery table CLICK_STREAM. The column DT stores the epoch time of click events. For convenience, you chose a simple schema where every field is treated as the STRING type. Now, you want to compute web session durations of users who visit your site, and you want to change the data type of DT to TIMESTAMP. You want to minimize the migration effort without making future queries computationally expensive. What should you do?

A. Create a view CLICK_STREAM_V, where strings from the column DT are cast into TIMESTAMP values. Reference the view CLICK_STREAM_V instead of the table CLICK_STREAM from now on.
B. Add a column TS of the TIMESTAMP type to the table CLICK_STREAM, and populate the numeric values from the column TS for each row. Reference the column TS instead of the column DT from now on.
C. Construct a query to return every row of the table CLICK_STREAM, while using the built-in function to cast strings from the column DT into TIMESTAMP values. Run the query into a destination table NEW_CLICK_STREAM, in which the column TS is the TIMESTAMP type. Reference the table NEW_CLICK_STREAM instead of the table CLICK_STREAM from now on. In the future, new data is loaded into the table NEW_CLICK_STREAM.
D. Add two columns to the table CLICK_STREAM: TS of the TIMESTAMP type and IS_NEW of the BOOLEAN type. Reload all data in append mode. For each appended row, set the value of IS_NEW to true. For future queries, reference the column TS instead of the column DT, with the WHERE clause ensuring that the value of IS_NEW must be true.
E. Delete the table CLICK_STREAM, and then re-create it such that the column DT is of the TIMESTAMP type. Reload the data.

Answer: C
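
The destination-table approach described in option C can be sketched as follows. This is only an illustration: it assumes DT holds epoch seconds (the question does not say whether the values are seconds or milliseconds), and the project and dataset names in the query are hypothetical. The Python helpers mirror the same cast locally so the logic can be checked without BigQuery.

```python
from datetime import datetime, timezone

# Hypothetical one-time migration query (project/dataset names are placeholders).
# Assumes DT stores epoch *seconds* as a string.
MIGRATION_QUERY = """
SELECT * EXCEPT (DT), TIMESTAMP_SECONDS(CAST(DT AS INT64)) AS TS
FROM `my_project.my_dataset.CLICK_STREAM`
"""


def epoch_string_to_timestamp(dt):
    """Parse an epoch-seconds string into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(dt), tz=timezone.utc)


def session_duration_seconds(click_times):
    """Return the span in seconds between the first and last click of a session."""
    stamps = [epoch_string_to_timestamp(s) for s in click_times]
    return (max(stamps) - min(stamps)).total_seconds()
```

Running the query once into NEW_CLICK_STREAM pays the cast cost a single time, whereas a view would repeat the cast on every query.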

NEW QUESTION # 111
An aerospace company uses a proprietary data format to store its flight data. You need to connect this new data source to BigQuery and stream the data into BigQuery. You want to efficiently import the data into BigQuery while consuming as few resources as possible. What should you do?

A. Use a standard Dataflow pipeline to store the raw data in BigQuery and then transform the format later when the data is used
B. Write a shell script that triggers a Cloud Function that performs periodic ETL batch jobs on the new data source
C. Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format
D. Use Apache Hive to write a Dataproc job that streams the data into BigQuery in CSV format

Answer: C

NEW QUESTION # 112
Your team is working on a binary classification problem. You have trained a support vector machine (SVM) classifier with default parameters, and received an area under the curve (AUC) of 0.87 on the validation set.
You want to increase the AUC of the model. What should you do?

A. Train a classifier with deep neural networks, because neural networks would always beat SVMs
B. Perform hyperparameter tuning
C. Deploy the model and measure the real-world AUC; it's always higher because of generalization
D. Scale predictions you get out of the model (tune a scaling factor as a hyperparameter) in order to get the highest AUC

Answer: B

Explanation:
https://towardsdatascience.com/understanding-hyperparameters-and-its-optimisation-techniques-f0debba07568

NEW QUESTION # 113
Which of these is not a supported method of putting data into a partitioned table?

A. If you have existing data in a separate file for each day, then create a partitioned table and upload each file into the appropriate partition.
B. Run a query to get the records for a specific day from an existing table and for the destination table, specify a partitioned table ending with the day in the format "$YYYYMMDD".
C. Use ORDER BY to put a table's rows into chronological order and then change the table's type to "Partitioned".
D. Create a partitioned table and stream new records to it every day.

Answer: C

Explanation:
You cannot change an existing table into a partitioned table. You must create a partitioned table from scratch.
Then you can either stream data into it every day and the data will automatically be put in the right partition, or you can load data into a specific partition by using "$YYYYMMDD" at the end of the table name.
Reference: https://cloud.google.com/bigquery/docs/partitioned-tables
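
As a small illustration of the "$YYYYMMDD" suffix mentioned above, the following sketch builds the partition-qualified table name for a given day. The function name and the example table name are hypothetical; only the suffix format comes from the explanation.

```python
from datetime import date


def partition_decorator(table, day):
    """Append the $YYYYMMDD partition suffix for `day` to a table name."""
    return f"{table}${day.strftime('%Y%m%d')}"


# Example: target the partition for 5 November 2024 of a hypothetical table.
target = partition_decorator("my_dataset.events", date(2024, 11, 5))
```

The resulting string is what you would pass as the destination table when loading or querying into one specific partition.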

NEW QUESTION # 114
Which of the following statements is NOT true regarding Bigtable access roles?

A. To give a user access to only one table in a project, you must configure access through your application.
B. Using IAM roles, you cannot give a user access to only one table in a project, rather than all tables in a project.
C. To give a user access to only one table in a project, grant the user the Bigtable Editor role for that table.
D. You can configure access control only at the project level.

Answer: C

Explanation:
For Cloud Bigtable, you can configure access control at the project level. For example, you can grant the ability to:
Read from, but not write to, any table within the project.
Read from and write to any table within the project, but not manage instances.
Read from and write to any table within the project, and manage instances.
Reference: https://cloud.google.com/bigtable/docs/access-control

NEW QUESTION # 115
Which of the following statements about Legacy SQL and Standard SQL is not true?

A. Standard SQL is the preferred query language for BigQuery.
B. If you write a query in Legacy SQL, it might generate an error if you try to run it with Standard SQL.
C. You need to set a query language for each dataset and the default is Standard SQL.
D. One difference between the two query languages is how you specify fully-qualified table names (i.e. table names that include their associated project name).

Answer: C

Explanation:
You do not set a query language for each dataset. It is set each time you run a query and the default query language is Legacy SQL.
Standard SQL has been the preferred query language since BigQuery 2.0 was released.
In legacy SQL, to query a table with a project-qualified name, you use a colon, :, as a separator. In standard SQL, you use a period, ., instead.
Due to the differences in syntax between the two query languages (such as with project-qualified table names), if you write a query in Legacy SQL, it might generate an error if you try to run it with Standard SQL.
Reference:
https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql
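
To make the separator difference concrete, here is a minimal sketch that rewrites legacy-style bracketed table references into the standard SQL form. The helper name is hypothetical, and it handles only project-qualified table names; other syntax differences between the two dialects are not covered.

```python
import re

# Matches a legacy reference such as [project:dataset.table].
_LEGACY_REF = re.compile(r"\[([^\[\]:]+):([^\[\]]+)\]")


def legacy_to_standard_table_refs(query):
    """Rewrite [project:dataset.table] as `project.dataset.table`."""
    return _LEGACY_REF.sub(r"`\1.\2`", query)


legacy = "SELECT word FROM [bigquery-public-data:samples.shakespeare]"
standard = legacy_to_standard_table_refs(legacy)
```

A query left in the bracketed form would fail under standard SQL, which is why option B in the question is a true statement.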

NEW QUESTION # 116
You work for a large ecommerce company. You are using Pub/Sub to ingest the clickstream data to Google Cloud for analytics. You observe that when a new subscriber connects to an existing topic to analyze data, they are unable to subscribe to older data. For an upcoming yearly sale event in two months, you need a solution that, once implemented, will enable any new subscriber to read the last 30 days of data. What should you do?

A. Ask the source system to re-push the data to Pub/Sub, and subscribe to it.
B. Create a new topic, and publish the last 30 days of data each time a new subscriber connects to an existing topic.
C. Set the topic retention policy to 30 days.
D. Set the subscriber retention policy to 30 days.

Answer: C

Explanation:
By setting the topic retention policy to 30 days, you ensure that any new subscriber can access the messages that were published to the topic within the last 30 days [1]. This feature lets you replay previously acknowledged messages or initialize new subscribers with historical data [2]. You can configure the topic retention policy by using the Cloud Console, the gcloud command-line tool, or the Pub/Sub API [1].
Option A is not feasible, as it depends on the source system's ability and willingness to re-push the data, and it may cause data duplication or inconsistency. Option B is not efficient, as it requires creating a new topic and duplicating the data for each new subscriber, which would increase storage costs and complexity. Option D is not effective, as it only affects the unacknowledged messages in a subscription and does not allow new subscribers to access older data [3].
Reference:
[1] Create a topic | Cloud Pub/Sub Documentation | Google Cloud
[2] Replay and purge messages with seek | Cloud Pub/Sub Documentation | Google Cloud
[3] When is a PubSub Subscription considered to be inactive?
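
The retention window can be reasoned about with a short sketch. It does not call Pub/Sub; it only models the 30-day rule locally. The function names are hypothetical, and the "<seconds>s" string is the standard JSON encoding of a protobuf Duration, assumed here to be how the retention value is written when set through the API.

```python
from datetime import datetime, timedelta, timezone

TOPIC_RETENTION = timedelta(days=30)


def retention_seconds_string(retention):
    """Format a timedelta as a Duration string such as '2592000s'."""
    return f"{int(retention.total_seconds())}s"


def is_replayable(publish_time, now, retention=TOPIC_RETENTION):
    """Return True if a message published at `publish_time` is still inside the window."""
    return now - publish_time <= retention


now = datetime(2024, 11, 30, tzinfo=timezone.utc)
recent = is_replayable(datetime(2024, 11, 10, tzinfo=timezone.utc), now)
expired = is_replayable(datetime(2024, 10, 1, tzinfo=timezone.utc), now)
```

A new subscription created at `now` could seek back to the first message but not the second, which is the behavior the question asks for.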

NEW QUESTION # 117
You are building a new data pipeline to share data between two different types of applications: job generators and job runners. Your solution must scale to accommodate increases in usage and must accommodate the addition of new applications without negatively affecting the performance of existing ones. What should you do?

A. Use a Cloud Pub/Sub topic to publish jobs, and use subscriptions to execute them
B. Create an API using App Engine to receive and send messages to the applications
C. Create a table on Cloud Spanner, and insert and delete rows with the job information
D. Create a table on Cloud SQL, and insert and delete rows with the job information

Answer: A

Explanation:
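Pub/Sub transmits data in real time and scales automatically. Because publishers and subscribers are decoupled, new applications can be added as additional subscriptions without affecting existing ones.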

NEW QUESTION # 118
Which of these sources can you not load data into BigQuery from?

A. Google Cloud SQL
B. Google Drive
C. Google Cloud Storage
D. File upload

Answer: A

Explanation:
You can load data into BigQuery from a file upload, Google Cloud Storage, Google Drive, or Google Cloud Bigtable. It is not possible to load data into BigQuery directly from Google Cloud SQL. One way to get data from Cloud SQL to BigQuery would be to export data from Cloud SQL to Cloud Storage and then load it from there.
Reference: https://cloud.google.com/bigquery/loading-data

NEW QUESTION # 119
You have a BigQuery table that contains customer data, including sensitive information such as names and addresses. You need to share the customer data with your data analytics and consumer support teams securely.
The data analytics team needs to access the data of all the customers, but must not be able to access the sensitive data. The consumer support team needs access to all data columns, but must not be able to access customers that no longer have active contracts. You enforced these requirements by using an authorized dataset and policy tags. After implementing these steps, the data analytics team reports that they still have access to the sensitive columns. You need to ensure that the data analytics team does not have access to restricted data. What should you do?
Choose 2 answers

A. Replace the authorized dataset with an authorized view. Use row-level security and apply filter_expression to limit data access.
B. Remove the bigquery.dataViewer role from the data analytics team on the authorized datasets.
C. Ensure that the data analytics team members do not have the Data Catalog Fine-Grained Reader role for the policy tags.
D. Enforce access control in the policy tag taxonomy.
E. Create two separate authorized datasets; one for the data analytics team and another for the consumer support team.

Answer: C,D

Explanation:
To ensure that the data analytics team does not have access to sensitive columns, you should:
* C. Ensure that the data analytics team members do not have the Data Catalog Fine-Grained Reader role for the policy tags. This role grants access to the content of columns protected by policy tags, so anyone who holds it can still read the sensitive columns.
* D. Enforce access control in the policy tag taxonomy. Policy tags restrict column access only once access control is enforced on the taxonomy; after that, only authorized users can view the tagged columns.

NEW QUESTION # 120
The marketing team at your organization provides regular updates of a segment of your customer dataset.
The marketing team has given you a CSV with 1 million records that must be updated in BigQuery. When you use the UPDATE statement in BigQuery, you receive a quotaExceeded error. What should you do?

A. Split the source CSV file into smaller CSV files in Cloud Storage to reduce the number of BigQuery UPDATE DML statements per BigQuery job.
B. Import the new records from the CSV file into a new BigQuery table. Create a BigQuery job that merges the new records with the existing records and writes the results to a new BigQuery table.
C. Increase the BigQuery UPDATE DML statement limit in the Quota management section of the Google Cloud Platform Console.
D. Reduce the number of records updated each day to stay within the BigQuery UPDATE DML statement limit.

Answer: B

Explanation:
https://cloud.google.com/blog/products/gcp/performing-large-scale-mutations-in-bigquery
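
The load-then-merge pattern in option B can be sketched with a single set-based query instead of many UPDATE statements. The example below uses Python's built-in sqlite3 as a local stand-in for BigQuery, purely so that it runs anywhere; the table and column names are hypothetical, and the SQL would need adapting to BigQuery's dialect.

```python
import sqlite3


def merge_into_new_table(conn):
    """Combine existing rows with staged updates and write the result to a new table."""
    conn.execute("""
        CREATE TABLE customers_merged AS
        -- Existing rows, overwritten by the staged value when one exists.
        SELECT c.id, COALESCE(u.segment, c.segment) AS segment
        FROM customers AS c
        LEFT JOIN customer_updates AS u ON u.id = c.id
        UNION ALL
        -- Staged rows that have no counterpart in the existing table.
        SELECT u.id, u.segment
        FROM customer_updates AS u
        WHERE u.id NOT IN (SELECT id FROM customers)
    """)
    return conn.execute(
        "SELECT id, segment FROM customers_merged ORDER BY id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, segment TEXT);
    CREATE TABLE customer_updates (id INTEGER PRIMARY KEY, segment TEXT);
    INSERT INTO customers VALUES (1, 'bronze'), (2, 'silver'), (3, 'gold');
    INSERT INTO customer_updates VALUES (2, 'gold'), (4, 'bronze');
""")
merged = merge_into_new_table(conn)
```

Because the whole change is one job that reads two tables and writes a third, the number of records in the CSV no longer translates into a number of DML statements.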

NEW QUESTION # 121
You have an Apache Kafka Cluster on-prem with topics containing web application logs. You need to replicate the data to Google Cloud for analysis in BigQuery and Cloud Storage. The preferred replication method is mirroring to avoid deployment of Kafka Connect plugins.
What should you do?

A. Deploy the PubSub Kafka connector to your on-prem Kafka cluster and configure PubSub as a Source connector. Use a Dataflow job to read from PubSub and write to GCS.
B. Deploy a Kafka cluster on GCE VM Instances with the PubSub Kafka connector configured as a Sink connector. Use a Dataproc cluster or Dataflow job to read from Kafka and write to GCS.
C. Deploy the PubSub Kafka connector to your on-prem Kafka cluster and configure PubSub as a Sink connector. Use a Dataflow job to read from PubSub and write to GCS.
D. Deploy a Kafka cluster on GCE VM Instances. Configure your on-prem cluster to mirror your topics to the cluster running in GCE. Use a Dataproc cluster or Dataflow job to read from Kafka and write to GCS.

Answer: D

NEW QUESTION # 122
You want to analyze hundreds of thousands of social media posts daily at the lowest cost and with the fewest steps.
You have the following requirements:
* You will batch-load the posts once per day and run them through the Cloud Natural Language API.
* You will extract topics and sentiment from the posts.
* You must store the raw posts for archiving and reprocessing.
* You will create dashboards to be shared with people both inside and outside your organization.
You need to store both the data extracted from the API to perform analysis as well as the raw social media posts for historical archiving. What should you do?

A. Store the social media posts and the data extracted from the API in Cloud SQL.
B. Feed the social media posts into the API directly from the source, and write the extracted data from the API into BigQuery.
C. Store the raw social media posts in Cloud Storage, and write the data extracted from the API into BigQuery.
D. Store the social media posts and the data extracted from the API in BigQuery.

Answer: C

NEW QUESTION # 123
......

Get Latest Professional-Data-Engineer Dumps Exam Questions in here: https://passleader.testpassking.com/Professional-Data-Engineer-exam-testking-pass.html
