
Extracting data from SAP HANA using AWS Glue and JDBC

Feed: AWS for SAP.
Author: Chris Williams.

Have you ever found yourself endlessly clicking through the SAP GUI searching for the data you need, only to resort to exporting tables to spreadsheets just to run a simple query and get the answer you need?

I know I have—and wanted an easy way to access SAP data and put it in a place where I can use it the way I want.

In this post, you walk through setting up a connection to SAP HANA using AWS Glue and extracting data to Amazon S3. This solution enables a seamless mechanism to expose SAP to a variety of analytics and visualization services, allowing you to find the answer you need.

There are several tools available to extract data from SAP. However, almost all of them take months to implement, deploy, and license. Also, they are a “one-way door” approach—after you make a decision, it’s hard to go back to your original state.

In this post, you use AWS Glue and Amazon S3, plus AWS Secrets Manager, to set up a connection to SAP HANA and extract data.

Before you set up connectivity, you must store your credentials, connection details, and JDBC driver in a secure place. First, create an S3 bucket for this exercise.
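If you prefer to script this step, the following minimal sketch creates the bucket and prefixes with boto3. The bucket name sap-hana-extract-demo and the driver/ and output/ prefixes are placeholders; substitute your own names.

import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Placeholder bucket name -- S3 bucket names must be globally unique.
bucket_name = "sap-hana-extract-demo"

# Create the bucket (us-east-1 requires no LocationConstraint).
s3.create_bucket(Bucket=bucket_name)

# Create prefixes for the JDBC driver and the extracted data.
s3.put_object(Bucket=bucket_name, Key="driver/")
s3.put_object(Bucket=bucket_name, Key="output/")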

You should now have a brand new bucket and structure ready to use.

Next, use Secrets Manager to store your credentials and connection details securely.
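You can create the secret in the console or script it. The following sketch stores a secret named SAP-Connection-Info with the same keys that the AWS Glue job reads later. Every value is a placeholder for your own SAP HANA system; the table shown is the customer master table KNA1 under an example schema.

import boto3
import json

secrets = boto3.client("secretsmanager", region_name="us-east-1")

# All values are placeholders; the key names are the ones the
# AWS Glue script retrieves from Secrets Manager.
secrets.create_secret(
    Name="SAP-Connection-Info",
    SecretString=json.dumps({
        "db_username": "GLUE_EXTRACT_USER",
        "db_password": "your-password",
        "db_url": "jdbc:sap://your-hana-host:30015/",
        "db_table": "SAPABAP1.KNA1",
        "driver_name": "com.sap.db.jdbc.Driver",
        "output_bucket": "s3://sap-hana-extract-demo/output/"
    })
)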

Your secret is now saved and can be referenced by name from the AWS Glue job.

Next, create an IAM role for your AWS Glue job. You can create the IAM role either before you create the extraction job or while you are authoring it. For this exercise, create it in advance.
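If you want to script the role as well, the following sketch creates a role that AWS Glue can assume and attaches managed policies for Glue, S3, and Secrets Manager. The role name GlueSAPExtractRole is a placeholder, and in production you should scope the S3 and Secrets Manager permissions down to your own bucket and secret.

import boto3
import json

iam = boto3.client("iam")

# Trust policy that allows AWS Glue to assume the role.
assume_role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "glue.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

iam.create_role(
    RoleName="GlueSAPExtractRole",
    AssumeRolePolicyDocument=json.dumps(assume_role_policy)
)

# Broad managed policies for brevity; narrow these in production.
for policy_arn in [
    "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/SecretsManagerReadWrite",
]:
    iam.attach_role_policy(RoleName="GlueSAPExtractRole", PolicyArn=policy_arn)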

After creating the IAM role, upload the JDBC driver to the driver location in your S3 bucket. For this example, use the SAP HANA JDBC driver, which is available on the SAP support site.
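The following sketch uploads the driver with boto3. The local file name, bucket, and key are placeholders; the SAP HANA JDBC driver jar is typically named ngdbc.jar.

import boto3

s3 = boto3.client("s3")

# Placeholder paths: the jar downloaded from the SAP support site
# goes to the driver/ prefix created earlier.
s3.upload_file(
    Filename="ngdbc.jar",
    Bucket="sap-hana-extract-demo",
    Key="driver/ngdbc.jar",
)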

Now that the prerequisites are in place, author the AWS Glue job for SAP HANA. The following script retrieves the connection details from Secrets Manager, reads the table from SAP HANA over JDBC, applies a field mapping, and writes the result to S3 as CSV.

import sys
import boto3
import json
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job


## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Getting DB credentials from Secrets Manager
client = boto3.client("secretsmanager", region_name="us-east-1")

get_secret_value_response = client.get_secret_value(
        SecretId="SAP-Connection-Info"
)

secret = get_secret_value_response['SecretString']
secret = json.loads(secret)

db_username = secret.get('db_username')
db_password = secret.get('db_password')
db_url = secret.get('db_url')
table_name = secret.get('db_table')
jdbc_driver_name = secret.get('driver_name')
s3_output = secret.get('output_bucket')

# Uncomment to troubleshoot the ingestion of Secrets Manager parameters.
# Caution: uncommenting prints secrets in plaintext in the job log!
# print("bucketname", s3_output)
# print("tablename", table_name)
# print("db username", db_username)
# print("db password", db_password)
# print("db url", db_url)
# print("jdbc driver name", jdbc_driver_name)

# Connecting to the source
df = (glueContext.read.format("jdbc")
    .option("driver", jdbc_driver_name)
    .option("url", db_url)
    .option("dbtable", table_name)
    .option("user", db_username)
    .option("password", db_password)
    .load())

df.printSchema()
print(df.count())

datasource0 = DynamicFrame.fromDF(df, glueContext, "datasource0")

# Defining mapping for the transformation
applymapping2 = ApplyMapping.apply(frame = datasource0, mappings = [("MANDT", "varchar","MANDT", "varchar"), ("KUNNR", "varchar","KUNNR", "varchar"), ("LAND1", "varchar","LAND1", "varchar"),("NAME1", "varchar","NAME1", "varchar"),("NAME2", "varchar","NAME2", "varchar"),("ORT01", "varchar","ORT01", "varchar"), ("PSTLZ", "varchar","PSTLZ", "varchar"), ("REGIO", "varchar","REGIO", "varchar"), ("SORTL", "varchar","SORTL", "varchar"), ("STRAS", "varchar","STRAS", "varchar"), ("TELF1", "varchar","TELF1", "varchar"), ("TELFX", "varchar","TELFX", "varchar"), ("XCPDK", "varchar","XCPDK", "varchar"), ("ADRNR", "varchar","ADRNR", "varchar"), ("MCOD1", "varchar","MCOD1", "varchar"), ("MCOD2", "varchar","MCOD2", "varchar"), ("MCOD3", "varchar","MCOD3", "varchar"), ("ANRED", "varchar","ANRED", "varchar"), ("AUFSD", "varchar","AUFSD", "varchar"), ("BAHNE", "varchar","BAHNE", "varchar"), ("BAHNS", "varchar","BAHNS", "varchar"), ("BBBNR", "varchar","BBBNR", "varchar"), ("BBSNR", "varchar","BBSNR", "varchar"), ("BEGRU", "varchar","BEGRU", "varchar"), ("BRSCH", "varchar","BRSCH", "varchar"), ("BUBKZ", "varchar","BUBKZ", "varchar"), ("DATLT", "varchar","DATLT", "varchar"), ("ERDAT", "varchar","ERDAT", "varchar"), ("ERNAM", "varchar","ERNAM", "varchar"), ("EXABL", "varchar","EXABL", "varchar"), ("FAKSD", "varchar","FAKSD", "varchar"), ("FISKN", "varchar","FISKN", "varchar"), ("KNAZK", "varchar","KNAZK", "varchar"), ("KNRZA", "varchar","KNRZA", "varchar"), ("KONZS", "varchar","KONZS", "varchar"), ("KTOKD", "varchar","KTOKD", "varchar"), ("KUKLA", "varchar","KUKLA", "varchar"), ("LIFNR", "varchar","LIFNR", "varchar"), ("LIFSD", "varchar","LIFSD", "varchar"), ("LOCCO", "varchar","LOCCO", "varchar"), ("LOEVM", "varchar","LOEVM", "varchar"), ("NAME3", "varchar","NAME3", "varchar"), ("NAME4", "varchar","NAME4", "varchar"), ("NIELS", "varchar","NIELS", "varchar"), ("ORT02", "varchar","ORT02", "varchar"), ("PFACH", "varchar","PFACH", "varchar"), ("PSTL2", "varchar","PSTL2", "varchar"), ("COUNC", "varchar","COUNC", "varchar"), ("CITYC", "varchar","CITYC", "varchar"), ("RPMKR", "varchar","RPMKR", "varchar"), ("SPERR", "varchar","SPERR", "varchar"), ("SPRAS", "varchar","SPRAS", "varchar"), ("STCD1", "varchar","STCD1", "varchar"), ("STCD2", "varchar","STCD2", "varchar"), ("STKZA", "varchar","STKZA", "varchar"), ("STKZU", "varchar","STKZU", "varchar"), ("TELBX", "varchar","TELBX", "varchar"), ("TELF2", "varchar","TELF2", "varchar"), ("TELTX", "varchar","TELTX", "varchar"), ("TELX1", "varchar","TELX1", "varchar"), ("LZONE", "varchar","LZONE", "varchar"), ("STCEG", "varchar","STCEG", "varchar"), ("GFORM", "varchar","GFORM", "varchar"), ("UMSAT", "varchar","UMSAT", "varchar"), ("UPTIM", "varchar","UPTIM", "varchar"), ("JMZAH", "varchar","JMZAH", "varchar"), ("UMSA1", "varchar","UMSA1", "varchar"), ("TXJCD", "varchar","TXJCD", "varchar"), ("DUEFL", "varchar","DUEFL", "varchar"), ("HZUOR", "varchar","HZUOR", "varchar"), ("UPDAT", "varchar","UPDAT", "varchar"), ("RGDATE", "varchar","RGDATE", "varchar"), ("RIC", "varchar","RIC", "varchar"), ("LEGALNAT", "varchar","LEGALNAT", "varchar"), ("/VSO/R_PALHGT", "varchar","/VSO/R_PALHGT", "varchar"), ("/VSO/R_I_NO_LYR", "varchar","/VSO/R_I_NO_LYR", "varchar"), ("/VSO/R_ULD_SIDE", "varchar","/VSO/R_ULD_SIDE", "varchar"), ("/VSO/R_LOAD_PREF", "varchar","/VSO/R_LOAD_PREF", "varchar"), ("AEDAT", "varchar","AEDAT", "varchar"), ("PSPNR", "varchar","PSPNR", "varchar"), ("J_3GTSDMON", "varchar","J_3GTSDMON", "varchar"), ("J_3GSTDIAG", "varchar","J_3GSTDIAG", "varchar"), ("J_3GTAGMON", 
"varchar","J_3GTAGMON", "varchar"), ("J_3GVMONAT", "varchar","J_3GVMONAT", "varchar"), ("J_3GLABRECH", "varchar","J_3GLABRECH", "varchar"), ("J_3GEMINBE", "varchar","J_3GEMINBE", "varchar"), ("J_3GFMGUE", "varchar","J_3GFMGUE", "varchar"), ("J_3GZUSCHUE", "varchar","J_3GZUSCHUE", "varchar")], transformation_ctx = "applymapping1")


resolvechoice3 = ResolveChoice.apply(frame = applymapping2, choice = "make_struct", transformation_ctx = "resolvechoice3")
dropnullfields3 = DropNullFields.apply(frame = resolvechoice3, transformation_ctx = "dropnullfields3")

# Writing to destination
datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": s3_output}, format = "csv", transformation_ctx = "datasink4")

job.commit()

Now that you created the AWS Glue job, the next step is to run it.
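If you created the job in the AWS Glue console, you can start it from there. Alternatively, the following sketch registers the script and starts a run with boto3. The job name, role, script location, and jar path are placeholders; the --extra-jars default argument is what makes the SAP HANA JDBC driver available to the job.

import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Register the job; paths and names are placeholders from earlier steps.
glue.create_job(
    Name="sap-hana-extract",
    Role="GlueSAPExtractRole",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://sap-hana-extract-demo/scripts/sap_hana_extract.py",
        "PythonVersion": "3",
    },
    DefaultArguments={
        # Put the SAP HANA JDBC driver on the job's classpath.
        "--extra-jars": "s3://sap-hana-extract-demo/driver/ngdbc.jar"
    },
    GlueVersion="2.0",
)

# Start a run and note the run ID for monitoring and log lookups.
run = glue.start_job_run(JobName="sap-hana-extract")
print(run["JobRunId"])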

Because this is the first run, you may see a Pending execution status next to the run's date and time for 5-10 minutes. Behind the scenes, AWS Glue is spinning up a Spark cluster to run your job.

For a successful run, the job log shows the table schema and row count printed by the script, followed by a successful completion.

If you encounter any errors, they appear in Amazon CloudWatch Logs under the /aws-glue/jobs/ log groups.
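If you would rather pull the errors programmatically, a sketch like the following reads recent events from the error log group with boto3; the log stream name shown is a placeholder for the JobRunId of your run.

import boto3

logs = boto3.client("logs", region_name="us-east-1")

# Glue writes job errors under /aws-glue/jobs/error; the log stream
# is named after the job run ID (placeholder shown here).
response = logs.filter_log_events(
    logGroupName="/aws-glue/jobs/error",
    logStreamNames=["jr_exampleJobRunId"],
    limit=50,
)
for event in response["events"]:
    print(event["message"])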

You now have the data out of SAP and in S3. Next, you need a way to contextualize it, so that end users can apply their logic and automate what they usually do in spreadsheets. To do this, integrate your data in S3 with Amazon Athena and Amazon QuickSight.
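One way to set this up is to define an external table in Athena over the CSV output and query it. The following sketch submits the DDL and a sample query with boto3; the database, table, bucket, and result location are placeholders, and only the first few mapped KNA1 columns are declared for brevity.

import boto3

athena = boto3.client("athena", region_name="us-east-1")
results = {"OutputLocation": "s3://sap-hana-extract-demo/athena-results/"}

# DDL statements run asynchronously; in practice, poll
# get_query_execution() before submitting the next statement.
athena.start_query_execution(
    QueryString="CREATE DATABASE IF NOT EXISTS sap_extract",
    ResultConfiguration=results,
)

# External table over the CSV files written by the Glue job. If the
# files include a header row, add
# TBLPROPERTIES ('skip.header.line.count'='1').
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS sap_extract.kna1 (
  mandt string,
  kunnr string,
  land1 string,
  name1 string,
  name2 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://sap-hana-extract-demo/output/'
"""
athena.start_query_execution(QueryString=ddl, ResultConfiguration=results)

# A sample query end users might otherwise build in a spreadsheet.
athena.start_query_execution(
    QueryString="SELECT land1, count(*) AS customers "
                "FROM sap_extract.kna1 GROUP BY land1",
    ResultConfiguration=results,
)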

Next, extend these queries to visualizations to further enrich the data.

With Amazon QuickSight’s drag-and-drop capability, you can now build visualizations from the fields brought over using S3 and Athena.

In this post, you walked through setting up a connection to SAP HANA using AWS Glue and extracting data to S3. This enables a seamless mechanism to expose SAP to a variety of analytics and visualization services allowing you to find the answer you need. No longer do you have to use SAP’s transaction code, SE16, to export data to a spreadsheet, only to have to upload it to another tool for manipulation.

Make sure that you review your HANA license model with SAP to make sure you are using supportable features within HANA when extracting data.

