Sunday, January 05, 2025

Transformers, tokenization and embedding

 

Transformers:

  • Transformers are a neural network architecture introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017). They transform an input sequence into an output sequence by learning context and tracking relationships between the components of the sequence.
  • They are designed to handle sequential data, such as text, but unlike previous models (like RNNs or LSTMs), transformers don’t process data sequentially. Instead, they use a mechanism called self-attention to weigh the importance of different words in a sentence regardless of their position.
  • The self-attention mechanism allows each token (word or subword) in the input sequence to attend to every other token in the sequence, which helps the model understand context more effectively.
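  • Autoregressive generation: the model produces its output one token at a time, and each new token is predicted from the tokens that came before it.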
  • Feed forward and self-attention: In a transformer model, self-attention is a mechanism that lets the model focus on different parts of an input sequence by calculating relationships between the elements of that sequence. The feedforward layer is a fully connected neural network layer that further processes the output of the self-attention layer, adding non-linearity and enabling the model to learn more complex patterns. In short, self-attention provides context-aware representations, and the feedforward layer refines them with non-linear transformations.
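
The self-attention idea above can be sketched in a few lines of plain Python. This is a simplified illustration, not the full mechanism from the paper: it assumes the queries, keys and values are the input vectors themselves (a real transformer first applies learned projection matrices), and the token vectors are made-up values.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = the input vectors.

    Simplification: the learned query/key/value projections are omitted.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this token against every token in the sequence, itself included.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # The new representation is a weighted average of all token vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three hypothetical 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` says how much one token attends to every other token, regardless of position in the sequence.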

Before Transformers

  • Early deep learning models for natural language processing (NLP) aimed at getting computers to understand and respond to natural human language. They guessed the next word in a sequence based on the previous word.
  • To understand this better, consider the autocomplete feature on your smartphone. It makes suggestions based on the frequency of word pairs that you type. For example, if you frequently type "I am fine," your phone suggests "fine" after you type "am".
  • Early machine learning (ML) models applied similar technology on a broader scale. They mapped the relationship frequency between different word pairs or word groups in their training data set and tried to guess the next word. However, early technology couldn’t retain context beyond a certain input length. For example, an early ML model couldn’t generate a meaningful paragraph because it couldn’t retain context between the first and last sentence in a paragraph. 
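
The word-pair frequency approach described above can be sketched as a tiny bigram model. The typing history and the function names `train_bigrams` and `suggest` are made up for illustration; this is not how any particular phone keyboard is implemented.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def suggest(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical typing history.
history = "I am fine . I am fine . I am late . you are fine"
model = train_bigrams(history)
suggestion = suggest(model, "am")
```

Because the model only looks at the previous word, it has no way to carry context across a sentence, which is the limitation the notes above describe.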

Tokenization:

  • Tokenization is the process of breaking down text into smaller chunks, called tokens. These tokens can be words, subwords, or characters, depending on the granularity chosen.
  • In modern NLP models like GPT or BERT, subword tokenization (e.g., using methods like Byte Pair Encoding (BPE) or SentencePiece) is commonly used because it balances between the word-level and character-level granularity, capturing a wide range of linguistic patterns.
  • For example, the word "unhappiness" might be tokenized into ["un", "happiness"] or even further into smaller subword units like ["un", "happi", "ness"].
  • Tokenization is crucial for transforming human-readable text into a format that a machine learning model can process.
  • A token is the unit of operation for an LLM.
  • LLM usage is typically priced per token.
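
A rough sketch of subword tokenization, using the "unhappiness" example from above. It assumes a tiny hand-written vocabulary and a greedy longest-match rule; real tokenizers such as BPE or SentencePiece learn their vocabulary from data and work differently in detail.

```python
def tokenize(word, vocab):
    """Split `word` into the longest known pieces, left to right."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest possible piece first, shrinking until one matches.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            tokens.append(word[start])
            start += 1
    return tokens

# Hypothetical toy vocabulary.
VOCAB = {"un", "happi", "ness", "happy"}
tokens = tokenize("unhappiness", VOCAB)
token_count = len(tokens)  # the quantity that per-token pricing is based on
```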

Embedding:

  • Embeddings are low-dimensional vector representations of tokens that capture semantic relationships between them.
  • In traditional NLP, each word might be represented by a unique one-hot vector, but embeddings allow words with similar meanings to have similar vector representations. For example, "king" and "queen" would be close in the embedding space.
  • The transformer model typically uses positional embeddings in addition to token embeddings. Since transformers don’t inherently process data sequentially, positional embeddings provide information about the order of tokens in the input sequence.
  • The embeddings are learned during the training process, and they evolve to capture semantic, syntactic, and contextual relationships between tokens.
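
To make "close in the embedding space" concrete, here is a small sketch that compares vectors with cosine similarity. The three-dimensional vectors are invented for illustration; real embeddings are learned during training and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, hand-picked so that related words point the same way.
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.0, 0.9],
}

royal = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"])
fruit = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["apple"])
```

With these values `royal` comes out much higher than `fruit`, which is what "similar meanings have similar vectors" means in practice.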

How They Work Together:

  • When text is fed into a transformer model, it first undergoes tokenization (breaking the text into tokens).
  • These tokens are then mapped to their respective embeddings (vectors).
  • The transformer model processes these embeddings using the self-attention mechanism to capture the relationships and contextual meaning of tokens in the sequence.
  • The output embeddings can be used for tasks like text generation, classification, or translation.
  • The output of an LLM is a probability distribution over the vocabulary for the next token.
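
The last step, turning the model's raw scores into a probability distribution, is usually done with a softmax. The sketch below assumes a three-word vocabulary and made-up scores (logits), and picks the most likely token (greedy decoding, one of several possible strategies).

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and scores for the token that follows "I am".
VOCAB = ["fine", "late", "happy"]
logits = [2.0, 0.5, 1.0]

probs = softmax(logits)
next_token = VOCAB[probs.index(max(probs))]  # greedy choice
```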

Machine Learning, AI Definitions

Artificial intelligence

Artificial intelligence (AI) is a broad concept that encompasses machine learning (ML) and other subfields, while ML is a specific application of AI that focuses on teaching machines to perform tasks: 

  • AI
    The ability of a computer system to mimic human cognitive functions, such as learning and problem-solving. AI uses math and logic to simulate human reasoning. Examples of AI include smart assistants like Alexa, robotic vacuum cleaners, and self-driving cars. 

  • ML
    A subfield of AI that uses mathematical models and algorithms to teach computers to perform tasks without direct instruction. ML systems use patterns and inference to learn from data and improve themselves over time. Examples of ML include sorting images, analyzing big data, and forecasting sales. 

ML enhances AI's ability to perform tasks more accurately and efficiently. For example, AI uses ML to perform tasks like speech recognition and object detection.

Machine learning

Machine learning involves using algorithms to train models on data, allowing the model to identify patterns and make predictions. Traditional programming involves manually writing code to create a solution to a specific problem.




Classical programming versus machine learning paradigm. (A) In classical programming, a computer is supplied with a dataset and an algorithm. The algorithm informs the computer how to operate upon the dataset to create outputs. (B) In machine learning, a computer is supplied with a dataset and associated outputs. The computer learns and generates an algorithm that describes the relationship between the two. This algorithm can be used for inference on future datasets.
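
The contrast in the caption above can be sketched with a temperature conversion. In the classical version the rule is written by hand; in the learning version the program is given only inputs and outputs and recovers the rule with a least-squares line fit. The data and function names are illustrative assumptions.

```python
def fahrenheit_rule(celsius):
    """Classical programming: a human writes the algorithm."""
    return celsius * 9 / 5 + 32

def fit_line(xs, ys):
    """Machine learning: derive slope and intercept from examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Dataset (inputs) and associated outputs supplied to the learner.
celsius = [0, 10, 20, 30]
fahrenheit = [32, 50, 68, 86]

slope, intercept = fit_line(celsius, fahrenheit)
predicted = slope * 40 + intercept  # inference on an unseen input
```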

Wednesday, December 27, 2023

AWS MLOps

Most supervised learning revolves around a few key algorithms:

- logistic regression for classification

- linear regression for regression


Supervised

- input data is labeled

- data is classified based on training dataset

- used for prediction

- known number of classes

Ex: feed apples, fishes, pears to the algo, in labeled columns and tell the algo which one is which...then feed input to classify


Unsupervised

- input data is unlabeled

- groups data by properties found in the data itself

- used for analytics

- unknown number of classes

Ex: feed apples, fishes, pears to the algo, and algo will cluster 3 groups; Google photos grouping by face
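
A small sketch of the difference, assuming each item is described by a single made-up feature (weight in grams). The supervised part uses a nearest-centroid classifier and the unsupervised part a one-dimensional k-means; both were picked for brevity and are not the algorithms the notes above name.

```python
def train_centroids(samples):
    """Supervised: samples are (value, label) pairs; average per label."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(centroids, value):
    """Predict the label whose average is closest to the new value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

def cluster(values, k, rounds=10):
    """Unsupervised: 1-D k-means (k >= 2); groups have no names."""
    ordered = sorted(values)
    # Spread the initial centers across the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    groups = []
    for _ in range(rounds):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(centers[i] - v))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return groups

# Hypothetical weights in grams.
labeled = [(150, "apple"), (160, "apple"), (200, "pear"),
           (210, "pear"), (900, "fish"), (950, "fish")]
centroids = train_centroids(labeled)
prediction = classify(centroids, 205)

unlabeled = [150, 160, 200, 210, 900, 950]
groups = cluster(unlabeled, 3)
```

The classifier answers with a known class name, while the clustering only returns three anonymous groups.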




Saturday, January 15, 2022

RIT - Cyber Security Fundamentals - Notes

Unit 1

Cyber security involves times when data or information is in transit, being processed, and at rest.

Some like to think of cyber security as a subset of information security, a very general term which also deals with information stored physically, in addition to cyber security's pure digital form.

We often fear the unknown hackers from the outside, but insiders are a much greater threat, and can do far greater damage.

A black hat hacker is a cracker, or malicious hacker.

A white hat hacker does what a black hat hacker does, breaking into companies and systems, but with their permission, in hopes of finding and exploiting vulnerabilities so the company can fix them before a black hat hacker can get in.

A grey hat hacker is somewhere in the middle. One type of grey hat hacker might break into a system and prove it to the administrator, then the grey hat will request payment to fix it, and if denied,

Which type of hacking does not involve any technology? - Social engineering


Unit 2

A threat is a looming danger that can change or damage your assets.

Threat agents or actors are the ones carrying out the threats. - Ex: Hackers; When threat actors carry out the threat, they exploit the vulnerability.

A vulnerability is a weakness, a flaw in a program, device, network, and even a person.


Hashing

Hashing algorithms have a few characteristics: variable-length input, fixed-length output.

You could feed the Declaration of Independence into a hashing algorithm or just your name. In each case you'll wind up with the same-sized output hash, also called a message digest. Hashes are called one-way functions because the input cannot practically be recovered from the output.
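
A quick way to see the fixed-length property is Python's standard `hashlib` module. SHA-256 is used here as one example algorithm (the notes do not name a specific one), and the input strings are arbitrary.

```python
import hashlib

def digest(text):
    """Return the SHA-256 message digest of `text` as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

short_hash = digest("Ada")
long_hash = digest("We hold these truths to be self-evident. " * 100)

# Both digests are 64 hex characters (256 bits), whatever the input size.
lengths = (len(short_hash), len(long_hash))
```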

Monday, December 28, 2020

MuleSoft Certified Platform Architect Exam (MCPA) notes

 

  1. RAML must be updated when the API implementation changes the structure of the request or response messages.
  2. When you deploy your application in particular region in MuleSoft Cloudhub, Workers are randomly distributed across available AZs within that region
  3. What is most likely NOT a characteristic of an integration test for a REST API implementation? The test prepares a known request payload and validates the response payload
  4. When could the API data model of a System API reasonably mimic the data model exposed by the corresponding backend system, with minimal improvements over the backend system's data model - When there is an existing Enterprise Data Model widely used across the organization.


Anypoint Platform
  1. Anypoint Platform control plane
    1. MuleSoft hosted: US East (N.VA) or  EU (Frankfurt)
    2. Customer hosted: Private Cloud Edition
  2. Anypoint Platform runtime plane
    1. CloudHub ( US East, US West, Canada, Asia Pacific, EU (Frankfurt, Ireland, London, …), South America); EU (Frankfurt and Ireland)
    2. Customer-hosted: 
      1. Manually provisioned - bare metal, VMs, on-premises, in a public or private cloud
      2. iPaaS-provisioned - 
        1. Mulesoft provided s/w appliance - Anypoint Runtime Fabric, 
        2. customer- managed - Anypoint Platform for Pivotal Cloud Foundry

  1. Anypoint Runtime Fabric is a software appliance that provides customer-hosted iPaaS functionality comparable to CloudHub. It leverages a Kubernetes cluster to do so, and executes Mule applications on Mule runtimes within Docker containers
  2. Connection to control plane is via AMQP/TLS initiated by Anypoint Runtime Fabric
  3. By default, Anypoint Platform does not act as an Identity Provider for OAuth 2.0 Client Management; Anypoint Platform supports the configuration of multiple Identity Providers for Client Management
Publishing APIs
  1. Anypoint Platform provides two main features for making the documentation for an API engaging: API Notebooks and API  Consoles
  2. Every change to the content of that API specification triggers an asset version increase in the corresponding Anypoint Exchange entry. This behavior is consistent with the fact that Anypoint Exchange is also a Maven-compatible artifact repository - storing, in this case, a RAML definition
  3. The API Notebook for an API makes use of the API specification for that API and provides an interactive JavaScript-based coding environment that can be used to document interactions with the API from the point of view of an API client
  4. Anypoint API Community Manager allows an organization to build and operate API consumer communities around their APIs, addressing both internal and external developers (partners). Anypoint API Community Manager provides customization, branding, marketing, and engagement capabilities
API Management - Policies

  1. An API policy consists of a template (code) and definition (data) and its enforcement requires a Mule runtime, which provides this function. Enforcement can either be performed embedded in an API implementation itself (if it is a Mule application), or via a separate API proxy (a dedicated Mule application), or through Anypoint Service Mesh (via a dedicated Mule application deployed to the same Kubernetes cluster as the API implementations).
  2. API policies are downloaded at runtime from Anypoint API Manager into the Mule application that enforces them
  3. External API implementations deployed to a Kubernetes cluster can use Anypoint Service Mesh as an add-on to the Kubernetes cluster
  4. API clients continue sending API invocations to the Kubernetes service representing a given API implementation: Istio/Envoy transparently intercepts all API invocations to/from an API implementation and routes them to the Mule runtime/Mule application performing API policy enforcement
  5. API proxies are well-suited for coarse-grained APIs, where the addition of a separate node - the API proxy - into the HTTP request-response path between API client and API implementation does not constitute significant overhead
  6. An API instance is an entry in Anypoint API Manager that represents a concrete API endpoint for a specific major version of an API in a specific environment 
  7. Automated policies - apply to all API instances in a given environment
  8. API Groups - that bundle API instances and streamline some API management tasks common to the group; Administrators can apply SLA tiers to the entire group of API instances;  API policies cannot be applied to API Group instances
  9. API Group instances bundle several API instances for the main benefit of enabling API consumers to request access for a particular API client to the group and thereby gain access to each API instance in that group
  10. The API policy caches entire HTTP responses, incl. the HTTP response status code, HTTP response headers and the HTTP response body. There is a size limit for cached HTTP responses (1MB)
  11. Spike Control API policy must protect the backend system from temporary API invocation bursts by evening them out and enforces the published overall throughput guarantee - X-RateLimit- HTTP response headers should not be exposed to API clients from this API policy but from the SLA-based API policy
  12. In the case of security-related API policies, RAML has specific support through securitySchemes, e.g. of type OAuth 2.0 or Basic Authentication. In other cases, RAML traits are a perfect mechanism for expressing the changes to the API  specification introduced by the application of an API policy
  13. Edge policies supported by Anypoint Security 
    1. Content Attack Prevention (CAP) 
    2. Allowlisting of API client IP addresses 
    3. Web Application Firewall (WAF) 
    4. DoS attack prevention 
  14. Tokenization replaces sensitive information (credit card number, SSN, account number, any regex, …) with a reversible token

Enterprise, Bounded Context Data Models

  1. In an Enterprise Data Model - often called Canonical Data Model, but the discussion here uses the term Enterprise Data Model throughout - there is exactly one canonical definition of each data type, which is reused in all APIs that require that data type, within an organization
  2. In a Bounded Context Data Model several Bounded Contexts are identified within an organization by their usage of common terminology and concepts. Each Bounded Context then has its own, distinct set of data type definitions - the Bounded Context Data Model. The Bounded Context Data Models of separate Bounded Contexts are formally unrelated, although they may share some names. All APIs in a Bounded Context reuse the Bounded Context Data Model of that Bounded Context
  3. If there is no successful Enterprise Data Model, it is most pragmatic to use Bounded Context Data Models
  4. If there is a successful Enterprise Data Model, then all Process APIs and System APIs should reuse that Enterprise Data Model as much as possible
  5. The API data model of Experience APIs, on the other hand, is determined by the needs of the top-level API clients (such as user-visible apps) and thus is very unlikely to be served by an Enterprise Data Model
  6. Start with the organizational structure, aiming for structural units where important business concepts are used in a coherent and homogenous way; If in doubt prefer smaller Bounded Contexts; If still in doubt put each API in its own Bounded Context
  7. A Bounded Context Data Model should be published as RAML fragments (RAML types, possibly in a RAML Library) in Anypoint Design Center and Anypoint Exchange, so that it can be easily re-used in all APIs in a Bounded Context
  8. The approach to mapping between Bounded Context Data Models is called anticorruption layer
  9. Both the CloudHub Object Store  as well as an external database keep state outside the nodes (CloudHub workers) to which the API implementation is deployed and therefore do not make the API implementation stateful in the above sense
  10. If, however, state is kept in local memory or on disk on a node, or in-memory replicated amongst nodes (these are also options with Object Stores, where the latter is not available in CloudHub) then the API implementation is stateful 
  11. Safe HTTP methods are ones that do not alter the state of the underlying resource. 
    1. GET
    2. HEAD
    3. OPTIONS
  12. REST API headers used for caching: (Cacheable trait in RAML fragment)
    1. Cache-Control
    2. Last-Modified
    3. ETag 
    4. If-Match, If-None-Match, If-Modified-Since 
    5. Age
  13. HTTP-based optimistic concurrency control
    1. ETag HTTP response header to send a resource version ID in the HTTP response from the API implementation to the API client
    2. If-Match HTTP request header to send the resource version ID on which an update is based in an HTTP PUT/POST/PATCH request from API client to API implementation 
    3. HTTP 412 Precondition Failed client error response code to inform the API client that the resource version ID it sent was stale and hence the requested change not performed 
    4. HTTP 428 Precondition Required client error response code to inform the API client that the resource in question is  protected against concurrent modification and hence requires If-Match HTTP request headers, which were however missing from the HTTP request
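
The optimistic concurrency flow in item 13 can be sketched with an in-memory resource. This is a simulation under stated assumptions, not Mule or Anypoint code: the `Resource` class and handler functions are hypothetical, and the ETag is simply a truncated hash of the body (real ETags are quoted strings chosen by the server).

```python
import hashlib

class Resource:
    """A single resource whose version ID is derived from its content."""
    def __init__(self, body):
        self.body = body

    @property
    def etag(self):
        return hashlib.sha256(self.body.encode("utf-8")).hexdigest()[:16]

def handle_get(resource):
    """Return the body together with its current version ID."""
    return 200, {"ETag": resource.etag}, resource.body

def handle_put(resource, new_body, headers):
    """Apply the update only if the client's version ID is current."""
    if_match = headers.get("If-Match")
    if if_match is None:
        return 428, {}, "Precondition Required"
    if if_match != resource.etag:
        return 412, {}, "Precondition Failed"
    resource.body = new_body
    return 200, {"ETag": resource.etag}, resource.body

order = Resource("qty=1")
_, response_headers, _ = handle_get(order)
version = response_headers["ETag"]

first = handle_put(order, "qty=2", {"If-Match": version})   # current version
stale = handle_put(order, "qty=3", {"If-Match": version})   # version now outdated
missing = handle_put(order, "qty=3", {})                    # no If-Match header
```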

API Implementation

  1. API implementations developed as Mule applications and deployed to a Mule runtime (this includes API proxies) should always be configured to participate in auto-discovery
  2. The API implementation by default refuses API invocations until all API policies have been applied. This is called the gatekeeper feature of the Mule runtime
  3. Default configuration - Shared Cloudhub VPC
  4. Firewall rules (AWS Security Groups, a form of stateful firewall) of the CloudHub Shared Worker Cloud are fixed:
    1. TCP/IP traffic from anywhere to port 8081 (HTTP) and 8082 (HTTPS) on each CloudHub worker 
    2. TCP/IP traffic from within the AWS VPC to port 8091 (HTTP) and 8092 (HTTPS) on each CloudHub worker
  5. Custom Anypoint VPC
  6. Every CloudHub worker receives a private IP address from the address range of its VPC - be it an Anypoint VPC or the CloudHub Shared Worker Cloud. A well-known DNS record resolves to those private IP addresses
  7. Anypoint Platform Architecture Application Networks
  8. Every CloudHub worker also receives a public IP address that is NOT under the control of the Anypoint VPC admin. Again, a well-known DNS record resolves to these public IP addresses
  9. Anypoint VPCs and Anypoint Platform environments can be in a many-to-many relationship
  10. Anypoint VPCs cannot be peered with each other
  11. The CloudHub Shared Load Balancer terminates TLS connections and uses its own server-side certificate
  12. The upstream protocol for the VPC-internal communication between CloudHub Dedicated Load Balancer and CloudHub workers can be configured to be HTTPS or HTTP, i.e., can be different from the protocol used by the API client (unlike with the CloudHub Shared Load Balancer)
  13. The state of an Object Store is (only) available to all workers of its owning Mule application. This can be circumvented using the Object Store REST API
    1. Max TTL of 30 days
  14. CQRS - Commands are formulated in the domain language of the Bounded Context and trigger writes 
    1. Commands are typically queued and executed asynchronously
  15. Queries are optimized for the API clients' exact needs (joining and aggregating data as needed) and execute synchronously
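
The CQRS split in items 14 and 15 can be sketched as follows. The `OrderService` class, its method names and the order data are hypothetical; the in-process queue stands in for whatever messaging system a real implementation would use.

```python
from collections import deque

class OrderService:
    def __init__(self):
        self._pending = deque()          # queued commands
        self._orders = {}                # write model
        self._totals_by_customer = {}    # read model, pre-aggregated

    def submit_place_order(self, order_id, customer, amount):
        """Command: accept and queue the write, then return immediately."""
        self._pending.append((order_id, customer, amount))

    def process_pending(self):
        """Worker step: apply queued commands and refresh the read model."""
        while self._pending:
            order_id, customer, amount = self._pending.popleft()
            self._orders[order_id] = (customer, amount)
            self._totals_by_customer[customer] = (
                self._totals_by_customer.get(customer, 0) + amount)

    def total_for(self, customer):
        """Query: synchronous read from the view shaped for this client."""
        return self._totals_by_customer.get(customer, 0)

service = OrderService()
service.submit_place_order("o-1", "acme", 10)
service.submit_place_order("o-2", "acme", 20)
before = service.total_for("acme")   # commands not applied yet
service.process_pending()
after = service.total_for("acme")
```

The gap between `before` and `after` shows the trade-off: writes are asynchronous, so reads can briefly lag behind accepted commands.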

Production Deployment

  1. Anypoint Platform has no direct support for "canary deployments", i.e. the practice of initially only directing a small portion of production traffic to the newly deployed API implementation
  2. Promoting all supported parts of the Anypoint API Manager entry for an API instance does not copy API clients
  3. The client ID/secret a given API client uses for accessing an API is, however, independent of the environment. That is, if the same API client has been granted access to an API instance in the Sandbox and Production environments, it must make API invocations to both API endpoints with the same client ID/secret
  4. So, create a new API client and request access so it gets a new set of clientId/secrets for production
Testing
  1. Integration test - There should not be a single interaction in the API Notebook of an API that is not covered by a test scenario
  2. A safe sub-set of integration tests can also be run in production as a "deployment verification test suite"
  3. With MUnit Anypoint Platform provides a dedicated unit testing tool that
    1. is specifically designed for unit testing Mule applications
    2. can stub-out external dependencies of a Mule application
    3. has dedicated IDE-support in Anypoint Studio
    4. can be invoked from Maven builds using the MUnit Maven plugin
  4. Resilience testing is the practice of disrupting the web of application network and asserting that the resulting inevitable degradation of the quality of all relevant services offered by/on the application network is within acceptable limits
Scaling Application Network

Vertical scaling - Different sizes of cloudhub workers
Horizontal scaling - Multiple cloudhub workers (max 8) - can be configured as autoscaling

Sunday, September 27, 2020

Install Wordpress on Ubuntu/DigitalOcean

 mysql -u root -p (enter YourPassword)

CREATE DATABASE wordpress DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

CREATE USER 'wordpressuser'@'localhost' IDENTIFIED BY 'YourPassword';

GRANT ALL ON wordpress.* TO 'wordpressuser'@'localhost';

sudo apt install php-curl php-gd php-mbstring php-xml php-xmlrpc php-soap php-intl php-zip

systemctl restart apache2

/etc/apache2/sites-available/wordpress.conf

a2enmod rewrite

apache2ctl configtest

cd /tmp

curl -O https://wordpress.org/latest.tar.gz

tar xzvf latest.tar.gz

touch /tmp/wordpress/.htaccess

cp /tmp/wordpress/wp-config-sample.php /tmp/wordpress/wp-config.php


mkdir /tmp/wordpress/wp-content/upgrade

cp -a /tmp/wordpress/. /var/www/wordpress

chown -R www-data:www-data /var/www/wordpress

find /var/www/wordpress/ -type d -exec chmod 750 {} \;

find /var/www/wordpress/ -type f -exec chmod 640 {} \;

curl -s https://api.wordpress.org/secret-key/1.1/salt/


Replace the following lines in /var/www/wordpress/wp-config.php with the values returned by the curl command above

define('AUTH_KEY',         'xxxx');

define('SECURE_AUTH_KEY',  'xxx');

define('LOGGED_IN_KEY',    'xxx');

define('NONCE_KEY',        'xxx');

define('AUTH_SALT',        'xxx');

define('SECURE_AUTH_SALT', 'xxx');

define('LOGGED_IN_SALT',   'xxx');

define('NONCE_SALT',       'xxx');


Change DB settings in /var/www/wordpress/wp-config.php



Thursday, December 26, 2019

AWS CSA Associate Exam Notes

Architecture pillars
  1. Operational Excellence - Prepare, Operate, Evolve
  2. Security -
  3. Reliability - ability to recover
  4. Performance efficiency - ElastiCache (Memcached, Redis), Kinesis Streams
  5. Cost Optimization - AWS trusted advisor
S3
bucket name - 3-63 chars
Max Number of buckets 100
Files can be up to 5TB
S3 is universal namespace

S3 Standard --> S3 IA --> S3 Intelligent Tiering --> S3 One Zone IA --> S3 Glacier --> S3 Glacier Deep Archive

S3 encryption in transit - https
S3 encryption at rest server side
  1. S3 Managed Keys (SSE-S3)
  2. SSE-KMS - AWS KMS
  3. SSE with Customer provided keys (SSE-C)
Client side encryption - Client encrypts before storing in S3

S3 Object Lock  to store objects using Write Once Read Many WORM model
Governance Mode - need special permissions
Compliance Mode  - even root can't overwrite

Virtual Host / Path
In a virtual-hosted–style URL, the bucket name is part of the domain name in the URL. 
For example:
https://bucket-name.s3.amazonaws.com
https://bucket-name.s3.Region.amazonaws.com

In a path-style URL, the bucket name is not part of the domain (unless you use a region-specific endpoint). For example:
US East (N. Virginia) region endpoint, http://s3.amazonaws.com/bucket-name
Region-specific endpoint, https://s3.Region.amazonaws.com/bucket-name
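
The two URL styles can be built with a couple of helper functions. The function names, the bucket name, the object key and the Region value are made up for illustration, and the sketch does not URL-encode the key.

```python
def virtual_hosted_url(bucket, key, region=None):
    """Bucket name is part of the host name."""
    if region:
        host = f"{bucket}.s3.{region}.amazonaws.com"
    else:
        host = f"{bucket}.s3.amazonaws.com"
    return f"https://{host}/{key}"

def path_style_url(bucket, key, region=None):
    """Bucket name is the first segment of the path."""
    host = f"s3.{region}.amazonaws.com" if region else "s3.amazonaws.com"
    return f"https://{host}/{bucket}/{key}"

# Hypothetical bucket, key and Region.
virtual = virtual_hosted_url("bucket-name", "photo.jpg", "eu-west-1")
path = path_style_url("bucket-name", "photo.jpg", "eu-west-1")
```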

Website
The two general forms of an Amazon S3 website endpoint are as follows:
  • s3-website dash (-) Region: http://bucket-name.s3-website-Region.amazonaws.com
  • s3-website dot (.) Region: http://bucket-name.s3-website.Region.amazonaws.com
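  1. Amazon S3 provides strong read-after-write consistency for PUTs and DELETEs of objects in all Regions (since December 2020); before that, only PUTs of new objects were read-after-write consistent, while overwrite PUTs and DELETEs were eventually consistent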
  2. Amazon S3 never adds partial objects; if you receive a success response, Amazon S3 added the entire object to the bucket
  3. Amazon S3 provides four different access control mechanisms:
  • AWS Identity and Access Management (IAM) policies (what can this user do)
  • Access Control Lists (ACLs)  - at object level
  • bucket policies (who can access this s3 resource)
  • query string authentication
  1. The Multi-Object Delete operation enables you to delete multiple objects from a bucket using a single HTTP request
  2. Server access logs for Amazon S3 provide you visibility into object-level operations on your data in Amazon S3
  3. If the IAM user assigns a bucket policy to an Amazon S3 bucket and doesn’t specify the root user as a principal, the root user is denied access to that bucket
  4. Bucket policies - work at bucket level, ACL - work at individual object level
  5. Use a bucket policy to specify which VPC endpoints or external IP addresses can access the S3 bucket
  6. S3 buckets can be configured to log all access requests
  7. Once versioning enabled, it cannot be disabled. It can only be suspended
  8. Cross region replication requires versioning enabled on source and destination
    1. within an account
    2. across accounts 
  9. Only new versions get replicated, existing objects won't get replicated, Delete markers won't get replicated
  10. Transfer acceleration - to speed up upload, you don't need to upload to the bucket, you upload to edge location using a distinct URL
  11. S3 multipart upload provides the following advantages:
    1. Improved throughput 
    2. Quick recovery from any network issues
    3. Pause and resume object uploads
    4. Begin an upload before you know the final object size 
  12. To protect against accidental deletion, enable versioning and MFA Delete on the bucket
  13. 3 different ways to share S3 buckets across accounts
    1. Using bucket policies & IAM - Programmatic access only
    2. Using bucket ACLs and IAM - Programmatic access only
    3. Cross account IAM roles - Programmatic and Console access
  14. AWS DataSync - replicates data from on-premises storage to S3; uses an agent to replicate on a schedule
CloudFront
  1. You restrict access to Amazon S3 content by creating an origin access identity, which is a special CloudFront user. You change Amazon S3 permissions to give the origin access identity permission to access your objects
  2. A CloudFront distribution can have multiple origins, and they can be in different AWS regions
  3. CloudFront is a global service, and metrics are available only when you choose the US East (N. Virginia) region in the AWS console
  4. For websites hosted on Amazon S3 bucket,  if users are still seeing old content, change the TTL value to remove old objects
  5. Cache invalidation - Perform an invalidation on the CloudFront distribution that is serving the content 
  6. By default, each file automatically expires after 24 hours
  7. Origin is source of files that CDN distributes
  8. Edge locations are not read-only; you can also write to them (for example, PUT an object)
  9. Cached for TTL, invalidation incurs cost
  10. CloudFront signed URL is for protecting one file
  11. CloudFront signed cookie is for protecting multiple files
Glacier
  1. The total volume of data and number of archives you can store are unlimited
  2. The largest archive that can be uploaded in a single upload request is 4 gigabytes
  3. You can download (via SNS) Glacier vault inventory asynchronously after you request it and Amazon prepares it
  4. Amazon Glacier automatically encrypts the data using AES-256, the same as Amazon S3
  5. You can use the BitTorrent protocol but only for objects that are less than 5 GB in size
  6. Snowball (AWS import/export) appliance to transfer in/out of AWS
  7. AWS Snowmobile is a truck
  8. Amazon Glacier, which enables long-term storage of mission-critical data, has added Vault Lock. This new feature allows you to lock your vault with a variety of compliance controls that are designed to support such long-term records retention
Storage Gateway - service that replicates data from a data center to AWS. You can think of a file gateway as a file system mount on S3. It is a VM installed on-premises that backs up on-premises data to S3
  1. The file gateway enables you to store and retrieve objects in Amazon S3 using file protocols, such as NFS. Objects written through file gateway can be directly accessed in S3
  2. The tape gateway provides your backup application with an iSCSI virtual tape library (VTL) interface, consisting of a virtual media changer, virtual tape drives, and virtual tapes. Virtual tape data is stored in Amazon S3 or can be archived to Amazon S3 Glacier
  3. The volume gateway provides block storage to your applications using the iSCSI protocol. Data on the volumes is stored in Amazon S3. To access your iSCSI volumes in AWS, you can take EBS snapshots which can be used to create EBS volumes
    • In the cached mode - Entire dataset is stored on S3 and most frequently accessed data is cached on-premise
    • In the stored mode - Entire dataset is stored on-premise and is asynchronously replicated to S3
Athena - interactive query service which enables you to analyze and query data located in S3 using standard SQL (ex: query log files, business reports, queries on click-stream data)

Macie - Security service that uses ML and NLP to discover, classify and protect sensitive data stored in S3, helps identify PII. Useful for preventing ID theft

Cognito 
is an identity broker that handles interaction between applications and web identity providers (Facebook, Google)

User pools - username and password are stored in Cognito, JWT is generated on successful authentication
Identity pools - Takes JWT from above and uses it for authorization to AWS services

AssumeRoleWithWebIdentity
Returns a set of temporary security credentials for users who have been authenticated in a mobile or web application with a web identity provider, ex: Facebook

AssumeRoleWithSAML - This operation provides a mechanism for tying an enterprise identity store or directory to role-based AWS access without user-specific credentials or configuration

EBS
  1. Amazon EBS provides six volume types: 
    1. Provisioned IOPS SSD (io2 and io1), 
    2. General Purpose SSD (gp3 and gp2), 
    3. Throughput Optimized HDD (st1)
    4. Cold HDD (sc1)
  2. To create a snapshot for Amazon EBS volumes that serve as root devices, you should stop the instance before taking the snapshot
  3. EBS root volumes can be encrypted
  4. On instance termination, root volumes of both kinds (EBS and instance store) are deleted by default. For an EBS root volume, you can tell AWS to keep it by turning off its DeleteOnTermination attribute
  5. An instance store volume persists only across reboots. Its data is lost if the instance is stopped or terminated
  6. You can only share unencrypted snapshots publicly
  7. When you share an encrypted snapshot, you must also share the customer managed CMK used to encrypt the snapshot
  8. The public datasets are hosted in two possible formats: Amazon Elastic Block Store (Amazon EBS) snapshots and/or Amazon Simple Storage Service (Amazon S3) buckets
  9. Snapshots with AWS Marketplace product codes can’t be made public
  10. Frequent snapshots provide a higher level of data durability, but they may slightly degrade the performance of your application while the snapshot is in progress
  11. When creation of an EBS snapshot has been initiated but not completed, the EBS volume can still be used, and the in-progress snapshot is not affected by ongoing reads and writes to the volume
  12. The data in an instance store persists only during the lifetime of its associated instance. If an instance reboots (intentionally or unintentionally), data in the instance store persists
  13. I/O operations that are larger than 256 KB are counted in 256 KB capacity units. For example, a 1,024 KB I/O operation would count as 4 IOPS
  14. You can create an Amazon EBS snapshot using an Amazon CloudWatch Events rule
  15. You can use Amazon Data Lifecycle Manager to automate the creation, retention, and deletion of EBS snapshots and EBS-backed AMIs
  16. For customers who have architected complex transactional databases using EBS, it is recommended that backups to Amazon S3 be performed through the database management system so that distributed transactions and logs can be checkpointed
  17. AWS does not copy launch permissions, user-defined tags, or Amazon S3 bucket permissions from the source AMI to the new AMI
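The 256 KB counting rule from item 13 above can be sketched as a small calculation. This is only an illustration: the class and method names are made up for this note, and rounding a partial unit up to a whole operation is an assumption rather than something the note states.

```java
final class EbsIoUnits {
    /** Size of one capacity unit in KB, as given in the note above. */
    static final long UNIT_KB = 256;

    /** Returns how many I/O operations a single request of ioSizeKb is counted as. */
    static long countedOperations(long ioSizeKb) {
        if (ioSizeKb <= 0) {
            throw new IllegalArgumentException("I/O size must be positive");
        }
        // Assumption: a partial unit is rounded up to a whole operation.
        return (ioSizeKb + UNIT_KB - 1) / UNIT_KB;
    }

    public static void main(String[] args) {
        // The 1,024 KB example from the note: prints 4
        System.out.println(countedOperations(1024));
    }
}
```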
RAID configurations commonly used with EBS:
    1. RAID 0 – splits ("stripes") data evenly across two or more disks. Use it when I/O performance is more important than fault tolerance; for example, in a heavily used database where data replication is already set up separately.
    2. RAID 1 – consists of an exact copy (or mirror) of a set of data on two or more disks. Use it when fault tolerance is more important than I/O performance; for example, in a critical application. With RAID 1 you get more data durability in addition to the replication features of the AWS cloud.
    3. RAID 5 and RAID 6 are not recommended for Amazon EBS because the parity write operations of these RAID modes consume some of the IOPS available to your volumes
    AWS CloudTrail - who did what (API logs)
    1. To get a history of all EC2 API calls (including VPC and EBS) made on your account, you simply turn on CloudTrail in the AWS Management Console
    2. CloudTrail logs provide you with detailed API tracking for Amazon S3 bucket-level and object-level operations. 
    AWS CloudWatch - what is happening on AWS
    1. Amazon CloudWatch Alarms have three possible states:
      1. OK: The metric is within the defined threshold
      2. ALARM: The metric is outside of the defined threshold
      3. INSUFFICIENT_DATA: The alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state 
    2. Amazon CloudWatch stores metrics for terminated Amazon EC2 instances or deleted Elastic Load Balancers for 15 months
    3. Basic (default) monitoring - metrics every 5 minutes
    4. Detailed monitoring - metrics every 1 minute
    5. With Amazon CloudWatch, each metric data point must be marked with a time stamp. The time stamp can be up to two weeks in the past and up to two hours into the future
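The timestamp window in item 5 above (two weeks back, two hours ahead) can be expressed as a simple check. This is a sketch under assumptions: the class and method names are invented for this note, and treating the window boundaries as inclusive is a guess.

```java
import java.time.Duration;
import java.time.Instant;

final class MetricTimestampWindow {
    static final Duration MAX_PAST = Duration.ofDays(14);    // "two weeks in the past"
    static final Duration MAX_FUTURE = Duration.ofHours(2);  // "two hours into the future"

    /** Returns true if the data point's timestamp falls inside the allowed window around now. */
    static boolean isAccepted(Instant timestamp, Instant now) {
        // Assumption: both boundaries are inclusive.
        return !timestamp.isBefore(now.minus(MAX_PAST))
                && !timestamp.isAfter(now.plus(MAX_FUTURE));
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println(isAccepted(now.minus(Duration.ofDays(3)), now));   // inside the window
        System.out.println(isAccepted(now.minus(Duration.ofDays(30)), now));  // too old
    }
}
```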
    AWS Config - keeps track of all changes to your resources by invoking the Describe or the List API call for each resource in your account. The service uses those same API calls to capture configuration details for all related resources.

    Security Groups
    1. SGs are stateful - if an inbound request is allowed, the response traffic is automatically allowed out, regardless of outbound rules
    2. SGs operate at the instance level and act as virtual firewalls
    3. fine grained
    4. only allow rules
    When you associate multiple security groups with an instance, the rules from each security group are effectively aggregated to create one set of rules. We use this set of rules to determine whether to allow access

    ACLs
    1. firewalls that control traffic in and out of subnet
    2. stateless
    3. enforced at subnet level not at instance level
    4. coarse grained
    Route table
    1. destination is where the traffic should be destined and target is how it should be routed
    2. destination is subnet or an instance, target is usually local or IGW
    VPC 
    1. VPC Wizard - Instances that you launch into a default subnet receive both a public IP address and a private IP address.
    2. The purpose of an "Egress-Only Internet Gateway" is to allow IPv6 traffic within a VPC to access the internet and deny any inbound traffic from internet into VPC
    3. When you create a custom VPC, a default security group, a main route table, and a default network ACL are created automatically
    To allow traffic into VPC
    • attach IGW to VPC
    • subnet's route table points to IGW
    • instances have public or elastic IP
    • NACLs and SGs allow traffic
    Outbound traffic from Private instances
    • NAT gateway
    • EC2 instance setup as NAT in public subnet
    • Disable Source/Destination check on NAT instance
    The first four IP addresses and the last IP address in each subnet CIDR block are reserved by AWS (five in total)
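Since five addresses per subnet are reserved, the number of addresses left for instances follows directly from the prefix length. A minimal sketch, with hypothetical class and method names, limited to the /16 to /28 range that VPC allows for IPv4 subnets:

```java
final class SubnetCapacity {
    /** Addresses AWS keeps for itself in every subnet. */
    static final int RESERVED_PER_SUBNET = 5;

    /** Returns the number of IPv4 addresses you can assign in a subnet with the given prefix length. */
    static long usableAddresses(int prefixLength) {
        if (prefixLength < 16 || prefixLength > 28) {
            throw new IllegalArgumentException("VPC subnets must be between /16 and /28");
        }
        long total = 1L << (32 - prefixLength); // size of the CIDR block
        return total - RESERVED_PER_SUBNET;
    }

    public static void main(String[] args) {
        System.out.println(usableAddresses(24)); // 256 - 5, prints 251
        System.out.println(usableAddresses(28)); // 16 - 5, prints 11
    }
}
```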

    My instance connection timing out - Check routes 0.0.0.0/0 to target IGW
    • check your security group rules
    • check network ACLs
    • make sure your instance has a public IP, if not attach an elastic IP
    An Elastic IP address is billed per hour when it is not associated with a running instance

    NAT instance (slow) vs NAT Gateways (managed, fast)

    VPC flow logs - capture IP traffic going in/out of all n/w interfaces in a selected resource
    • can create flow logs for a VPC, subnet or n/w interfaces
    • logs published to a group in CloudWatch
    • VPC Flow logs -  logging of all network access attempts to Amazon EC2 instances in their production VPC on AWS
    VPC peering
    • no transitivity
    • customer gateway
    • hub and spoke
    VPC peering routes traffic between source and destination VPCs only, it does not support edge to edge routing

    Placement Groups
    1. A placement group is a logical grouping of EC2 instances within a single Availability Zone.
    2. Using placement groups enables applications to participate in a low-latency, 10 Gbps network
    3. The name you specify for a placement group must be unique within your AWS account for the Region
    4. cluster placement group can't span multiple Availability Zones
    5. spread placement group can span multiple Availability Zones in the same Region. You can have a maximum of 7 running instances per Availability Zone per group
    6. partition placement group supports a maximum of 7 partitions per Availability Zone. The number of instances that you can launch in a partition placement group is limited only by your account limits
    By default, all accounts are limited to 5 Elastic IP addresses per region.

    Connecting on-premise to AWS
    • AWS hardware VPN
    • one of the EC2 instances used as s/w VPN
    EC2
    1. For all new AWS accounts, there is a soft limit of 20 EC2 instances per region
    2. If you connect to your instance using SSH and get any of the following errors: Host key not found in [directory], Permission denied (publickey), or Authentication failed, permission denied, verify that you are connecting with the appropriate user name for your AMI and that you have specified the proper private key (.pem) file for your instance
    3. If you’re using EC2-Classic, you must use security groups created specifically for EC2-Classic
    4. After you launch an instance in EC2-Classic, you can’t change its security groups. However, you can add rules to or remove rules from a security group
    5. If the elastic IP is a part of EC2 Classic, it cannot be assigned to a VPC instance
    6. For instances launched in EC2-Classic, we release the private IPv4 address when the instance is stopped or terminated. If you restart your stopped instance, it receives a new private IPv4 address
    7. To launch an EC2 instance it is required to have an AMI in that region. If the AMI is not available in that region, then create a new AMI or use the copy command to copy the AMI from one region to the other region
    8. When you want to copy an AMI to a different region, AWS does not automatically copy launch permissions, user-defined tags or S3 bucket permissions from source AMI to the new AMI
    9. For instances launched in a VPC, a private IPv4 address remains associated with the network interface when the instance is stopped and restarted, and is released when the instance is terminated
    10. In Amazon EC2 Container Service, Docker is the only container platform supported by EC2 Container Service presently
    11. Docker - runs on Fargate, ECS, BeansTalk
    12. Different instances running on the same physical machine are isolated from each other via the Xen hypervisor
    13. You must terminate all running instances in the subnet before you can delete the subnet
    14. If an application is required to run only on Monday, Wednesday, and Friday from 5 AM to 11 AM, On-Demand is the most cost-effective Amazon EC2 pricing model
    15. You can use private IPv4 addresses for communication between instances in the same network (EC2-Classic or a VPC)
    16. Spot instance is useful when the user wants to run a process temporarily
    17. The DisableApiTermination attribute does not prevent you from terminating an instance by initiating shutdown from the instance (using an operating system command for system shutdown) when the InstanceInitiatedShutdownBehavior attribute is set
    18. The underlying hypervisors for EC2 are Xen and Nitro
    19. The number of ENIs you can attach varies by instance type
    20. Which services allow the customer to retain full administrative privileges of the underlying EC2 instances? AWS EC2, OpsWorks, Elastic Beanstalk, EMR
    21. For the best performance, we recommend that you use current generation instance types and HVM AMIs when you launch your instances
    22. A t2.medium EC2 instance type must be launched with what type of Amazon Machine Image (AMI)? An Amazon EBS-backed Hardware Virtual Machine AMI
    23. Amazon EC2 resource which cannot be tagged - Elastic IP and Key Pair
    24. What is the default maximum number of Access Keys per user? 2
    25. For savings on stateless web servers, prefer Spot Instances backed by On-Demand instances

    AWS Direct Connect
    1. Provides private n/w connection between AWS and on-premise
    2. AWS --> Direct connect location --> On-Premise network
    3. AWS---> Virtual Private Gateway --> Customer Gateway --> On-Premise network
    4. Each AWS Direct Connect location enables connectivity to all Availability Zones within the geographically nearest AWS region
    5. AWS connection redundancy with AWS Hardware VPNs
    6. Amazon direct connect redundancy based on BGP policies

    RDS
    1. When the primary DB in an RDS Multi-AZ deployment fails:
      • standby in another AZ is promoted to become new primary
      • DNS record is switched to new primary (AWS RDS uses DNS, no IPs)
      • original primary instance is terminated and a new standby is created
    2. RDS uses DB security groups, VPC security groups, and EC2 security groups
    3. Amazon RDS automatically provisions and maintains a synchronous “standby” replica in a different Availability Zone
    4. You cannot reduce storage size of a RDS DB Instance once it has been allocated
    5. In most cases, RDS scaling storage doesn't require any outage and doesn't degrade performance of the server 
    6. You can use Oracle SQL Developer to import a simple, 20 MB database; you want to use Oracle Data Pump to import complex databases or databases that are several hundred megabytes or several terabytes in size
    7. Max backup retention period is 35 days
    8. In RDS instance, you must increase storage size in increments of at least 10%
    9. FreeStorageSpace: the amount of available storage space.

    Load balancers
    if you need the IPv4 address of your end user, look for X-Forwarded-For header
    1. application level 
    2. network level
    3. classic (both, for EC2-Classic)
    ELB
    1. communication between the load balancer and its back-end instances uses only IPv4
    2. A user can suspend scaling processing temporarily and reenable it
    3. If you have an Auto Scaling group with running instances and you choose to delete the Auto Scaling group, the instances will be terminated and the Auto Scaling group will be deleted
    4. Application load balancers support dynamic host port mapping
    5. ALB provides native support for WebSocket via the ws:// and wss:// protocols
    6. Load balancing algorithms
      1. Application Load Balancers - round robin
      2. Classic Load Balancers 
        1. least outstanding requests routing algorithm for http(s)
        2. round robin for TCP
      3. Network Load Balancers - flow hash algorithm
    7. Server Name Indication - With SNI support you can associate multiple certificates with a listener and each secure application behind a load balancer can use its own certificate
    Auto Scaling 
    1. Automatically creates a launch configuration directly from an EC2 instance
    2. Maximum number of Auto Scaling groups that AWS will allow you to create - 20
    3. To create Auto scaling launch configuration, min required are: config name, AMI, Instance Type
    4. An Auto-Scaling group spans 3 AZs and currently has 4 running EC2 instances. When Auto Scaling needs to terminate an EC2 instance by default, Auto Scaling will: Send an SNS notification, if configured to do so, Terminate an instance in the AZ which currently has 2 running EC2 instances
    5. Default termination logic - select AZ, select oldest launch configuration, select instance closest to next billing hour
    6. Auto Scaling determines whether there are instances in multiple Availability Zones. If so, it selects the Availability Zone with the most instances and at least one instance that is not protected from scale in. If there is more than one Availability Zone with this number of instances, Auto Scaling selects the Availability Zone with the instances that use the oldest launch configuration
    7. Proactive cyclic scaling == scheduled scaling
    8. Currently, HTTP on port 80 is the default health check
    9. Scale ahead of expected increases in load - Scheduled scaling and Metric-based scaling
    10. ElastiCache offerings for In-Memory key/value stores include ElastiCache for Redis, which can support replication, and ElastiCache for Memcached which does not support replication
    11. Redis - sorted sets, pub/sub, in-memory store
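The first step of the default termination logic in the Auto Scaling list above (pick the Availability Zone with the most instances) can be sketched as follows. The class, method, and zone names are hypothetical, and the sketch assumes no instance is protected from scale in. Ties are returned as a list because, per the notes, the next step compares launch configurations, which is not modeled here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ScaleInZoneChooser {
    /** Returns the zone(s) holding the most instances; more than one entry means a tie. */
    static List<String> candidateZones(Map<String, Integer> instancesPerZone) {
        int max = 0;
        for (int count : instancesPerZone.values()) {
            max = Math.max(max, count);
        }
        List<String> candidates = new ArrayList<>();
        if (max == 0) {
            return candidates; // nothing is running, so nothing to terminate
        }
        for (Map.Entry<String, Integer> entry : instancesPerZone.entrySet()) {
            if (entry.getValue() == max) {
                candidates.add(entry.getKey());
            }
        }
        Collections.sort(candidates); // stable output regardless of map ordering
        return candidates;
    }

    public static void main(String[] args) {
        // The scenario from item 4: three AZs, four running instances.
        Map<String, Integer> group = new HashMap<>();
        group.put("az-a", 2);
        group.put("az-b", 1);
        group.put("az-c", 1);
        System.out.println(candidateZones(group)); // prints [az-a]
    }
}
```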
    Connection draining - finish in-flight transactions
    HA across regions - AWS Route 53

    Route 53 routing
    1. A hosted zone represents a collection of resource record sets that are managed together under a single domain name
    2. The resource record sets contained in a hosted zone must share the same suffix.
      1. simple (one record with two or more IPs, one IP is selected randomly)
      2. weighted round robin (split traffic to multiple backend instances, like 20% to one, 80% to other)
      3. latency based (route to lowest latency recordset)
      4. health check and DNS failover 
      5. Geolocation (based on national borders i.e countries)
      6. Geo Proximity (based on distance)
      7. multivalue
    3. Route 53 natively supports ELB with an internal health check. Turn "Evaluate target health" on and "Associate with Health Check" off and R53 will use the ELB's internal health check
    4. AWS DNS does not respond to requests from outside the VPC
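The weighted policy in the list above (for example 20% to one target, 80% to another) boils down to picking a point on a cumulative weight line. The sketch below shows the idea only; it is not how Route 53 is implemented, and the class, method, and target names are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

final class WeightedRouting {
    /** Picks the target whose cumulative weight range contains point, where 0 <= point < total weight. */
    static String pick(Map<String, Integer> weights, int point) {
        int total = 0;
        for (int weight : weights.values()) {
            total += weight;
        }
        if (point < 0 || point >= total) {
            throw new IllegalArgumentException("point must be in [0, " + total + ")");
        }
        int cumulative = 0;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            cumulative += entry.getValue();
            if (point < cumulative) {
                return entry.getKey();
            }
        }
        throw new IllegalStateException("unreachable: point was validated against the total");
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("backend-a", 20); // hypothetical targets
        weights.put("backend-b", 80);
        int point = new Random().nextInt(100); // 100 is the total weight here
        System.out.println(pick(weights, point));
    }
}
```

Over many requests, roughly one in five picks lands on backend-a, matching the 20/80 split.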

    SQS - max message size is 256KB
    1. Visibility timeout - the period after a message is received during which it is hidden from other consumers
    2. Each queue starts with a default setting of 30 seconds for the visibility timeout, maximum is 12 hours
    3. one million free messages each month
    4. long polling - retrieves all messages, cost efficient
    5. short polling - retrieves subset of messages
    6. shared queues - can be shared across accounts, owner pays
    7. SQS has unlimited queues, unlimited messages
    8. SQS messages are deleted after 4 days (default retention period). Max retention period is 14 days
    9. DelaySeconds - time duration for which a message is hidden when it is first added to queue, whereas visibility timeouts is the duration a message is hidden only after it is consumed from the queue
    10. To scale up SQS processing, use horizontal scaling: increase number of producers and consumers
    Two types
    1. Standard queue (at-least-once delivery, no guarantee of order)
    2. FIFO (exactly-once processing, no duplicates, up to 300 messages per second without batching)
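The difference between DelaySeconds and the visibility timeout noted above can be modeled with a few timestamps. This is a toy model for intuition, not the SQS API: the class, method, and parameter names are made up, times are plain seconds, and the message is assumed never to be deleted.

```java
final class SqsVisibility {
    /**
     * Returns true if a consumer polling at time now could receive the message.
     * lastReceivedAt is the second at which the message was last received, or -1 if never.
     */
    static boolean isVisible(long now, long sentAt, long delaySeconds,
                             long lastReceivedAt, long visibilityTimeout) {
        // DelaySeconds hides the message right after it is added to the queue.
        if (now < sentAt + delaySeconds) {
            return false;
        }
        // The visibility timeout hides it only after a consumer has received it.
        if (lastReceivedAt >= 0 && now < lastReceivedAt + visibilityTimeout) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Sent at t=0 with a 10 second delay, received at t=12 with the default 30 second timeout.
        System.out.println(isVisible(5, 0, 10, -1, 30));  // false: still delayed
        System.out.println(isVisible(10, 0, 10, -1, 30)); // true: delay elapsed, never received
        System.out.println(isVisible(20, 0, 10, 12, 30)); // false: in flight
        System.out.println(isVisible(42, 0, 10, 12, 30)); // true: timeout expired without a delete
    }
}
```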
      SNS - topic based publish/subscribe model
      1. Order not guaranteed
      2. 256 KB max payload (billed in 64 KB chunks, so a 256 KB message is billed as 4 requests)
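The 64 KB billing rule for SNS above works out as a round-up division. A minimal sketch with hypothetical class and method names:

```java
final class SnsBilling {
    static final int CHUNK_BYTES = 64 * 1024;        // one billed request
    static final int MAX_PAYLOAD_BYTES = 256 * 1024; // maximum message size from the note

    /** Returns how many requests a single publish of payloadBytes is billed as. */
    static int billedRequests(int payloadBytes) {
        if (payloadBytes <= 0 || payloadBytes > MAX_PAYLOAD_BYTES) {
            throw new IllegalArgumentException("payload must be between 1 byte and 256 KB");
        }
        // Every started 64 KB chunk counts as one request.
        return (payloadBytes + CHUNK_BYTES - 1) / CHUNK_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(billedRequests(256 * 1024)); // prints 4
        System.out.println(billedRequests(10 * 1024));  // prints 1
    }
}
```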
      Amazon MQ
      1. ActiveMQ
      2. 32MB payload
      3. 7 9s
      4. supports XA
      Amazon SWF
      1. By implementing workers and deciders, you focus on your differentiated application logic as it pertains to performing the actual processing steps and coordinating them. Amazon SWF handles the underlying details such as storing tasks until they can be assigned, monitoring assigned tasks, and providing consistent information on their completion
      2. Amazon SWF stores tasks, assigns them to workers when they are ready, and monitors their progress. 
      3. It ensures that a task is assigned only once and is never duplicated
      4. Maximum workflow execution time – 1 year
      Dynamo DB
      1. Combined Value and Name size must not exceed 400KB
      2. ProvisionedThroughputExceededException - One partition is subjected to a disproportionate amount of traffic
      3. DynamoDB supports two types of secondary indexes: Local secondary index, Global secondary index
      4. Amazon DynamoDB supports fast in-place updates
      5. You grant AWS Lambda permission to access a Dynamo DB Stream using an IAM role known as the “execution role”
      HA patterns
      1. Multi-AZ
      2. Database standby, read replicas in each zone
      3. Use elastic IPs
      4. Floating network interface
      Billing
      1. AWS provides an option to have programmatic access to billing. Programmatic Billing Access leverages the existing Amazon Simple Storage Service
      2. Billing commences when Amazon EC2 initiates the boot sequence of an AMI instance. Billing ends when the instance shuts down
      3. You are charged for the stack resources for the time they were operating (even if you deleted the stack right away)
      4. 4 levels of AWS support - Basic, Developer, Business and Enterprise 
      5. How should you allocate the cost of the data transfer over AWS Direct Connect back to each department - Configure virtual interfaces and tag each with the department account number. Use detail usage reports
      6. AWS Cost Explorer - you can view recommendations for cost savings, based on CPU/memory/Disk usage etc
      7. AWS Trusted Advisor provides best practices (or checks) in five categories: cost optimization, security, fault tolerance, performance improvement and service limits
      Elastic Transcoder
      1. Convert media files to different formats
      2. Billing based on time you transcode and resolution
      API Gateway
      1. API gateway has caching capabilities
      2. Low cost and scales automatically
      3. Throttle to prevent DoS attacks
      4. Can log results to CloudWatch
      5. Enable CORS to use multiple domains
      6. CORS is enforced by client
      Kinesis
      1. Platform to send streaming data to and analyze streaming data
      2. Process real-time data - game data, geospatial (Uber) data, IoT data
      3. The votes must be collected into a durable, scalable, and highly available data store for real-time public tabulation - Amazon Kinesis
      3 types of Kinesis
      1. Kinesis Streams - used for low latency data ingestion. Data producers send data (click streams, IoT devices, logs) to Kinesis Streams, EC2 consumers analyze that data and store the resulting data in DynamoDB, S3, EMR, or Redshift.
        1. Data in streams is stored as shards 
        2. Each record, called a data blob, can be up to 1MB
        3. By default data is retained for 24 hours, max 7 days
      2. Kinesis Analytics - analyzes incoming data real time using SQL both in Streams and Firehose
      3. Kinesis Firehose - Load streams into S3, RedShift, ElasticSearch and Splunk

      Miscellaneous 
      1. AWS uses the techniques detailed in DoD 5220.22-M to destroy data as part of the decommissioning process
      2. Amazon EMR is ideal for processing and transforming unstructured or semi-structured data to bring in to Amazon Redshift
      3. Hadoop is an open source Java software framework
      4. To encrypt all data in an Amazon Redshift cluster, use the AWS KMS default customer master key
      5. DomainKeys Identified Mail (DKIM) is a standard that allows senders to sign their email messages and ISPs to use those signatures to verify that those messages are legitimate and have not been modified by a third party in transit
      6. Every Amazon SES sender has a unique set of sending limits, which are calculated by Amazon
      7. AWS Certificate Manager is a service that lets you easily provision, manage, and deploy Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificates for use with AWS services
      8. Key pairs are used only for Amazon EC2 and Amazon CloudFront
      9. Lambda pricing is based on the number of requests, the execution duration, and the amount of memory assigned
      10. Serverless is not about pricing but about shifting operations responsibilities to AWS and the ability to run apps without thinking about capacity
      11. AWS customers are welcome to carry out security assessments or penetration tests against their AWS infrastructure without prior approval for 8 services, listed in the next section under “Permitted Services.”
      12. The user can add AZs on the fly from the AWS console to the ELB
      13. AWS Systems Manager Parameter Store provides secure, hierarchical storage for configuration data management and secrets management

      Monday, September 17, 2018

      Java Performance Monitoring and Tuning

      Scalability

      • how well an app behaves as load increases


      Responsiveness

      • web UI, where we measure how long it takes for one txn
      • high pause times are NOT acceptable


      Throughput

      • batch stuff
      • focus is on how many txns are done in a period rather than the response time of one txn
      • high pause times are acceptable


      Client apps - start up time is important and should be low
      Server apps - startup time is not important and can be high

      Performance Methodology

      1. monitor
      2. profile
      3. Tune

      Garbage Collection

      1. New objects are created in eden space. When eden space is full, a minor GC is triggered. This is a stop-the-world event
      2. Objects still referenced in eden are moved to a survivor space (S0 or S1) and are copied between the two survivor spaces on later minor GCs
      3. If the object still has references after surviving several minor GCs, it is moved to tenured (old gen) space
      4. Minor GC can run on a single thread or on multiple threads
      5. When tenured gen space is full, a full GC is run. This can be single-threaded or multi-threaded. With a concurrent collector, most of the old gen collection runs concurrently with the application
      6. Permanent Gen - metadata required by the JVM; contains class objects and methods (replaced by Metaspace in Java 8)
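For the monitor step, the JVM exposes the collectors and memory pools described above through the standard java.lang.management API. The sketch below prints whatever the running JVM reports. The class name is made up, and the collector and pool names in the output depend on the JVM version and the GC in use, so no particular output is assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class GcSnapshot {
    /** Builds a one-line-per-entry summary of GC activity and memory pool usage. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        // One bean per collector, typically one for the young gen and one for the old gen.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("collector=%s count=%d timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        // Memory pools correspond to areas such as eden, survivor, and old gen.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            long used = (usage == null) ? -1 : usage.getUsed();
            sb.append(String.format("pool=%s type=%s usedBytes=%d%n",
                    pool.getName(), pool.getType(), used));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```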



      Saturday, December 16, 2017

      Java Key Store commands

      List certificates in Java key store

      C:\Program Files\Java\jdk1.8.0_144\jre\lib\security>keytool -list -keystore cacerts
      keytool -list -v -keystore c:/users/Xyz/selfsignedXyz.jks
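The same listing can be done from Java with the standard KeyStore API. This is a sketch, not a replacement for keytool: the class and method names are invented here, and the keystore path, type, and password are whatever you pass in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.util.Collections;
import java.util.List;

final class KeystoreAliases {
    /** Loads the keystore file and returns its aliases in sorted order. */
    static List<String> listAliases(Path keystoreFile, String type, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore keyStore = KeyStore.getInstance(type);
        try (InputStream in = Files.newInputStream(keystoreFile)) {
            keyStore.load(in, password);
        }
        List<String> aliases = Collections.list(keyStore.aliases());
        Collections.sort(aliases);
        return aliases;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java KeystoreAliases <keystore.jks> <password>");
            return;
        }
        for (String alias : listAliases(Paths.get(args[0]), "JKS", args[1].toCharArray())) {
            System.out.println(alias);
        }
    }
}
```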

      Generate key pair
      keytool -genkey -keyalg RSA -alias XyzTestCertAlias -validity 365 -keystore c:/users/Xyz/selfsignedXyz.jks

      extract public key to cert file
      keytool -export -alias XyzTestCertAlias -keystore c:/users/Xyz/selfsignedXyz.jks -rfc -file c:/users/Xyz/XyzTestCert.cert

      extract private key - 2 steps
      keytool -v -importkeystore -srckeystore c:/users/Xyz/selfsignedXyz.jks -srcalias XyzTestCertAlias -destkeystore c:/users/Xyz/myp12file.p12 -deststoretype PKCS12

      openssl pkcs12 -in c:/users/Xyz/myp12file.p12 -out c:/users/Xyz/private.pem

      Thursday, December 14, 2017

      Encryption and Decryption of files in Java using BouncyCastle as provider

      Following example shows how to encrypt and decrypt files using BouncyCastle jars

      Download and add these jars to project classpath
      bcmail-jdk15on-158.jar
      bcpkix-jdk15on-158.jar
      bcprov-ext-jdk15on-158.jar
      bcprov-jdk15on-158.jar




      package abcd;
      
      import org.bouncycastle.cms.*;
      import org.bouncycastle.cms.jcajce.JceCMSContentEncryptorBuilder;
      import org.bouncycastle.cms.jcajce.JceKeyTransEnvelopedRecipient;
      import org.bouncycastle.cms.jcajce.JceKeyTransRecipientInfoGenerator;
      import org.bouncycastle.jce.provider.BouncyCastleProvider;
      import org.bouncycastle.operator.OutputEncryptor;
      
      import java.io.*;
      import java.nio.file.Files;
      import java.security.*;
      import java.security.cert.CertificateEncodingException;
      import java.security.cert.CertificateException;
      import java.security.cert.CertificateFactory;
      import java.security.cert.X509Certificate;
      import java.util.Collection;
      import java.util.Enumeration;
      import java.util.Iterator;
      
      public class EncryptAndDecrypt {
      
          private static final String WORK_DIR = "/C:/Users/xyz/apps/workspace/ABCD";
      
          private static final File SOURCE_PDF = new File(WORK_DIR, "running-log.txt");
          private static final File DESTINATION_FILE = new File(WORK_DIR, "running-log123.txt");
          private static final File DECRYPTED_FILE = new File(WORK_DIR, "decrypted.txt");
      
          public static void main(final String[] args) throws Exception {
              if (!new File(WORK_DIR).exists()) {
                  throw new RuntimeException("Update WORK_DIR to point to the directory the project is cloned into.");
              }
              Files.deleteIfExists(DESTINATION_FILE.toPath());
              Files.deleteIfExists(DECRYPTED_FILE.toPath());
      
              Security.addProvider(new BouncyCastleProvider());
      
              //X509Certificate certificate = getX509Certificate(new File(WORK_DIR, "myp12file.p12"));
              X509Certificate certificate = getX509Certificate(new File(WORK_DIR, "xyzTestCert.cert"));
              PrivateKey privateKey = getPrivateKey(new File(WORK_DIR, "myp12file.p12"), "changeit");
      
              encrypt(certificate, SOURCE_PDF, DESTINATION_FILE);
              decrypt(privateKey, DESTINATION_FILE, DECRYPTED_FILE);
          }
      
          private static void decrypt(PrivateKey privateKey, File encrypted, File decryptedDestination) throws IOException, CMSException {
              byte[] encryptedData = Files.readAllBytes(encrypted.toPath());
      
              CMSEnvelopedDataParser parser = new CMSEnvelopedDataParser(encryptedData);
      
              RecipientInformation recInfo = getSingleRecipient(parser);
              Recipient recipient = new JceKeyTransEnvelopedRecipient(privateKey);
      
              try (InputStream decryptedStream = recInfo.getContentStream(recipient).getContentStream()) {
                  Files.copy(decryptedStream, decryptedDestination.toPath());
              }
      
              System.out.println(String.format("Decrypted '%s' to '%s'", encrypted.getAbsolutePath(), decryptedDestination.getAbsolutePath()));
          }
      
          private static void encrypt(X509Certificate cert, File source, File destination) throws CertificateEncodingException, CMSException, IOException {
              CMSEnvelopedDataStreamGenerator gen = new CMSEnvelopedDataStreamGenerator();
              gen.addRecipientInfoGenerator(new JceKeyTransRecipientInfoGenerator(cert));
              //OutputEncryptor encryptor = new JceCMSContentEncryptorBuilder(CMSAlgorithm.AES256_CBC).setProvider(BouncyCastleProvider.PROVIDER_NAME).build();
              OutputEncryptor encryptor = new JceCMSContentEncryptorBuilder(CMSAlgorithm.AES128_CBC).setProvider(BouncyCastleProvider.PROVIDER_NAME).build();
              
              try (FileOutputStream fileStream = new FileOutputStream(destination);
                   OutputStream encryptingStream = gen.open(fileStream, encryptor)) {
      
                  byte[] unencryptedContent = Files.readAllBytes(source.toPath());
                  encryptingStream.write(unencryptedContent);
              }
      
              System.out.println(String.format("Encrypted '%s' to '%s'", source.getAbsolutePath(), destination.getAbsolutePath()));
          }
      
          private static X509Certificate getX509Certificate(File certificate) throws IOException, CertificateException {
              try (InputStream inStream = new FileInputStream(certificate)) {
                  CertificateFactory cf = CertificateFactory.getInstance("X.509");
                  return (X509Certificate) cf.generateCertificate(inStream);
              }
          }
      
          private static PrivateKey getPrivateKey(File file, String password) throws Exception {
              KeyStore ks = KeyStore.getInstance("PKCS12");
              try (FileInputStream fis = new FileInputStream(file)) {
                  ks.load(fis, password.toCharArray());
              }
      
              Enumeration<String> aliases = ks.aliases();
              String alias = aliases.nextElement();
              return (PrivateKey) ks.getKey(alias, password.toCharArray());
          }
      
          private static RecipientInformation getSingleRecipient(CMSEnvelopedDataParser parser) {
              Collection recInfos = parser.getRecipientInfos().getRecipients();
              Iterator recipientIterator = recInfos.iterator();
              if (!recipientIterator.hasNext()) {
                  throw new RuntimeException("Could not find recipient");
              }
              return (RecipientInformation) recipientIterator.next();
          }
      }
      

      PKCS7 signing in Java using BouncyCastle as provider

      Following example shows how to sign a string text using BouncyCastle jars

      Download and add these jars to project classpath
      bcmail-jdk15on-158.jar
      bcpkix-jdk15on-158.jar
      bcprov-ext-jdk15on-158.jar
      bcprov-jdk15on-158.jar




      package abcd;
      
      import java.io.FileInputStream;
      import java.io.InputStream;
      import java.security.KeyStore;
      import java.security.PrivateKey;
      import java.security.Security;
      import java.security.cert.Certificate;
      import java.security.cert.X509Certificate;
      import java.util.ArrayList;
      import java.util.List;
      import org.bouncycastle.cert.jcajce.JcaCertStore;
      import org.bouncycastle.cms.CMSProcessableByteArray;
      import org.bouncycastle.cms.CMSSignedData;
      import org.bouncycastle.cms.CMSSignedDataGenerator;
      import org.bouncycastle.cms.CMSTypedData;
      import org.bouncycastle.cms.jcajce.JcaSignerInfoGeneratorBuilder;
      import org.bouncycastle.jce.provider.BouncyCastleProvider;
      import org.bouncycastle.operator.ContentSigner;
      import org.bouncycastle.operator.jcajce.JcaContentSignerBuilder;
      import org.bouncycastle.operator.jcajce.JcaDigestCalculatorProviderBuilder;
      import org.bouncycastle.util.Store;
      import org.bouncycastle.util.encoders.Base64;
      
      public final class PKCS7Signer {
      
          private static final String PATH_TO_KEYSTORE = "c:/users/xyz/selfsignedxyz.jks";
          private static final String KEY_ALIAS_IN_KEYSTORE = "xyzCertAlias";
          private static final String KEYSTORE_PASSWORD = "xyz";
          // SHA-1 based signatures are considered weak; "SHA256withRSA" is the usual replacement name
          private static final String SIGNATUREALGO = "SHA1withRSA";
      
          public PKCS7Signer() {
          }
      
          KeyStore loadKeyStore() throws Exception {
      
              KeyStore keystore = KeyStore.getInstance("JKS");
              // try-with-resources closes the stream even if load() throws
              try (InputStream is = new FileInputStream(PATH_TO_KEYSTORE)) {
                  keystore.load(is, KEYSTORE_PASSWORD.toCharArray());
              }
              return keystore;
          }
      
          CMSSignedDataGenerator setUpProvider(final KeyStore keystore) throws Exception {
      
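              // Register BouncyCastle so that "BC" can be named as the provider below
              Security.addProvider(new BouncyCastleProvider());
              Certificate[] certchain = keystore.getCertificateChain(KEY_ALIAS_IN_KEYSTORE);
              if (certchain == null) {
                  // Unknown alias, or the alias has no certificate chain
                  throw new IllegalStateException("No certificate chain for alias " + KEY_ALIAS_IN_KEYSTORE);
              }
              final List<Certificate> certlist = new ArrayList<Certificate>();
              for (Certificate c : certchain) {
                  certlist.add(c);
              }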
      
              Store certstore = new JcaCertStore(certlist);
      
              Certificate cert = keystore.getCertificate(KEY_ALIAS_IN_KEYSTORE);
      
              ContentSigner signer = new JcaContentSignerBuilder(SIGNATUREALGO).setProvider("BC").
                      build((PrivateKey) (keystore.getKey(KEY_ALIAS_IN_KEYSTORE, KEYSTORE_PASSWORD.toCharArray())));
      
              CMSSignedDataGenerator generator = new CMSSignedDataGenerator();
      
              generator.addSignerInfoGenerator(new JcaSignerInfoGeneratorBuilder(new JcaDigestCalculatorProviderBuilder().setProvider("BC").
                      build()).build(signer, (X509Certificate) cert));
      
              generator.addCertificates(certstore);
      
              return generator;
          }
      
          byte[] signPkcs7(final byte[] content, final CMSSignedDataGenerator generator) throws Exception {
      
              CMSTypedData cmsdata = new CMSProcessableByteArray(content);
              // true = encapsulate the content inside the SignedData (attached signature)
              CMSSignedData signeddata = generator.generate(cmsdata, true);
              return signeddata.getEncoded();
          }
      
          public static void main(String[] args) throws Exception {
      
              PKCS7Signer signer = new PKCS7Signer();
              KeyStore keyStore = signer.loadKeyStore();
              CMSSignedDataGenerator signatureGenerator = signer.setUpProvider(keyStore);
              String content = "some bytes to be signed";
              byte[] signedBytes = signer.signPkcs7(content.getBytes("UTF-8"), signatureGenerator);
              System.out.println("Signed Encoded Bytes: " + new String(Base64.encode(signedBytes)));
          }
      }
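
The SIGNATUREALGO string used above is a standard JCA algorithm name, so a candidate name can be tried out with the JDK's own java.security.Signature before it is wired into BouncyCastle. The sketch below is JDK-only and assumes a throwaway RSA key pair generated in memory instead of the keystore from the example. The class name RawSignatureSketch, its method names, and the choice of SHA256withRSA are illustrative assumptions, not part of the original example. It produces a bare signature, not a PKCS7/CMS structure.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

public final class RawSignatureSketch {

    // Assumption: SHA256withRSA chosen for illustration; the example above uses SHA1withRSA
    static final String SIGNATURE_ALGO = "SHA256withRSA";

    // Generates an in-memory 2048-bit RSA key pair (stands in for the keystore entry)
    static KeyPair newThrowawayKeyPair() throws Exception {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        return kpg.generateKeyPair();
    }

    // Signs the content with the private key and returns the raw signature bytes
    static byte[] sign(byte[] content, PrivateKey key) throws Exception {
        Signature s = Signature.getInstance(SIGNATURE_ALGO);
        s.initSign(key);
        s.update(content);
        return s.sign();
    }

    // Returns true only if the signature matches the content under the public key
    static boolean verify(byte[] content, byte[] signature, PublicKey key) throws Exception {
        Signature s = Signature.getInstance(SIGNATURE_ALGO);
        s.initVerify(key);
        s.update(content);
        return s.verify(signature);
    }

    public static void main(String[] args) throws Exception {
        KeyPair kp = newThrowawayKeyPair();
        byte[] content = "some bytes to be signed".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(content, kp.getPrivate());
        System.out.println("Signature verifies: " + verify(content, signature, kp.getPublic()));
    }
}
```

If the round trip works with a given algorithm name, the same name can be tried as the value of SIGNATUREALGO in PKCS7Signer.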
      

      AWS Linux port forwarding and IPTABLES commands

      Port forwarding from 80/443 to 3000

      List the active NAT rules
      sudo iptables -t nat -nvL

      List the saved rules file
      sudo cat /etc/sysconfig/iptables

      Insert the redirect rules
      sudo iptables -t nat -I PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 3000
      sudo iptables -t nat -I PREROUTING -p tcp --dport 443 -j REDIRECT --to-ports 3000

      Save the rules to /etc/sysconfig/iptables
      sudo /sbin/service iptables save

      Delete a rule (same arguments as the insert, with -D instead of -I)
      sudo iptables -t nat -D PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 3000