I was running Eucalyptus 4.0. The environment is sound and had been up for a couple of years without issue. I recently shut the environment down for a move, following the shutdown procedure (stop all instances, stop eucalyptus-cloud, stop eucalyptus-cc, stop each node).
When I restored the environment, all of the services came back online, but no instances would start, new or old. I noticed some issues with IP allocation (the network did not change during the move), so I released all of the addresses back to the cloud and then re-allocated them.
Because of other errors I was seeing, I then found some information online and ended up modifying two parameters:
euca-modify-property -p cloud.network.global_max_network_tag=2048
euca-modify-property -p cloud.network.global_min_network_tag=1024
Once this was done and I had restarted the cloud again, I was able to launch new instances successfully. With no luck on the existing instances, I upgraded 4.0 --> 4.0.1 --> 4.0.2. The upgrade appeared to complete without issue (my console still reports 4.0.0, but euca-version reports Eucalyptus 4.0.2 with euca2ools 3.1.1/Omega).
However, I'm about 14 hours into this and I still cannot start an old [EBS-backed] instance. It goes from stopped --> pending --> stopping --> stopped within a few seconds, and you can only tell that from the logs. I believe there is some leftover data in the "metadata_extant_network" table (maybe something did not shut down properly?), but I cannot identify what it is. I also cannot remove the records manually because of FK constraints, and I don't want to risk corrupting the database. Here are my logs from an attempt to start an instance; there must be a "proper" way to do this:
cloud-exhaust.log
Tue Dec 9 10:04:29 2014 WARN [org.jboss.netty.channel.DefaultChannelPipeline:Eucalyptus.eucalyptus:Ephemeral
[bitronix.tm.twopc.Preparer:Eucalyptus.eucalyptus:EphemeralConfiguration:arn:euca:eucalyptus:::com.eucalyptus.network.DispatchingNetworkingService/.class java.util.concurrent.ThreadPoolExecutor$Worker#346] executing transaction with 0 enlisted resource
Tue Dec 9 10:04:30 2014 WARN [org.hibernate.engine.jdbc.spi.SqlExceptionHelper:Eucalyptus.eucalyptus:EphemeralConfiguration:arn:euca:eucalyptus:::com.eucalyptus.network.DispatchingNetworkingService/.class java.util.concurrent.ThreadPoolExecutor$Worker#346] SQL Error: 0, SQLState: 23503
Tue Dec 9 10:04:30 2014 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper:Eucalyptus.eucalyptus:EphemeralConfiguration:arn:euca:eucalyptus:::com.eucalyptus.network.DispatchingNetworkingService/.class java.util.concurrent.ThreadPoolExecutor$Worker#346] ERROR: update or delete on table "metadata_extant_network" violates foreign key constraint "fk6a62681ed068841d" on table "metadata_network_group"
Detail: Key (id)=(c75a9938419237320141929ac6a02eea) is still referenced from table "metadata_network_group".
postgresql-Tue.log
ERROR: update or delete on table "metadata_extant_network" violates foreign key constraint "fk6a62681ed068841d" on table "metadata_network_group"
DETAIL: Key (id)=(c75a9938419237320141929ac6a02eea) is still referenced from table "metadata_network_group".
STATEMENT: delete from metadata_extant_network where id=$1 and version=$2
ERROR: update or delete on table "metadata_extant_network" violates foreign key constraint "fk6a62681ed068841d" on table "metadata_network_group"
DETAIL: Key (id)=(c75a9938419237320141929ac6a02eea) is still referenced from table "metadata_network_group".
STATEMENT: delete from metadata_extant_network where id=$1 and version=$2
cloud-output.log
2014-12-09 10:04:30 ERROR | org.hibernate.exception.ConstraintViolationException: could not execute statement
2014-12-09 10:04:41 INFO | :1418144681687:Address:ADDRESS_STATE:TOP:Address 192.168.0.216 arn:aws:euare:000000000001:user/nobody available 0.0.0.0 AddressTransition system:unallocated->impending(true)
2014-12-09 10:04:41 ERROR | com.eucalyptus.cloud.util.MetadataException: org.hibernate.LazyInitializationException: could not initialize proxy - no Session
2014-12-09 10:04:41 WARN | Aborting resource token: ResourceToken:i-812D40D4:resources=TypedContext:{com.eucalyptus.util.TypedKey(NetworkResources)=[com.eucalyptus.compute.common.network.PrivateNetworkIndexResource(5), com.eucalyptus.compute.common.network.PublicIPResource()]}
cloud-debug.log
Tue Dec 9 10:04:30 2014 ERROR [NetworkGroups:Eucalyptus.eucalyptus:EphemeralConfiguration:arn:euca:eucalyptus:::com.eucalyptus.network.DispatchingNetworkingService/.class java.util.concurrent.ThreadPoolExecutor$Worker#346] org.hibernate.exception.ConstraintViolationException: could not execute statement
Tue Dec 9 10:04:41 2014 INFO [AdmissionControl:Compute.10] Found authorized clusters: [cc-192.168.0.150]
Tue Dec 9 10:04:41 2014 INFO [AdmissionControl:Compute.10] Availability: cc-192.168.0.150 -> 5
Tue Dec 9 10:04:41 2014 ERROR [ClusterAllocator:Eucalyptus.cluster:ClusterConfiguration:arn:euca:eucalyptus:cluster01:cluster:cc-192.168.0.150/.class java.util.concurrent.ThreadPoolExecutor$Worker#458] com.eucalyptus.cloud.util.MetadataException: org.hibernate.LazyInitializationException: could not initialize proxy - no Session
Tue Dec 9 10:04:41 2014 WARN [Allocations:Eucalyptus.cluster:ClusterConfiguration:arn:euca:eucalyptus:cluster01:cluster:cc-192.168.0.150/.class java.util.concurrent.ThreadPoolExecutor$Worker#458] Aborting resource token: ResourceToken:i-812D40D4:resources=TypedContext:{com.eucalyptus.util.TypedKey(NetworkResources)=[com.eucalyptus.compute.common.network.PrivateNetworkIndexResource(5), com.eucalyptus.compute.common.network.PublicIPResource()]}
cloud-error.log
Tue Dec 9 10:04:30 2014 ERROR [NetworkGroups:Eucalyptus.eucalyptus:EphemeralConfiguration:arn:euca:eucalyptus:::com.eucalyptus.network.DispatchingNetworkingService/.class java.util.concurrent.ThreadPoolExecutor$Worker#346] org.hibernate.exception.ConstraintViolationException: could not execute statement
Tue Dec 9 10:04:41 2014 ERROR [ClusterAllocator:Eucalyptus.cluster:ClusterConfiguration:arn:euca:eucalyptus:cluster01:cluster:cc-192.168.0.150/.class java.util.concurrent.ThreadPoolExecutor$Worker#458] [com.eucalyptus.cloud.run.ClusterAllocator.cleanupOnFailure(ClusterAllocator.java):274] com.eucalyptus.cloud.util.MetadataException: org.hibernate.LazyInitializationException: could not initialize proxy - no Session
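The PostgreSQL errors above are ordinary foreign-key enforcement. A row in metadata_extant_network cannot be deleted while a metadata_network_group row still points at it. The following is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for PostgreSQL. The two-table schema and the column name extant_network_id are hypothetical simplifications, not Eucalyptus's actual schema. The sketch shows the same delete failing, along with a query that lists the child rows still referencing a given key. Those rows are what you would want to inspect before touching anything.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory DB with a HYPOTHETICAL parent/child table pair."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only with this pragma; PostgreSQL always does.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE metadata_extant_network (id TEXT PRIMARY KEY, tag INTEGER)")
    # HYPOTHETICAL column name extant_network_id; the real schema may differ.
    conn.execute(
        "CREATE TABLE metadata_network_group ("
        " id TEXT PRIMARY KEY,"
        " name TEXT,"
        " extant_network_id TEXT REFERENCES metadata_extant_network(id))"
    )
    key = "c75a9938419237320141929ac6a02eea"
    conn.execute("INSERT INTO metadata_extant_network VALUES (?, 1024)", (key,))
    conn.execute("INSERT INTO metadata_network_group VALUES ('g1', 'ownCloud', ?)", (key,))
    conn.commit()
    return conn


def referencing_groups(conn: sqlite3.Connection, network_id: str):
    """Return the child rows that still point at the given parent id."""
    return conn.execute(
        "SELECT id, name FROM metadata_network_group WHERE extant_network_id = ?",
        (network_id,),
    ).fetchall()


def try_delete(conn: sqlite3.Connection, network_id: str) -> bool:
    """Attempt a delete like the one in the log; True if it went through."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM metadata_extant_network WHERE id = ?", (network_id,))
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_demo_db()
    k = "c75a9938419237320141929ac6a02eea"
    print(referencing_groups(db, k))  # [('g1', 'ownCloud')]
    print(try_delete(db, k))          # False: the row is still referenced
```

In PostgreSQL, the equivalent SELECT against the real child table, using whatever column the constraint actually covers, shows which security group still holds the network.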
So then I logged into the PostgreSQL database directly, removed the FK constraints, and manually removed the rows identified in the logs:
ALTER TABLE metadata_extant_network DROP CONSTRAINT "fk45157a25f1ac537e";
ALTER TABLE metadata_network_group DROP CONSTRAINT "fk6a62681ed068841d";
DELETE FROM metadata_extant_network WHERE id='c75a9938419237320141929ac6a02eea';
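Dropping the constraints lets the delete go through. However, any metadata_network_group row that pointed at the deleted id is left referencing a row that no longer exists. Below is a minimal sketch, again using sqlite3 and a hypothetical simplified schema, of an anti-join that finds such dangling references. SQLite cannot drop a constraint, so the sketch turns enforcement off with PRAGMA foreign_keys = OFF to mimic the dropped FKs. The column name extant_network_id is an assumption.

```python
import sqlite3


def orphaned_groups(conn: sqlite3.Connection):
    """Anti-join: child rows whose referenced parent id no longer exists."""
    return conn.execute(
        "SELECT g.id, g.extant_network_id"
        " FROM metadata_network_group AS g"
        " LEFT JOIN metadata_extant_network AS n ON n.id = g.extant_network_id"
        " WHERE g.extant_network_id IS NOT NULL AND n.id IS NULL"
    ).fetchall()


def demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Enforcement off stands in for the dropped FK constraints.
    conn.execute("PRAGMA foreign_keys = OFF")
    conn.execute("CREATE TABLE metadata_extant_network (id TEXT PRIMARY KEY)")
    # HYPOTHETICAL column name extant_network_id; the real schema may differ.
    conn.execute(
        "CREATE TABLE metadata_network_group (id TEXT PRIMARY KEY,"
        " extant_network_id TEXT REFERENCES metadata_extant_network(id))"
    )
    key = "c75a9938419237320141929ac6a02eea"
    conn.execute("INSERT INTO metadata_extant_network VALUES (?)", (key,))
    conn.execute("INSERT INTO metadata_network_group VALUES ('g1', ?)", (key,))
    # Same shape as the manual DELETE: succeeds because nothing enforces the FK.
    conn.execute("DELETE FROM metadata_extant_network WHERE id = ?", (key,))
    conn.commit()
    return conn


if __name__ == "__main__":
    print(orphaned_groups(demo()))  # [('g1', 'c75a9938419237320141929ac6a02eea')]
```

The LEFT JOIN ... IS NULL pattern is standard SQL, so the same query runs in PostgreSQL once the real table and column names are substituted.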
The delete was successful, but when I tried to restart the instances I received a new error:
euca-start-instances: error (InternalFailure): Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free.
Tue Dec 9 11:04:23 2014 ERROR [org.mule.exception.DefaultMessagingExceptionStrategy:Compute.15]
********************************************************************************
Message : Component that caused exception is: DefaultJavaComponent{Compute.component}. Message payload is of type: StartInstancesType
Code : MULE_ERROR--2
--------------------------------------------------------------------------------
Exception stack is:
1. Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free. (com.eucalyptus.cloud.util.NotEnoughResourcesException)
com.eucalyptus.network.NetworkGroup:325 (null)
2. Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free. (com.eucalyptus.cloud.util.NotEnoughResourcesException)
com.eucalyptus.cloud.run.AdmissionControl$RunAdmissionControl:148 (null)
3. Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free. (java.lang.RuntimeException)
com.eucalyptus.util.Exceptions:255 (null)
4. Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free. (com.eucalyptus.util.EucalyptusCloudException)
com.eucalyptus.compute.service.ComputeService:69 (null)
5. Component that caused exception is: DefaultJavaComponent{Compute.component}. Message payload is of type: StartInstancesType (org.mule.component.ComponentException)
org.mule.component.DefaultComponentLifecycleAdapter:352 (http://www.mulesoft.org/docs/site/current3/apidocs/org/mule/component/ComponentException.html)
--------------------------------------------------------------------------------
Root Exception stack trace:
com.eucalyptus.cloud.util.NotEnoughResourcesException: Failed to allocate network tag for network: arn:aws:euca:eucalyptus:821881850233:security-group/ownCloud/: no network tags are free.
at com.eucalyptus.network.NetworkGroup.extantNetwork(NetworkGroup.java:325)
at com.eucalyptus.network.GenericNetworkingService$_prepareSecurityGroup_closure3_closure12.doCall(GenericNetworkingService.groovy:198)
at sun.reflect.GeneratedMethodAccessor770.invoke(Unknown Source)
+ 3 more (set debug level logging or '-Dmule.verbose.exceptions=true' for everything)
********************************************************************************
It looks like you have configured a value for VLAN tags that is not compatible with your security group settings. You should not restrict the global range unless you need to reserve VLAN tags for some other use.
https://www.eucalyptus.com/docs/eucalyptus/4.0.2/#install-guide/configuring_security_groups.html
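The answer's point can be illustrated with a small model of tag allocation. This is a hypothetical sketch, not Eucalyptus's code. It assumes a security group can only receive a tag that lies inside the global window set by cloud.network.global_min_network_tag/global_max_network_tag and also inside the range the cluster can actually serve (the hypothetical cluster_min/cluster_max pair). It also assumes that tags already held by other networks are unavailable. If the two ranges do not overlap, or every overlapping tag is taken, allocation fails in the same way as "no network tags are free".

```python
from typing import Iterable, Optional


def allocate_tag(
    global_min: int,
    global_max: int,
    cluster_min: int,  # HYPOTHETICAL: lowest tag the cluster can serve
    cluster_max: int,  # HYPOTHETICAL: highest tag the cluster can serve
    in_use: Iterable[int],
) -> Optional[int]:
    """Return the lowest free tag inside both ranges, or None if none is free."""
    lo = max(global_min, cluster_min)
    hi = min(global_max, cluster_max)
    taken = set(in_use)
    for tag in range(lo, hi + 1):  # empty when lo > hi, i.e. no overlap
        if tag not in taken:
            return tag
    return None


if __name__ == "__main__":
    # Global window 1024-2048 vs. a hypothetical cluster limited to 2-511: no overlap.
    print(allocate_tag(1024, 2048, 2, 511, []))        # None
    # Wide global window (usable 802.1Q VLAN IDs 1-4094); tags 2 and 3 already taken.
    print(allocate_tag(1, 4094, 2, 511, [2, 3]))       # 4
```

In this model, narrowing the global window can only shrink the set of usable tags. That is consistent with the advice to leave the global range unrestricted unless some tags genuinely need to be reserved.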